Optimizing HoloViews and Param for faster streaming/DynamicMap
Context
We are developing a dashboard using Panel+HoloViews, displaying data streams from Kafka (a variety of 1D and 2D plots). Despite updating only once per second we are experiencing a variety of performance issues, so I started working on profiling and optimizations. Encouraged by early successes (see perf: Add numeric fast path to max_range and dimension_range by SimonHeybrock · Pull Request #6806 · holoviz/holoviews · GitHub, which was just accepted and merged), I kept going and had some more good results (up to 3-4x server-side speedup, 2x end-to-end across a range of element types and overlay sizes).
Below I describe a lot of small and mostly unrelated (but similar) changes that compound into substantial improvements overall. This is currently not in a cleaned-up state and far from ready for a PR, because I would first like to gauge interest in these changes: some of them are double-edged swords, since the performance gains are moderate while the code complexity increases. Roughly speaking, I’d like to:
- Hear your opinions.
- Get feedback on how much this actually improves performance in real dashboards, as benchmarks are always limited. Do you have any non-trivial apps that you could try this out on? Branches in my forks are listed below.
- Find out if and how things should be incorporated. I don’t think one big PR is a good idea. We could pick a few of the most promising changes and handle them individually.
Each optimization is an independent commit and could be submitted separately.
The total size of the changes is actually quite modest:
$ cd holoviz/param && git diff --stat main
param/parameterized.py | 49 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 14 deletions(-)
$ cd holoviz/holoviews && git diff --stat main
holoviews/core/data/__init__.py | 5 +++--
holoviews/core/util/types.py | 33 +++++++++++++++++++++++++++++----
holoviews/plotting/bokeh/element.py | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
holoviews/plotting/bokeh/graphs.py | 14 ++++++++------
holoviews/plotting/bokeh/plot.py | 10 ++++++----
holoviews/plotting/plot.py | 4 ++--
6 files changed, 138 insertions(+), 26 deletions(-)
Finally, I want to mention that most of this was driven by Claude Code - I am not familiar enough with the internals of these libraries to do much myself. So far I have mainly been guiding the direction of the benchmarks and asking critical questions. I will follow best practices if we get to the point of creating PRs - right now I have not done full reviews of the changes, nor have we updated tests.
The details that follow are mostly AI-written.
Benchmark setup
To understand what the server-side speedup means in practice, I also built an end-to-end benchmark that includes browser rendering (headless Chromium via Playwright). The key finding: browser rendering — not HoloViews — is the bottleneck for single-session frame rate, so a single streaming dashboard doesn’t see a proportional fps improvement. However, the server-side gains directly translate to lower CPU usage and capacity for more concurrent sessions.
Two complementary benchmarks:
Headless profiling (isolates HoloViews cost):
- 1,000 points per element, 60 updates per run, 5 repeats per scenario
- Measured with `cProfile` (inflates absolute times ~2x but preserves relative proportions) and raw `time.perf_counter`
- All scenarios use `Pipe` + `DynamicMap` — the typical streaming pattern
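For readers who want to reproduce this, a minimal harness in the same spirit might look roughly like the following (illustrative only, not the actual benchmark code; the `update` callable stands in for a `Pipe.send` call):

```python
import cProfile
import pstats
import time

def run_updates(n, update):
    """Drive n updates and return the mean cost in ms per update."""
    t0 = time.perf_counter()
    for i in range(n):
        update(i)
    return (time.perf_counter() - t0) / n * 1000

# Raw timing (no profiler overhead). The lambda is a placeholder workload;
# in the real benchmark this would push new data through Pipe + DynamicMap.
ms_per_update = run_updates(60, lambda i: sum(range(1000)))

# cProfile run: inflates absolute times but preserves relative proportions,
# which is what matters for finding hot spots.
prof = cProfile.Profile()
prof.enable()
run_updates(60, lambda i: sum(range(1000)))
prof.disable()
stats = pstats.Stats(prof)
```

Sorting `stats` by cumulative time is then enough to see which HoloViews/Bokeh internals dominate each update.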
End-to-end browser benchmark (full pipeline):
- Panel server + headless Chromium (Playwright), measuring HoloViews processing, Bokeh serialization, and browser rendering independently
- Updates wrapped in `pn.io.hold(doc)` + `doc.models.freeze()` (best-practice batching)
- Sequential mode (push-wait-push for per-component breakdown) and sustained mode (push at server speed, measures coalescing and effective fps)
Headless results (server-side only, main vs optimized)
cProfile median ms/update:
| Scenario | main | optimized | speedup |
|---|---|---|---|
| Curve x1 | 24.7 | 6.6 | 3.7x |
| Curve x6 | 69.6 | 20.5 | 3.4x |
| Scatter x6 | 90.4 | 34.7 | 2.6x |
| Points x6 | 88.5 | 33.5 | 2.6x |
| Image | 25.9 | 12.7 | 2.0x |
| Bars | 24.7 | 11.1 | 2.2x |
| HeatMap | 37.2 | 21.0 | 1.8x |
Raw timing for Curve x6 (no cProfile overhead):
| Condition | main | optimized | speedup |
|---|---|---|---|
| With Bokeh validation | 32.8 ms | 8.6 ms | 3.8x |
| Without Bokeh validation | 25.6 ms | 7.7 ms | 3.3x |
End-to-end results (with browser rendering)
Configuration: pn.io.hold(doc) + doc.models.freeze(), headless Chromium, 30 updates.
Server-side improvement (E2E benchmark)
| Metric | 1 plot x 1 curve | 1 plot x 6 curves | 4 plots x 6 curves |
|---|---|---|---|
| HV processing (main) | 14.2 ms | 37.7 ms | 142.2 ms |
| HV processing (optimized) | 4.5 ms | 9.6 ms | 31.3 ms |
| HV improvement | -68% | -75% | -78% |
| Server total (main) | 18.3 ms | 47.2 ms | 173.6 ms |
| Server total (optimized) | 9.9 ms | 21.5 ms | 62.8 ms |
| Server improvement | -46% | -54% | -64% |
Server total = HoloViews processing + Bokeh serialization/Document sync. The serialization cost (~31 ms at 4x6) is unchanged — the improvement is entirely in HoloViews.
Single-session frame rate
| Metric | 1x1 | 1x6 | 4x6 |
|---|---|---|---|
| Browser render time | 32 ms | 134 ms | 370 ms |
| Visible fps (main) | 31 | 10.7 | 2.9 |
| Visible fps (optimized) | 23 | 8.5 | 3.0 |
Single-session fps is roughly unchanged because browser rendering dominates. At 4x6, the browser needs ~370 ms per frame (headless Chromium, software rendering — a real browser with GPU is faster). Making the server faster just means it “waits faster” between frames. In sustained mode, the server outpaces the browser and excess pushes are coalesced (only the latest data is rendered per frame).
Real browser validation
To confirm the Playwright results, I also ran the same scenario in a real Chrome browser using an interactive Panel dashboard with matching settings:
| Metric | main | optimized | improvement |
|---|---|---|---|
| Server processing (median) | 194 ms | 89 ms | 2.2x |
| Server capacity | 5/s | 11/s | 2.2x |
The 2.2x improvement (vs 2.8x in the Playwright benchmark) is expected — the dashboard timing includes data generation overhead which is unchanged between branches. The pure HoloViews processing improvement is consistent.
Multi-session capacity
The server-side speedup directly translates to capacity for concurrent sessions:
| Metric | 1x1 | 1x6 | 4x6 |
|---|---|---|---|
| Server capacity (main) | 55/s | 21/s | 5.8/s |
| Server capacity (optimized) | 101/s | 47/s | 16/s |
| Capacity improvement | 1.8x | 2.2x | 2.8x |
At 4x6, this means the server can feed ~3x more simultaneous streaming dashboards before becoming the bottleneck. Alternatively, for a single session, the server CPU is free ~64% more of the time for other work (callbacks, widget interactions, other sessions).
The changes
Eleven changes in HoloViews, three in param. All are independent and could be submitted as separate PRs.
HoloViews changes
1. Cache gen_types type tuples (+29/-4 in core/util/types.py)
The _GeneratorIsMeta metaclass re-evaluates its generator function on every isinstance() call. For hot paths like isfinite, this produced ~550k generator evaluations per 60 updates. Cache the resulting tuple after first evaluation.
- Complexity: Low. Adds a `_cached_types` attribute plus a cache-invalidation function.
- Confidence: High — the generated types only change if new libraries are imported post-startup, which is rare. An invalidation function is provided for that case.
2. Disable pipeline bookkeeping during plot refresh (+5/-4 across 2 files)
PipelineMeta records every Dataset method call for reproducibility. During plot refresh these are read-only queries — no new Datasets are created. Wrap the refresh path in disable_pipeline().
- Complexity: Trivial. Two-line change: move the `disable` check earlier in `pipelined_fn` and wrap `refresh()` → `_trigger_refresh()`.
- Confidence: High — pipeline recording is only needed for the user-facing `.pipeline` attribute, not during rendering.
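The pattern, sketched with illustrative names (`disable_pipeline` and `pipelined_fn` match the HoloViews names; the rest is a simplified stand-in, not the real code):

```python
from contextlib import contextmanager
from functools import wraps

PIPELINE_DISABLED = False
RECORDED = []   # stand-in for the real provenance records

@contextmanager
def disable_pipeline():
    """Skip pipeline bookkeeping inside this block (e.g. a plot refresh)."""
    global PIPELINE_DISABLED
    prev = PIPELINE_DISABLED
    PIPELINE_DISABLED = True
    try:
        yield
    finally:
        PIPELINE_DISABLED = prev

def pipelined(fn):
    @wraps(fn)
    def pipelined_fn(*args, **kwargs):
        # Checking the flag first means the read-only refresh path
        # pays nothing for bookkeeping.
        if PIPELINE_DISABLED:
            return fn(*args, **kwargs)
        RECORDED.append(fn.__name__)
        return fn(*args, **kwargs)
    return pipelined_fn

@pipelined
def select(data):
    return data

select([1])                  # recorded
with disable_pipeline():
    select([2])              # bookkeeping skipped
```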
3. Skip Bokeh property validation in _update_datasource (+14/-10 across 2 files)
Bokeh validates every element of every column when ColumnDataSource.data is assigned. For 6 curves x 1,000 points this is ~12k validation calls per update — all no-ops since HoloViews already ensures correct types. Wrap the assignment in Bokeh’s validate(False).
- Complexity: Low. Uses Bokeh’s own public `validate` context manager.
- Confidence: High — Bokeh’s `validate(False)` is the documented way to skip per-element checks. The important structural checks (column length equality, etc.) are Python assertions that run regardless.
4. Remove dead hasattr check triggering Bokeh’s difflib (-3 lines in plotting/bokeh/element.py)
hasattr(glyph, 'visible') always returns False (no Bokeh glyph has a visible property — only renderers do). But Bokeh’s __getattr__ calls difflib.get_close_matches() before raising AttributeError, adding ~1% overhead per update.
- Complexity: Trivial. Pure dead code removal.
- Confidence: Very high — visibility is already correctly controlled via the GlyphRenderer.
5. Short-circuit _update_labels when axis props unchanged (+7 lines in plotting/bokeh/element.py)
_update_labels calls recursive_model_update on every update, which traverses all Bokeh model properties via properties_with_values. Cache the computed axis props dict and skip when unchanged — the common case during streaming when labels don’t change.
- Complexity: Low. Cache comparison before the expensive call.
- Confidence: High — labels only change when axis titles/formatters change, not on data updates.
6. Short-circuit _update_ranges when computed ranges unchanged (+24 lines in plotting/bokeh/element.py)
_update_ranges recomputes full min/max over all data on every frame, then applies padding, option lookups, and Bokeh model updates. Cache the computed range values and skip the entire method when unchanged. This is common in streaming with fixed-range signals or when data happens to stay within previously computed bounds.
- Complexity: Medium. Needs to bail out for categorical axes (factor arrays) and `subcoordinate_y` plots.
- Confidence: Medium-high — the cache is conservative (any change triggers a full recompute). The bail-out cases need testing.
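This cache-compare-skip shape recurs in several of the changes below (5 through 10). A toy model of change 6, with invented names (not the actual HoloViews code):

```python
class RangeUpdateCache:
    """Toy model of the range-update cache: skip the expensive update when
    the newly computed (low, high) pair is unchanged, and bail out for
    categorical axes where caching would be unsafe."""

    def __init__(self):
        self._cached = None
        self.updates_applied = 0   # stand-in for padding + Bokeh model updates

    def update_ranges(self, low, high, categorical=False):
        if categorical:
            # Bail-out case: factor ranges are never cached, always update.
            self._cached = None
            self._apply(low, high)
            return
        if (low, high) == self._cached:
            return                 # common streaming case: ranges unchanged
        self._cached = (low, high)
        self._apply(low, high)

    def _apply(self, low, high):
        self.updates_applied += 1


ranges = RangeUpdateCache()
ranges.update_ranges(0.0, 1.0)   # first frame: full update
ranges.update_ranges(0.0, 1.0)   # unchanged: skipped
ranges.update_ranges(0.0, 2.0)   # data grew: full update
```

The conservative part is that any difference at all triggers a full recompute, so correctness only depends on the bail-out conditions being complete.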
7. Skip param.update when plot options unchanged (+8/-2 in plotting/bokeh/element.py)
update_frame calls self.param.update(**plot_opts) on every frame. param.update iterates all ~80 parameters to build a restore dict, sets attributes, and checks watchers. Cache the plot options dict and skip when unchanged — the normal case during streaming when plot options are static.
- Complexity: Low. Dict comparison before the expensive call, applied at both subplot and overlay levels.
- Confidence: High — plot options only change when the user explicitly changes them, not on data updates.
8. Skip glyph visual property updates when style unchanged (+16/-0 in plotting/bokeh/element.py)
_update_glyphs iterates 5 glyph types (normal, selection, nonselection, hover, muted), filters properties, introspects Bokeh models, and calls glyph.update() — all on every frame. When visual properties haven’t changed (the common case during streaming), skip the glyph property loop entirely and only update the data source.
- Complexity: Low. Extracts non-source properties for comparison, falls through on first frame and when styles change.
- Confidence: High — data source updates are always applied; only the visual property application is skipped.
9. Skip _process_legend when overlay structure unchanged (+13 lines in plotting/bokeh/element.py)
_process_legend rebuilds legend items on every overlay update. Cache a tuple of (subplot keys, renderer visibility, legend config) and skip when unchanged — the normal case during streaming.
- Complexity: Low. Tuple comparison against previous state.
- Confidence: High — legend only needs updating when subplots are added/removed or legend settings change.
10. Cache plot properties, title, and grid in _update_plot (+22/-3 in plotting/bokeh/element.py)
_update_plot calls plot.update(), title.update(), and grid update() on every frame, each involving Bokeh model introspection. Cache the computed property dicts and skip updates when unchanged.
- Complexity: Low. Three cache comparisons for properties that rarely change during streaming.
- Confidence: High — these properties change only when axis configuration or plot styling changes.
11. Numeric fast path for max_range / dimension_range (+52/-5 in core/util/__init__.py, +134 test lines) — PR #6806
This one already has a PR open (merged now). max_range is called ~90 times per frame for a 6-curve overlay. Each call creates a numpy array from 1-3 tuples just to call nanmin/nanmax. Add a pure-Python fast path for the common numeric case (int/float/numpy scalars), avoiding array allocation entirely. Also add an early return in dimension_range when hard_range bounds are already finite.
- Complexity: Medium. The fast path must preserve numpy scalar types and handle None/NaN/inf correctly. Comes with 38 dedicated tests.
- Confidence: High — well tested, clear fallback to existing code for non-numeric types.
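A heavily simplified sketch of the fast-path idea (the merged PR additionally preserves numpy scalar types and falls back to the original array-based code for non-numeric ranges such as datetimes; this toy version just returns `None` there):

```python
import math

def max_range_fast(ranges):
    """Combine (low, high) pairs in pure Python instead of allocating a
    numpy array just to call nanmin/nanmax. Simplified sketch, not the
    merged implementation."""
    lows, highs = [], []
    for lo, hi in ranges:
        if not isinstance(lo, (int, float)) or not isinstance(hi, (int, float)):
            return None  # placeholder: the real code falls back to numpy here
        if not math.isnan(lo):
            lows.append(lo)
        if not math.isnan(hi):
            highs.append(hi)
    if not lows or not highs:
        return (float('nan'), float('nan'))
    return (min(lows), max(highs))

# Combining the ranges of three curves in an overlay:
combined = max_range_fast([(0.0, 1.0), (0.5, 2.0), (float('nan'), 3.0)])
# -> (0.0, 3.0): the NaN low is ignored, the finite high is kept
```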
Param changes (35 lines changed in parameterized.py)
12a. Avoid full values() iteration in _update() (+4/-2)
_update() called self.values() to build a restore dict, iterating all ~80 parameters on HoloViews plot objects. The restore dict only needs entries for the kwargs being updated (typically 5-15). Replace with targeted getattr for just those keys.
- Complexity: Trivial.
- Confidence: High — the semantics are identical, just fewer iterations.
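A toy model of the before/after (not param's real code; the class and method names are illustrative):

```python
class MiniParameterized:
    """Stand-in for a Parameterized object with many parameters."""
    a, b, c = 1, 2, 3   # imagine ~80 of these

    def update(self, **kwargs):
        # Before: restore = dict(self.values()), iterating every parameter.
        # After: targeted getattr for just the keys being changed.
        restore = {k: getattr(self, k) for k in kwargs}
        for k, v in kwargs.items():
            setattr(self, k, v)
        return restore   # enough to undo exactly this update


obj = MiniParameterized()
restore = obj.update(a=10, b=20)
# restore holds only the two overwritten values, not all ~80
```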
12b. Inline get_value_generator logic in values() (+19/-3)
values() called get_value_generator(name) per parameter, each call re-fetching the objects dict and creating a new Parameters accessor. Inline the logic so the objects dict is fetched once.
- Complexity: Low-medium. The inlined code handles Composite, Dynamic, and plain parameters.
- Confidence: Medium-high — follows the same logic, just avoids redundant lookups. Needs testing with Dynamic parameters and Composite parameters.
12c. Bypass __getattribute__ in Parameter.__get__ (+12/-9)
Parameter.__getattribute__ checks whether slot values are Undefined — only relevant for unbound parameters. __get__ is only invoked on bound parameters, so the check is always a no-op. Use object.__getattribute__ directly.
- Complexity: Low.
- Confidence: High — `__get__` is a descriptor protocol method only called on class-bound parameters.
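A toy descriptor illustrating the idea (param's real `Parameter` is far richer; here the custom `__getattribute__` is an explicit no-op standing in for the Undefined slot check):

```python
class Parameter:
    """Descriptor sketch: __get__ reads its own slots via
    object.__getattribute__, bypassing the custom __getattribute__ whose
    extra check is only ever relevant for unbound parameters."""
    __slots__ = ('name', 'default')

    def __init__(self, default):
        self.name = None
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __getattribute__(self, attr):
        # Stand-in for param's Undefined-handling; a no-op once bound.
        return object.__getattribute__(self, attr)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Direct slot access: on a bound parameter the check above
        # can never trigger, so skip the dispatch through it.
        name = object.__getattribute__(self, 'name')
        default = object.__getattribute__(self, 'default')
        return obj.__dict__.get(name, default)


class Plot:
    width = Parameter(400)

plot = Plot()
plot.width   # -> 400 (the default, fetched through __get__)
```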
Understanding the E2E picture
The E2E benchmark reveals an important architectural insight: the HoloViews update pipeline has three independent stages, each with its own throughput limit:
- HoloViews processing — compute new plot state from data (optimized here)
- Bokeh Document sync — serialize changes, send over WebSocket
- Browser rendering — decode, update DOM/canvas, paint
These stages are pipelined — the server is free while the browser renders. When the server pushes faster than the browser can render, updates are coalesced: only the latest data is visible per frame. This means:
- Single-session fps is clamped by the slowest stage (browser rendering for complex layouts). Server-side improvements don’t increase fps when the browser is the bottleneck.
- Server capacity improvements benefit multi-session deployments. At 4 plots x 6 curves, server capacity goes from 5.8 to 16 pushes/sec — capacity for ~3x more simultaneous streaming clients.
- For single-session use, the server CPU is freed up. The optimized server spends ~64% less time processing updates, leaving more headroom for callbacks, widget interactions, and other user-facing work.
Using pn.io.hold(doc) + doc.models.freeze() is essential for batching — without it, Bokeh recomputes its model graph on every property change, adding significant overhead (up to 37% more server time at 4x6).
Try it yourself
The changes are on two branches in my forks: