Optimizing HoloViews and Param for faster streaming/DynamicMap

Context

We are developing a dashboard using Panel+HoloViews, displaying data streams from Kafka (a variety of 1D and 2D plots). Despite updating only once per second we are experiencing a variety of performance issues, so I started working on some profiling and optimizations. Encouraged by early successes (see holoviews PR #6806, “perf: Add numeric fast path to max_range and dimension_range”, which was just accepted and merged), I kept going and had some more good results (up to 3-4x server-side speedup, 2x end-to-end across a range of element types and overlay sizes).

Below I describe a lot of small and mostly unrelated (but similar) changes that compound into substantial improvements overall. This is currently not in a cleaned-up state and far from ready for a PR, because I would first like to gauge interest in these changes - some of them are double-edged swords, as the performance gains are moderate while the complexity increases. Roughly speaking, I’d like to:

  1. Hear your opinions.
  2. Get feedback on how much this actually improves performance in real dashboards, as benchmarks are always limited. Do you have any non-trivial apps that you could try this out on? Branches in my forks are listed below.
  3. Find out if and how things should be incorporated. I don’t think one big PR is a good idea. We could pick a few of the most promising changes and handle them individually.

I’d like to gauge interest in these changes before cleaning them up into proper PRs. Each optimization is an independent commit and could be submitted separately.

The total size of the changes is quite modest actually:

$ cd holoviz/param && git diff --stat main
 param/parameterized.py | 49 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 35 insertions(+), 14 deletions(-)
 $ cd holoviz/holoviews && git diff --stat main
 holoviews/core/data/__init__.py     |  5 +++--
 holoviews/core/util/types.py        | 33 +++++++++++++++++++++++++++++----
 holoviews/plotting/bokeh/element.py | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
 holoviews/plotting/bokeh/graphs.py  | 14 ++++++++------
 holoviews/plotting/bokeh/plot.py    | 10 ++++++----
 holoviews/plotting/plot.py          |  4 ++--
 6 files changed, 138 insertions(+), 26 deletions(-)

Finally, I want to mention that most of this was driven by Claude Code - I am not familiar enough with the internals of these libraries to do much myself. So far I have mainly been guiding the direction of the benchmarks and asking critical questions. I will follow best practices if we get to the point of creating PRs - so far I have not done full reviews of the changes, nor have we updated the tests.

The details that follow are mostly AI-written.

Benchmark setup

To understand what the server-side speedup means in practice, I also built an end-to-end benchmark that includes browser rendering (headless Chromium via Playwright). The key finding: browser rendering — not HoloViews — is the bottleneck for single-session frame rate, so a single streaming dashboard doesn’t see a proportional fps improvement. However, the server-side gains directly translate to lower CPU usage and capacity for more concurrent sessions.

Two complementary benchmarks:

Headless profiling (isolates HoloViews cost):

  • 1,000 points per element, 60 updates per run, 5 repeats per scenario
  • Measured with cProfile (inflates absolute times ~2x but preserves relative proportions) and raw time.perf_counter
  • All scenarios use Pipe + DynamicMap — the typical streaming pattern
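For reference, the measurement approach can be sketched with the standard library alone. This is a minimal illustration, not the actual benchmark code; `fake_update` is a hypothetical placeholder for the real `pipe.send()` call that drives the DynamicMap:

```python
import cProfile
import pstats
import statistics
import time


def fake_update(n=1000):
    """Placeholder for one streaming update (the real benchmark calls
    pipe.send() on a Pipe feeding a DynamicMap)."""
    return sum(i * i for i in range(n))


def bench(updates=60, repeats=5):
    """Median ms/update from raw time.perf_counter timings."""
    medians = []
    for _ in range(repeats):
        times = []
        for _ in range(updates):
            t0 = time.perf_counter()
            fake_update()
            times.append((time.perf_counter() - t0) * 1e3)
        medians.append(statistics.median(times))
    return statistics.median(medians)


# cProfile inflates absolute times (~2x) but preserves relative
# proportions, which is what matters for locating hot spots.
profile = cProfile.Profile()
profile.enable()
for _ in range(60):
    fake_update()
profile.disable()
stats = pstats.Stats(profile).sort_stats("cumulative")
```

Running both a profiled and an unprofiled pass is what allows reporting the raw timings alongside the cProfile proportions.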

End-to-end browser benchmark (full pipeline):

  • Panel server + headless Chromium (Playwright), measuring HoloViews processing, Bokeh serialization, and browser rendering independently
  • Updates wrapped in pn.io.hold(doc) + doc.models.freeze() (best-practice batching)
  • Sequential mode (push-wait-push for per-component breakdown) and sustained mode (push at server speed, measures coalescing and effective fps)

Headless results (server-side only, main vs optimized)

cProfile median ms/update:

Scenario     main   optimized   speedup
Curve x1     24.7    6.6        3.7x
Curve x6     69.6   20.5        3.4x
Scatter x6   90.4   34.7        2.6x
Points x6    88.5   33.5        2.6x
Image        25.9   12.7        2.0x
Bars         24.7   11.1        2.2x
HeatMap      37.2   21.0        1.8x

Raw timing for Curve x6 (no cProfile overhead):

Condition                 main      optimized   speedup
With Bokeh validation     32.8 ms    8.6 ms     3.8x
Without Bokeh validation  25.6 ms    7.7 ms     3.3x

End-to-end results (with browser rendering)

Configuration: pn.io.hold(doc) + doc.models.freeze(), headless Chromium, 30 updates.

Server-side improvement (E2E benchmark)

Metric                     1 plot x 1 curve   1 plot x 6 curves   4 plots x 6 curves
HV processing (main)       14.2 ms            37.7 ms             142.2 ms
HV processing (optimized)   4.5 ms             9.6 ms              31.3 ms
HV improvement             -68%               -75%                -78%
Server total (main)        18.3 ms            47.2 ms             173.6 ms
Server total (optimized)    9.9 ms            21.5 ms              62.8 ms
Server improvement         -46%               -54%                -64%

Server total = HoloViews processing + Bokeh serialization/Document sync. The serialization cost (~31 ms at 4x6) is unchanged — the improvement is entirely in HoloViews.

Single-session frame rate

Metric                    1x1     1x6      4x6
Browser render time       32 ms   134 ms   370 ms
Visible fps (main)        31      10.7     2.9
Visible fps (optimized)   23      8.5      3.0

Single-session fps is roughly unchanged because browser rendering dominates. At 4x6, the browser needs ~370 ms per frame (headless Chromium, software rendering — a real browser with GPU is faster). Making the server faster just means it “waits faster” between frames. In sustained mode, the server outpaces the browser and excess pushes are coalesced (only the latest data is rendered per frame).

Real browser validation

To confirm the Playwright results, I also ran the same scenario in a real Chrome browser using an interactive Panel dashboard with matching settings:

Metric                      main     optimized   improvement
Server processing (median)  194 ms   89 ms       2.2x
Server capacity             5/s      11/s        2.2x

The 2.2x improvement (vs 2.8x in the Playwright benchmark) is expected — the dashboard timing includes data generation overhead which is unchanged between branches. The pure HoloViews processing improvement is consistent.

Multi-session capacity

The server-side speedup directly translates to capacity for concurrent sessions:

Metric                       1x1     1x6    4x6
Server capacity (main)       55/s    21/s   5.8/s
Server capacity (optimized)  101/s   47/s   16/s
Capacity improvement         1.8x    2.2x   2.8x

At 4x6, this means the server can feed ~3x more simultaneous streaming dashboards before becoming the bottleneck. Alternatively, for a single session, the server CPU is free ~64% more of the time for other work (callbacks, widget interactions, other sessions).

The changes

Eleven changes in HoloViews, three in param. All are independent and could be submitted as separate PRs.

HoloViews changes

1. Cache gen_types type tuples (+29/-4 in core/util/types.py)

The _GeneratorIsMeta metaclass re-evaluates its generator function on every isinstance() call. For hot paths like isfinite, this produced ~550k generator evaluations per 60 updates. Cache the resulting tuple after first evaluation.

  • Complexity: Low. Adds _cached_types attribute + cache invalidation function.
  • Confidence: High — the generated types only change if new libraries are imported post-startup, which is rare. Invalidation function provided for that case.
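The caching pattern can be sketched as follows. This is illustrative only (hypothetical class names, not the actual HoloViews source in core/util/types.py): a metaclass whose `__instancecheck__` would otherwise re-run a generator of types on every `isinstance()` call, now caching the resulting tuple after the first evaluation.

```python
class GeneratorIsMeta(type):
    """Sketch: isinstance() checks against a lazily generated type tuple."""

    _cached_types = None

    def __instancecheck__(cls, instance):
        if cls._cached_types is None:
            # Evaluate the generator once and cache the tuple on the class
            cls._cached_types = tuple(cls.types())
        return isinstance(instance, cls._cached_types)

    def invalidate_cache(cls):
        # Call this if a new library imported post-startup adds types
        cls._cached_types = None


class NumericTypes(metaclass=GeneratorIsMeta):
    @staticmethod
    def types():
        yield int
        yield float


assert isinstance(3, NumericTypes)
assert not isinstance("x", NumericTypes)
```

The invalidation hook is what addresses the rare case where the set of generated types changes after startup.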

2. Disable pipeline bookkeeping during plot refresh (+5/-4 across 2 files)

PipelineMeta records every Dataset method call for reproducibility. During plot refresh these are read-only queries — no new Datasets are created. Wrap the refresh path in disable_pipeline().

  • Complexity: Trivial. Two-line change: move the disable check earlier in pipelined_fn and wrap refresh()/_trigger_refresh().
  • Confidence: High — pipeline recording is only needed for user-facing .pipeline attribute, not during rendering.

3. Skip Bokeh property validation in _update_datasource (+14/-10 across 2 files)

Bokeh validates every element of every column when ColumnDataSource.data is assigned. For 6 curves x 1,000 points this is ~12k validation calls per update — all no-ops since HoloViews already ensures correct types. Wrap the assignment in Bokeh’s validate(False).

  • Complexity: Low. Uses Bokeh’s own public validate context manager.
  • Confidence: High — Bokeh’s validate(False) is the documented way to skip per-element checks. The important structural checks (column length equality, etc.) are Python assertions that run regardless.

4. Remove dead hasattr check triggering Bokeh’s difflib (-3 lines in plotting/bokeh/element.py)

hasattr(glyph, 'visible') always returns False (no Bokeh glyph has a visible property — only renderers do). But Bokeh’s __getattr__ calls difflib.get_close_matches() before raising AttributeError, adding ~1% overhead per update.

  • Complexity: Trivial. Pure dead code removal.
  • Confidence: Very high — visibility is already correctly controlled via the GlyphRenderer.

5. Short-circuit _update_labels when axis props unchanged (+7 lines in plotting/bokeh/element.py)

_update_labels calls recursive_model_update on every update, which traverses all Bokeh model properties via properties_with_values. Cache the computed axis props dict and skip when unchanged — the common case during streaming when labels don’t change.

  • Complexity: Low. Cache comparison before the expensive call.
  • Confidence: High — labels only change when axis titles/formatters change, not on data updates.
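This change and the short-circuits in items 6-10 all share the same cache-and-compare shape, which can be sketched generically (hypothetical names; the real code caches axis props, ranges, plot options, glyph styles, legend state, and plot properties respectively):

```python
class CachedUpdater:
    """Skip an expensive model update when the computed props match
    what was applied last frame - the common case during streaming."""

    def __init__(self):
        self._last_props = None
        self.expensive_calls = 0

    def _apply(self, props):
        # Stand-in for recursive_model_update / param.update / glyph.update
        self.expensive_calls += 1

    def update(self, props):
        if props == self._last_props:
            return  # nothing changed: short-circuit
        self._apply(props)
        self._last_props = dict(props)  # copy so later mutation can't alias


u = CachedUpdater()
u.update({"xlabel": "time"})
u.update({"xlabel": "time"})   # skipped
u.update({"xlabel": "freq"})   # applied
assert u.expensive_calls == 2
```

The cache is conservative: any difference at all triggers the full update path, so correctness only depends on the comparison covering every input to the expensive call.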

6. Short-circuit _update_ranges when computed ranges unchanged (+24 lines in plotting/bokeh/element.py)

_update_ranges recomputes full min/max over all data on every frame, then applies padding, option lookups, and Bokeh model updates. Cache the computed range values and skip the entire method when unchanged. This is common in streaming with fixed-range signals or when data happens to stay within previously computed bounds.

  • Complexity: Medium. Needs to bail out for categorical axes (factor arrays) and subcoordinate_y plots.
  • Confidence: Medium-high — the cache is conservative (any change triggers a full recompute). The bail-out cases need testing.

7. Skip param.update when plot options unchanged (+8/-2 in plotting/bokeh/element.py)

update_frame calls self.param.update(**plot_opts) on every frame. param.update iterates all ~80 parameters to build a restore dict, sets attributes, and checks watchers. Cache the plot options dict and skip when unchanged — the normal case during streaming when plot options are static.

  • Complexity: Low. Dict comparison before the expensive call, applied at both subplot and overlay levels.
  • Confidence: High — plot options only change when the user explicitly changes them, not on data updates.

8. Skip glyph visual property updates when style unchanged (+16/-0 in plotting/bokeh/element.py)

_update_glyphs iterates 5 glyph types (normal, selection, nonselection, hover, muted), filters properties, introspects Bokeh models, and calls glyph.update() — all on every frame. When visual properties haven’t changed (the common case during streaming), skip the glyph property loop entirely and only update the data source.

  • Complexity: Low. Extracts non-source properties for comparison, falls through on first frame and when styles change.
  • Confidence: High — data source updates are always applied; only the visual property application is skipped.

9. Skip _process_legend when overlay structure unchanged (+13 lines in plotting/bokeh/element.py)

_process_legend rebuilds legend items on every overlay update. Cache a tuple of (subplot keys, renderer visibility, legend config) and skip when unchanged — the normal case during streaming.

  • Complexity: Low. Tuple comparison against previous state.
  • Confidence: High — legend only needs updating when subplots are added/removed or legend settings change.

10. Cache plot properties, title, and grid in _update_plot (+22/-3 in plotting/bokeh/element.py)

_update_plot calls plot.update(), title.update(), and grid update() on every frame, each involving Bokeh model introspection. Cache the computed property dicts and skip updates when unchanged.

  • Complexity: Low. Three cache comparisons for properties that rarely change during streaming.
  • Confidence: High — these properties change only when axis configuration or plot styling changes.

11. Numeric fast path for max_range / dimension_range (+52/-5 in core/util/__init__.py, +134 test lines) — PR #6806

This one already has a PR open (merged now). max_range is called ~90 times per frame for a 6-curve overlay. Each call creates a numpy array from 1-3 tuples just to call nanmin/nanmax. Add a pure-Python fast path for the common numeric case (int/float/numpy scalars), avoiding array allocation entirely. Also add an early return in dimension_range when hard_range bounds are already finite.

  • Complexity: Medium. The fast path must preserve numpy scalar types and handle None/NaN/inf correctly. Comes with 38 dedicated tests.
  • Confidence: High — well tested, clear fallback to existing code for non-numeric types.
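In simplified form, the fast path looks roughly like this (a sketch only; the merged PR additionally preserves numpy scalar types and falls back to the existing array-based path rather than raising):

```python
import math


def max_range_fast(ranges):
    """Combine (low, high) tuples into one range without allocating a
    numpy array, mimicking nanmin/nanmax semantics for plain numbers."""
    lo = hi = None
    for pair in ranges:
        for value, is_low in ((pair[0], True), (pair[1], False)):
            if value is None or not isinstance(value, (int, float)):
                # The real code defers to the general numpy path here
                raise TypeError("non-numeric: use the general path")
            if isinstance(value, float) and math.isnan(value):
                continue  # ignore NaN, like nanmin/nanmax
            if is_low:
                lo = value if lo is None else min(lo, value)
            else:
                hi = value if hi is None else max(hi, value)
    if lo is None or hi is None:
        return (float("nan"), float("nan"))
    return (lo, hi)


assert max_range_fast([(0, 1), (-2, 5)]) == (-2, 5)
assert max_range_fast([(float("nan"), 3), (1, 2)]) == (1, 3)
```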

Param changes (35 lines changed in parameterized.py)

12a. Avoid full values() iteration in _update() (+4/-2)

_update() called self.values() to build a restore dict, iterating all ~80 parameters on HoloViews plot objects. The restore dict only needs entries for the kwargs being updated (typically 5-15). Replace with targeted getattr for just those keys.

  • Complexity: Trivial.
  • Confidence: High — the semantics are identical, just fewer iterations.
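The idea in miniature (an illustrative stand-in, not the actual param source):

```python
class ParameterizedLike:
    """Toy object standing in for a Parameterized with many parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def update(self, **kwargs):
        # Before: restore = dict(self.values())  -> iterates ~80 params
        # After: build the restore dict only for the keys being set
        restore = {k: getattr(self, k) for k in kwargs if hasattr(self, k)}
        for k, v in kwargs.items():
            setattr(self, k, v)
        return restore


p = ParameterizedLike(width=300, height=200, title="t")
old = p.update(width=400)
assert old == {"width": 300}  # only the touched key is recorded
assert p.width == 400
```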

12b. Inline get_value_generator logic in values() (+19/-3)

values() called get_value_generator(name) per parameter, each call re-fetching the objects dict and creating a new Parameters accessor. Inline the logic so the objects dict is fetched once.

  • Complexity: Low-medium. The inlined code handles Composite, Dynamic, and plain parameters.
  • Confidence: Medium-high — follows the same logic, just avoids redundant lookups. Needs testing with Dynamic parameters and Composite parameters.

12c. Bypass __getattribute__ in Parameter.__get__ (+12/-9)

Parameter.__getattribute__ checks whether slot values are Undefined — only relevant for unbound parameters. __get__ is only invoked on bound parameters, so the check is always a no-op. Use object.__getattribute__ directly.

  • Complexity: Low.
  • Confidence: High — __get__ is a descriptor protocol method only called on class-bound parameters.
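A self-contained sketch of the descriptor situation (illustrative; the real Parameter has many more slots and checks, and also defines __set__):

```python
class Undefined:
    """Sentinel standing in for param's Undefined."""


class Parameter:
    __slots__ = ("name", "default")

    def __init__(self, default=None):
        self.name = None
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __getattribute__(self, attr):
        # Custom hook whose Undefined check only matters for unbound
        # parameters - a pure cost on the bound-access path.
        value = object.__getattribute__(self, attr)
        if value is Undefined:
            raise AttributeError(attr)
        return value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # __get__ only runs on class-bound parameters, where the
        # Undefined check never fires, so bypass the custom hook:
        name = object.__getattribute__(self, "name")
        return obj.__dict__.get(name, object.__getattribute__(self, "default"))


class P:
    x = Parameter(default=1)


p = P()
assert p.x == 1          # falls back to the default
p.__dict__["x"] = 5
assert p.x == 5          # instance value wins
```

Using `object.__getattribute__` skips one Python-level call per slot access, which adds up on hot per-frame paths.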

Understanding the E2E picture

The E2E benchmark reveals an important architectural insight: the HoloViews update pipeline has three independent stages, each with its own throughput limit:

  1. HoloViews processing — compute new plot state from data (optimized here)
  2. Bokeh Document sync — serialize changes, send over WebSocket
  3. Browser rendering — decode, update DOM/canvas, paint

These stages are pipelined — the server is free while the browser renders. When the server pushes faster than the browser can render, updates are coalesced: only the latest data is visible per frame. This means:

  • Single-session fps is clamped by the slowest stage (browser rendering for complex layouts). Server-side improvements don’t increase fps when the browser is the bottleneck.
  • Server capacity improvements benefit multi-session deployments. At 4 plots x 6 curves, server capacity goes from 5.8 to 16 pushes/sec — capacity for ~3x more simultaneous streaming clients.
  • For single-session use, the server CPU is freed up. The optimized server spends ~64% less time processing updates, leaving more headroom for callbacks, widget interactions, and other user-facing work.

Using pn.io.hold(doc) + doc.models.freeze() is essential for batching — without it, Bokeh recomputes its model graph on every property change, adding significant overhead (up to 37% more server time at 4x6).

Try it yourself

The changes are on two branches in my forks:


Definitely open to performance improvements, would prefer submitting them in small targeted PRs.

For 1, the suggestion about caching is likely too naive. This code also runs during import of HoloViews, and if you import pandas afterwards it will not be in the check. Not saying it shouldn’t be improved, just that it is not as easy as described.

Hi!
Not entirely clear on what you are saying. I presume you refer to holoviz/holoviews commit 4cba774 (“perf: Cache gen_types type tuples to avoid repeated generator evaluation”, item 1 of the HoloViews improvements). If I understood correctly, the try/except should deal with the import order? But maybe you meant something else?

Curious if these optimizations have been submitted to HoloViews yet because I’m excited to see them :smiley:

Let us know if you need help with any.


Didn’t get around to it yet. You could pip install from the branches in my forks (listed at the end of the initial post) to give it a try. I am curious how much it helps in others’ real apps!
