App load time is 7s with Panel servable(), how to start optimizing it?

After trying everything and the kitchen sink with no changes, I split the whole script into two:

  • One of them only deals with the waterpoints, everything else is removed.
  • The other one only deals with the spatial datasets, everything else is removed.

The only result of this that now I had two smaller script, but speedwise they simply divided the runtime between them almost 50-50.

Which is weird, as the waterpoints mainly use .apply() to be dynamic and displayed by hv.Points(), hd.datashade() and hd.aggregate(), while the spatial datasets are displayed by hd.regrid(hv.DynamicMap(get_image_func)) that use @pn.depends() for being dynamic, so without testing I would have assumed they would have nothing in common, and yet.

So next I started to hack everything away from the waterpoint section and voila, now I have a minimal example without any widgets whatsoever that runs in 5s on Colab and in 3.7s on my laptop, even though this is only a tiny chunk of the original code with 10k dummy Points.

https://colab.research.google.com/drive/1J0AJdRNqlk7Rj6BndLbUfLu0AdiFzbFF?usp=sharing

import random, numpy as np, pandas as pd, holoviews.operation.datashader as hd
import holoviews as hv, bokeh as bk, datashader as ds, panel as pn, param

pn.extension()
hv.extension('bokeh', logo=False)

number_of_waterpoints = 10000
wp_status_categories = ['F', 'R', 'N', 'U']
wp_category_color_dict = {'F': 'blue', 'R': 'red', 'N': 'yellow', 'U': 'gray'}

# creating the starting wp DataFrame
wp = pd.DataFrame(np.arange(number_of_waterpoints), columns=['id'])
wp['lat'] = np.random.random(number_of_waterpoints)*(-7.8) -9.37
wp['long'] = np.random.random(number_of_waterpoints)*(35.88 -32.7) +32.7
wp.loc[:, 'east'], wp.loc[:, 'north'] = ds.utils.lnglat_to_meters(wp.long, wp.lat)
wp['result'] = pd.Series(random.choices(wp_status_categories, k=wp.shape[0]))
wp['result'] = wp['result'].astype(pd.CategoricalDtype(wp_status_categories))

wp_hv_points = hv.Points(wp, ['east', 'north'])

# setting up streams to use in determining zoom
stream_rangexy = hv.streams.RangeXY(source=wp_hv_points)

# creating shade to display
wp_aggregated = hd.datashade(wp_hv_points,
    aggregator=ds.count_cat('result'), color_key=wp_category_color_dict,
    min_alpha=0.5, width=100, height=100, streams=[stream_rangexy])

# creating aggregate to display Count in Tooltip
wp_hover_for_aggregated = hv.util.Dynamic(
    hd.aggregate(wp_hv_points, width=100, height=100, streams=[stream_rangexy]),
    operation=hv.Image).opts(tools=['hover'], alpha=0)
   
combined = (hv.element.tiles.OSM() * wp_aggregated * wp_hover_for_aggregated) 
#  * wp_hv_points

# New cell:
%%prun -l 100
hv.extension('bokeh', logo=False)
display(combined)

I tried, and it doesnā€™t even matter if I include the wp_hv_points itself or not. If I comment out wp_aggregated or wp_hover_for_aggregated, the runtime is between 3-4s.

%%prun:
 4053728 function calls (3906057 primitive calls) in 5.244 seconds

   Ordered by: internal time
   List reduced from 5031 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 5399    0.724    0.000    0.727    0.000 ffi.py:111(__call__)
54784    0.349    0.000    1.018    0.000 colors.py:193(_to_rgba_no_colorcycle)
54784    0.175    0.000    1.226    0.000 colors.py:157(to_rgba)
   273920    0.171    0.000    0.171    0.000 colors.py:277(<genexpr>)
55072    0.169    0.000    0.169    0.000 {method 'astype' of 'numpy.ndarray' objects}
  611    0.149    0.000    0.170    0.000 builder.py:854(call)
607124/606418    0.142    0.000    0.170    0.000 {built-in method builtins.isinstance}
57212/57038    0.127    0.000    0.130    0.000 {built-in method numpy.array}
  214    0.104    0.000    1.387    0.006 colors.py:782(from_list)
37638/7330    0.085    0.000    0.120    0.000 ir.py:313(_rec_list_vars)
59484/58528    0.080    0.000    0.137    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
48184/24167    0.069    0.000    0.280    0.000 {method 'format' of 'str' objects}
62431/62171    0.064    0.000    0.287    0.000 {built-in method builtins.any}
56712    0.043    0.000    0.126    0.000 <__array_function__ internals>:2(can_cast)
26326/11895    0.043    0.000    0.264    0.000 _utils.py:44(__str__)
   272079    0.040    0.000    0.040    0.000 {method 'append' of 'list' objects}
74507/62353    0.037    0.000    0.190    0.000 {built-in method builtins.getattr}
19889/19883    0.037    0.000    0.072    0.000 _utils.py:54(get_reference)
   30    0.036    0.001    0.038    0.001 encoder.py:204(iterencode)
 4560    0.035    0.000    0.084    0.000 weakref.py:395(__getitem__)
10842    0.029    0.000    0.093    0.000 values.py:212(__init__)
  352    0.028    0.000    0.032    0.000 analysis.py:91(liveness)
 2009    0.028    0.000    0.044    0.000 instructions.py:638(descr)
 7601    0.027    0.000    0.095    0.000 instructions.py:13(__init__)
 2126    0.026    0.000    0.026    0.000 {method 'index' of 'tuple' objects}
52370/52310    0.026    0.000    0.035    0.000 {built-in method builtins.hasattr}
 2248/936    0.022    0.000    0.161    0.000 context.py:180(get_meminfos)
152578/149103    0.022    0.000    0.024    0.000 {built-in method builtins.len}
18297/18202    0.022    0.000    0.056    0.000 {method 'join' of 'str' objects}
3302/2595    0.021    0.000    0.047    0.000 contextlib.py:116(__exit__)
 7860    0.021    0.000    0.231    0.000 values.py:219(_to_string)
29491    0.019    0.000    0.024    0.000 parameterized.py:837(__get__)
 5207    0.019    0.000    0.034    0.000 parameterized.py:859(__set__)
54784    0.019    0.000    0.033    0.000 colors.py:123(_is_nth_color)
 1090    0.019    0.000    0.111    0.000 parameterized.py:1277(_setup_params)
 3112    0.019    0.000    0.019    0.000 {method 'reduce' of 'numpy.ufunc' objects}
  112    0.017    0.000    0.024    0.000 __init__.py:1304(_clear_cache)
 7880    0.017    0.000    0.029    0.000 values.py:239(_get_reference)
9516/9201    0.016    0.000    0.021    0.000 abstract.py:121(__eq__)
10811    0.016    0.000    0.036    0.000 _utils.py:24(deduplicate)
 7588    0.016    0.000    0.109    0.000 bases.py:328(prepare_value)
9123/2970    0.016    0.000    0.041    0.000 copy.py:132(deepcopy)
11263    0.015    0.000    0.063    0.000 values.py:232(_set_name)
14431/11722    0.015    0.000    0.023    0.000 {built-in method builtins.hash}
   50    0.015    0.000    0.022    0.000 ir.py:1249(dump)
   79    0.015    0.000    0.027    0.000 __init__.py:69(<listcomp>)
17098    0.014    0.000    0.018    0.000 {built-in method _abc._abc_instancecheck}
25431/3789    0.014    0.000    0.047    0.000 bases.py:304(is_valid)
79180    0.014    0.000    0.018    0.000 {method 'get' of 'dict' objects}
 6619    0.014    0.000    0.025    0.000 inspect.py:2467(__init__)
6772/6456    0.013    0.000    0.163    0.000 descriptors.py:704(_get_default)
 7601    0.013    0.000    0.016    0.000 builder.py:338(_insert)
14982    0.013    0.000    0.017    0.000 parameterized.py:2544(param)
   72    0.013    0.000    0.013    0.000 {built-in method posix.stat}
 3302    0.013    0.000    0.014    0.000 contextlib.py:81(__init__)
20336    0.013    0.000    0.013    0.000 _utils.py:13(is_used)
 1368    0.013    0.000    0.032    0.000 util.py:870(isfinite)
  110    0.012    0.000    0.096    0.001 analysis.py:23(compute_use_defs)
20224    0.012    0.000    0.012    0.000 __init__.py:65(rgb_to_hex)
11129/8443    0.012    0.000    0.043    0.000 abstract.py:118(__hash__)
 6578    0.012    0.000    0.013    0.000 analysis.py:69(<genexpr>)
9232/7606    0.012    0.000    0.023    0.000 {built-in method builtins.sorted}
 2009    0.011    0.000    0.032    0.000 instructions.py:623(__init__)
  672    0.011    0.000    0.024    0.000 instructions.py:166(descr)
 1806    0.011    0.000    0.011    0.000 {method 'encode' of 'str' objects}
  224    0.010    0.000    0.026    0.000 function_base.py:23(linspace)
 1090    0.010    0.000    0.155    0.000 parameterized.py:2506(__init__)
9941/9625    0.010    0.000    0.177    0.000 descriptors.py:676(_get)
 1436    0.010    0.000    0.010    0.000 {method 'decode' of 'bytes' objects}
17637    0.010    0.000    0.010    0.000 abstract.py:96(key)
10836    0.010    0.000    0.010    0.000 {method 'match' of 're.Pattern' objects}
6842/6526    0.010    0.000    0.134    0.000 bases.py:182(themed_default)
11263    0.010    0.000    0.048    0.000 _utils.py:16(register)
56712    0.010    0.000    0.010    0.000 multiarray.py:468(can_cast)
22002    0.010    0.000    0.029    0.000 container.py:77(<genexpr>)
9076/7550    0.009    0.000    0.015    0.000 model.py:824(_visit_value_and_its_immediate_references)
  177    0.009    0.000    0.011    0.000 inspect.py:2750(__init__)
  611    0.008    0.000    0.021    0.000 instructions.py:64(__init__)
  266    0.008    0.000    0.011    0.000 values.py:467(__init__)
 2036    0.008    0.000    0.034    0.000 models.py:659(get_field_position)
6772/6456    0.008    0.000    0.144    0.000 descriptors.py:584(instance_default)
34527    0.008    0.000    0.014    0.000 {built-in method builtins.issubclass}
 1283    0.008    0.000    0.013    0.000 instructions.py:398(descr)
 4100    0.008    0.000    0.016    0.000 parameterized.py:2278(get_param_descriptor)
 7472    0.008    0.000    0.019    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
 3160    0.008    0.000    0.010    0.000 utils.py:377(stream_list)
 8111    0.007    0.000    0.020    0.000 copy.py:66(copy)
49373    0.007    0.000    0.010    0.000 parameterized.py:1229(__iter__)
  110    0.007    0.000    0.017    0.000 analysis.py:118(compute_dead_maps)
  148    0.007    0.000    0.034    0.000 npydecl.py:96(generic)
 3460    0.007    0.000    0.037    0.000 container.py:178(validate)
 6663    0.007    0.000    0.009    0.000 enum.py:289(__call__)
  150    0.007    0.000    0.101    0.001 functions.py:280(get_call_type)
2    0.007    0.004    0.007    0.004 {method 'read' of '_io.FileIO' objects}
5477/5207    0.007    0.000    0.045    0.000 parameterized.py:313(_f)
1    0.007    0.007    1.430    1.430 __init__.py:41(<module>)
10935/9477    0.007    0.000    0.110    0.000 {built-in method builtins.next}
   40    0.007    0.000    0.039    0.001 postproc.py:175(_patch_var_dels)
  628/100    0.007    0.000    0.008    0.000 packer.py:175(rec)
  426    0.007    0.000    0.016    0.000 dataset.py:1366(_construct_dataarray)
%%prun -s cumulative:
3777019 function calls (3654897 primitive calls) in 5.450 seconds

   Ordered by: cumulative time
   List reduced from 4903 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      7/1    0.000    0.000    5.451    5.451 {built-in method builtins.exec}
        1    0.000    0.000    5.451    5.451 <string>:2(<module>)
      2/1    0.000    0.000    5.285    5.285 display.py:141(display)
        1    0.000    0.000    5.280    5.280 formatters.py:91(format)
       12    0.000    0.000    5.279    0.440 formatters.py:213(catch_format_error)
        1    0.000    0.000    5.278    5.278 <decorator-gen-5>:1(__call__)
        1    0.000    0.000    5.278    5.278 formatters.py:946(__call__)
        1    0.000    0.000    5.278    5.278 dimension.py:1310(_repr_mimebundle_)
        1    0.000    0.000    5.278    5.278 options.py:1389(render)
        1    0.000    0.000    5.278    5.278 display_hooks.py:274(pprint_display)
        1    0.000    0.000    5.278    5.278 display_hooks.py:235(display)
        3    0.000    0.000    5.278    1.759 display_hooks.py:138(wrapped)
        1    0.000    0.000    5.278    5.278 display_hooks.py:195(map_display)
        1    0.000    0.000    5.278    5.278 display_hooks.py:52(render)
        1    0.000    0.000    5.278    5.278 renderer.py:379(components)
        1    0.000    0.000    5.213    5.213 viewable.py:422(_render_model)
        1    0.000    0.000    5.194    5.194 viewable.py:466(get_root)
        1    0.000    0.000    5.185    5.185 base.py:109(_get_model)
        1    0.000    0.000    5.184    5.184 base.py:82(_get_objects)
        1    0.000    0.000    5.172    5.172 holoviews.py:223(_get_model)
        1    0.000    0.000    5.157    5.157 holoviews.py:273(_render)
        1    0.000    0.000    5.157    5.157 renderer.py:66(get_plot)
        1    0.000    0.000    5.157    5.157 renderer.py:206(get_plot)
     13/2    0.000    0.000    2.827    1.414 spaces.py:1280(__getitem__)
     13/2    0.000    0.000    2.826    1.413 spaces.py:1087(_execute_callback)
     13/2    0.000    0.000    2.826    1.413 spaces.py:667(__call__)
      9/3    0.000    0.000    2.821    0.940 __init__.py:1041(dynamic_operation)
        2    0.000    0.000    2.818    1.409 spaces.py:206(dynamic_mul)
        9    0.000    0.000    2.814    0.313 __init__.py:1033(apply)
        9    0.000    0.000    2.813    0.313 __init__.py:1012(_process)
        4    0.000    0.000    2.789    0.697 operation.py:176(process_element)
      8/6    0.000    0.000    2.789    0.465 operation.py:126(_apply)
        7    0.000    0.000    2.687    0.384 util.py:242(initialize_dynamic)
        4    0.001    0.000    2.583    0.646 datashader.py:432(_process)
        4    0.000    0.000    2.494    0.624 core.py:172(points)
        4    0.001    0.000    2.494    0.624 core.py:1161(bypixel)
      8/4    0.000    0.000    2.480    0.620 utils.py:102(__call__)
        4    0.000    0.000    2.480    0.620 pandas.py:15(pandas_pipeline)
        4    0.000    0.000    2.480    0.620 pandas.py:23(default)
        2    0.000    0.000    2.414    1.207 plot.py:980(update)
        4    0.000    0.000    2.382    0.595 dispatcher.py:337(_compile_for_args)
    244/4    0.001    0.000    2.380    0.595 compiler_lock.py:29(_acquire_compile_lock)
     24/4    0.001    0.000    2.380    0.595 dispatcher.py:795(compile)
     10/4    0.000    0.000    2.379    0.595 dispatcher.py:77(compile)
     10/4    0.000    0.000    2.379    0.595 dispatcher.py:84(_compile_cached)
     10/4    0.001    0.000    2.379    0.595 dispatcher.py:99(_compile_core)
     10/4    0.000    0.000    2.379    0.595 compiler.py:601(compile_extra)
     10/4    0.000    0.000    2.367    0.592 compiler.py:350(compile_extra)
     10/4    0.000    0.000    2.362    0.591 compiler.py:420(_compile_bytecode)
     10/4    0.000    0.000    2.362    0.590 compiler.py:380(_compile_core)
     10/4    0.001    0.000    2.359    0.590 compiler_machinery.py:318(run)
   220/88    0.006    0.000    2.357    0.027 compiler_machinery.py:259(_runPass)
  660/264    0.001    0.000    2.351    0.009 compiler_machinery.py:263(check)
        1    0.000    0.000    2.172    2.172 element.py:2316(initialize_plot)
        4    0.000    0.000    2.139    0.535 points.py:187(extend)
        2    0.000    0.000    2.118    1.059 overlay.py:30(dynamic_mul)
        2    0.000    0.000    2.106    1.053 datashader.py:1497(_process)
  136/114    0.003    0.000    2.039    0.018 __init__.py:196(pipelined_fn)
        2    0.002    0.001    2.034    1.017 datashader.py:1454(_process)
        3    0.000    0.000    2.028    0.676 element.py:1378(initialize_plot)
        3    0.000    0.000    2.003    0.668 element.py:1340(_init_glyphs)
        2    0.000    0.000    1.971    0.985 raster.py:83(get_data)
        2    0.000    0.000    1.966    0.983 element.py:1826(_get_colormapper)
        7    0.000    0.000    1.961    0.280 <frozen importlib._bootstrap>:978(_find_and_load)
        6    0.000    0.000    1.961    0.327 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
        2    0.000    0.000    1.959    0.980 util.py:885(process_cmap)
        6    0.000    0.000    1.959    0.327 util.py:666(_list_cmaps)
        2    0.000    0.000    1.959    0.979 <frozen importlib._bootstrap>:663(_load_unlocked)
        2    0.000    0.000    1.959    0.979 <frozen importlib._bootstrap_external>:722(exec_module)
        2    0.000    0.000    1.956    0.978 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.009    0.009    1.956    1.956 __init__.py:41(<module>)
    79/40    0.001    0.000    1.954    0.049 dimension.py:677(map)
       66    0.000    0.000    1.924    0.029 __init__.py:1221(map)
      214    0.003    0.000    1.918    0.009 __init__.py:72(mpl_cm)
      6/4    0.000    0.000    1.914    0.478 operation.py:197(__call__)
      214    0.122    0.001    1.908    0.009 colors.py:782(from_list)
       10    0.001    0.000    1.769    0.177 typed_passes.py:430(run_pass)
       10    0.001    0.000    1.765    0.176 typed_passes.py:347(run_pass)
    54784    0.211    0.000    1.714    0.000 colors.py:157(to_rgba)
     10/4    0.000    0.000    1.695    0.424 typed_passes.py:85(run_pass)
     10/4    0.000    0.000    1.694    0.424 typed_passes.py:50(type_inference_stage)
     10/4    0.000    0.000    1.683    0.421 typeinfer.py:1052(propagate)
     20/8    0.002    0.000    1.683    0.210 typeinfer.py:141(propagate)
   138/16    0.002    0.000    1.670    0.104 typeinfer.py:568(resolve)
     62/8    0.001    0.000    1.670    0.209 typeinfer.py:558(__call__)
   142/20    0.000    0.000    1.670    0.083 context.py:187(resolve_function_type)
   138/16    0.000    0.000    1.669    0.104 typeinfer.py:1498(resolve_call)
   142/20    0.001    0.000    1.669    0.083 context.py:231(_resolve_user_function_type)
     20/4    0.000    0.000    1.668    0.417 functions.py:504(get_call_type)
     20/4    0.000    0.000    1.668    0.417 dispatcher.py:297(get_call_template)
    54784    0.581    0.000    1.463    0.000 colors.py:193(_to_rgba_no_colorcycle)
      9/3    0.000    0.000    0.799    0.266 __init__.py:1026(resolve)
     5399    0.714    0.000    0.718    0.000 ffi.py:111(__call__)
       10    0.001    0.000    0.692    0.069 lowering.py:131(lower)
       10    0.000    0.000    0.603    0.060 cpu.py:195(get_executable)
       20    0.000    0.000    0.601    0.030 codegen.py:492(get_pointer_to_function)
    58/40    0.000    0.000    0.600    0.015 codegen.py:123(_ensure_finalized)
       10    0.000    0.000    0.600    0.060 codegen.py:217(finalize)
       10    0.000    0.000    0.462    0.046 lowering.py:182(lower_normal_function)
       10    0.000    0.000    0.419    0.042 lowering.py:196(lower_function_body)

Using holoviews@optimize_options or 1.14.3 made no difference in these results.

Based on this there are 3 possibilities:

  1. I use both hd.datashade() and hd.aggregate() terribly wrong, in that case, I hope my error will be easily identifiable now.
  2. There is a bug (in Datashader?) that causes this issue with both functions.
  3. This is normal, this is how long these functions should run for the first time for 10k points.

I assume itā€™s not #3, as that would be weird. I hope for #1, as the quickest way to resolve this, however after all this experimentation I wouldnā€™t be surprised if it would be #2. The thing is, that this was only the waterpoint half of the code. I didnā€™t have the time to similarly reduce the other half to a minimal example (the current version available here on Colab), but that doesnā€™t use either of these function, only hd.regrid(), and that runs similarly slowly. So if there would be an issue with some core functionality all three of these functions rely on, that would explain everything. But at the same time, it is entirely possible that itā€™s just an accident that these two newly separated scripts run similarly slow and they have completely different problems.

Based on what I learned so far from the waterpoints, I decided to quickly create a minimal example for the spatial datasets with no widgets and no @pn.depends(), just the hd.regrid(DynamicMap()) parts. And the same thing happened, the script runs for 6.6s on Colab (4.3s locally).

The minimal script:
https://colab.research.google.com/drive/1C1xQOnKnV5MgOyJgIDTMuoVgo2jAG_7n?usp=sharing

import random, numpy as np, holoviews.operation.datashader as hd
import holoviews as hv, bokeh as bk, datashader as ds, panel as pn, param
pn.extension()
hv.extension('bokeh', logo=False)
left, right, top, bottom = 3641158, 3994334, -1935463, -1047720

def get_image_01():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_02():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_03():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_04():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_05():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_06():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_07():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))

def get_image_08():
    return hv.Image(np.ones((100, 250)), bounds=(left,bottom,right,top))


combined = (
    hv.element.tiles.OSM()
    * hd.regrid(hv.DynamicMap(get_image_01))
    * hd.regrid(hv.DynamicMap(get_image_02))
    * hd.regrid(hv.DynamicMap(get_image_03))
    * hd.regrid(hv.DynamicMap(get_image_04))
    * hd.regrid(hv.DynamicMap(get_image_05))
    * hd.regrid(hv.DynamicMap(get_image_06))
    * hd.regrid(hv.DynamicMap(get_image_07))
    * hd.regrid(hv.DynamicMap(get_image_08))  
)

# New cell:
%%prun -l 100
hv.extension('bokeh', logo=False)
display(combined)
%%prun:
5428532 function calls (5331887 primitive calls) in 6.664 seconds

   Ordered by: internal time
   List reduced from 3670 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    54784    0.659    0.000    1.446    0.000 colors.py:193(_to_rgba_no_colorcycle)
    55350    0.212    0.000    0.212    0.000 {method 'astype' of 'numpy.ndarray' objects}
    54784    0.194    0.000    1.675    0.000 colors.py:157(to_rgba)
   273920    0.193    0.000    0.193    0.000 colors.py:277(<genexpr>)
59769/59513    0.169    0.000    0.171    0.000 {built-in method numpy.array}
573435/572779    0.155    0.000    0.180    0.000 {built-in method builtins.isinstance}
      214    0.117    0.001    1.861    0.009 colors.py:782(from_list)
   122051    0.114    0.000    0.139    0.000 parameterized.py:837(__get__)
    19860    0.111    0.000    0.177    0.000 parameterized.py:859(__set__)
62536/59598    0.109    0.000    0.339    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
     5357    0.108    0.000    0.593    0.000 parameterized.py:1277(_setup_params)
     8670    0.098    0.000    0.098    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      432    0.095    0.000    0.096    0.000 ffi.py:111(__call__)
215776/176135    0.090    0.000    0.595    0.000 {built-in method builtins.getattr}
    82853    0.086    0.000    0.116    0.000 parameterized.py:2544(param)
76634/76271    0.082    0.000    0.423    0.000 {built-in method builtins.any}
     5357    0.075    0.000    0.935    0.000 parameterized.py:2506(__init__)
     6038    0.074    0.000    0.184    0.000 util.py:870(isfinite)
       98    0.070    0.001    0.072    0.001 encoder.py:204(iterencode)
136627/136519    0.066    0.000    0.073    0.000 {built-in method builtins.hasattr}
17469/16650    0.056    0.000    0.117    0.000 tree.py:216(__setattr__)
16154/10797    0.056    0.000    0.793    0.000 parameterized.py:1060(override_initialization)
    23778    0.054    0.000    0.269    0.000 bases.py:328(prepare_value)
   262387    0.050    0.000    0.050    0.000 {method 'get' of 'dict' objects}
    54784    0.047    0.000    0.144    0.000 <__array_function__ internals>:2(can_cast)
17689/13682    0.047    0.000    0.114    0.000 copy.py:132(deepcopy)
21750/21077    0.044    0.000    0.513    0.000 descriptors.py:704(_get_default)
      926    0.043    0.000    0.048    0.000 parameterized.py:188(get_all_slots)
     4816    0.041    0.000    0.063    0.000 options.py:745(<genexpr>)
    20007    0.040    0.000    0.055    0.000 parameterized.py:1480(objects)
31084/28346    0.039    0.000    0.064    0.000 model.py:824(_visit_value_and_its_immediate_references)
    13682    0.039    0.000    0.160    0.000 parameterized.py:1346(_instantiate_param)
   114254    0.039    0.000    0.040    0.000 {built-in method builtins.issubclass}
   258677    0.037    0.000    0.037    0.000 {method 'append' of 'list' objects}
34264/33591    0.037    0.000    0.562    0.000 descriptors.py:676(_get)
54357/54282    0.036    0.000    0.370    0.000 {built-in method builtins.setattr}
20116/19860    0.034    0.000    0.224    0.000 parameterized.py:313(_f)
    13631    0.033    0.000    0.119    0.000 parameterized.py:1745(get_value_generator)
22065/21392    0.032    0.000    0.420    0.000 bases.py:182(themed_default)
     5357    0.030    0.000    0.141    0.000 parameterized.py:1271(_generate_name)
    82853    0.029    0.000    0.029    0.000 parameterized.py:1142(__init__)
      590    0.029    0.000    0.042    0.000 instructions.py:13(__init__)
    35388    0.029    0.000    0.041    0.000 util.py:370(tree_attribute)
    14339    0.028    0.000    0.062    0.000 parameterized.py:2278(get_param_descriptor)
     3453    0.027    0.000    0.184    0.000 parameterized.py:1694(get_param_values)
     1122    0.027    0.000    0.565    0.001 options.py:82(lookup_options)
21750/21077    0.027    0.000    0.454    0.000 descriptors.py:584(instance_default)
160743/155006    0.027    0.000    0.033    0.000 {built-in method builtins.len}
     1780    0.026    0.000    0.228    0.000 util.py:961(max_range)
    25979    0.026    0.000    0.032    0.000 parameterized.py:160(classlist)
    71665    0.025    0.000    0.025    0.000 {method 'items' of 'dict' objects}
     1362    0.025    0.000    0.061    0.000 dataset.py:1366(_construct_dataarray)
    10549    0.025    0.000    0.039    0.000 container.py:178(validate)
  682/594    0.024    0.000    1.085    0.002 __init__.py:196(pipelined_fn)
      566    0.023    0.000    0.068    0.000 function_base.py:23(linspace)
      544    0.023    0.000    0.023    0.000 boundingregion.py:298(lbrt)
      120    0.023    0.000    0.023    0.000 {built-in method bottleneck.reduce.nanmin}
     4866    0.023    0.000    0.594    0.000 model.py:808(_visit_immediate_value_references)
   131598    0.023    0.000    0.023    0.000 {method 'endswith' of 'str' objects}
     5302    0.022    0.000    0.039    0.000 functools.py:37(update_wrapper)
     5302    0.022    0.000    0.065    0.000 parameterized.py:273(__get__)
    22156    0.022    0.000    0.103    0.000 copy.py:66(copy)
    78851    0.021    0.000    0.027    0.000 parameterized.py:1229(__iter__)
    54784    0.021    0.000    0.035    0.000 colors.py:123(_is_nth_color)
13164/12819    0.020    0.000    0.172    0.000 has_props.py:273(__setattr__)
12383/12252    0.020    0.000    0.088    0.000 {built-in method builtins.sorted}
2408/1204    0.020    0.000    0.298    0.000 options.py:772(options)
5563/5272    0.020    0.000    0.035    0.000 copy.py:211(_deepcopy_list)
    12536    0.020    0.000    0.046    0.000 util.py:733(__call__)
      873    0.019    0.000    0.019    0.000 util.py:1120(int_to_roman)
      112    0.019    0.000    0.026    0.000 __init__.py:1304(_clear_cache)
22065/21392    0.018    0.000    0.163    0.000 bases.py:161(_copy_default)
38049/36753    0.018    0.000    0.557    0.000 descriptors.py:464(__get__)
     2136    0.018    0.000    0.386    0.000 options.py:466(__init__)
     6876    0.017    0.000    0.024    0.000 numeric.py:1816(isscalar)
       31    0.017    0.001    0.661    0.021 plot.py:717(_compute_group_range)
    25861    0.016    0.000    0.017    0.000 has_props.py:228(accumulate_dict_from_superclasses)
     9034    0.016    0.000    0.018    0.000 copy.py:253(_keep_alive)
     1899    0.016    0.000    0.034    0.000 dimension.py:901(dimensions)
     1776    0.015    0.000    0.077    0.000 nanfunctions.py:228(nanmin)
       79    0.015    0.000    0.028    0.000 __init__.py:69(<listcomp>)
4342/4323    0.015    0.000    0.030    0.000 {method 'join' of 'str' objects}
      656    0.015    0.000    0.284    0.000 util.py:1039(dimension_range)
    41510    0.015    0.000    0.015    0.000 util.py:686(<genexpr>)
     3451    0.015    0.000    0.028    0.000 version.py:307(parse)
       20    0.014    0.001    0.014    0.001 socket.py:438(send)
    16341    0.014    0.000    0.014    0.000 wrappers.py:138(__init__)
    45183    0.014    0.000    0.027    0.000 has_props.py:664(themed_values)
       19    0.013    0.001    0.013    0.001 templates.py:179(signature)
     6151    0.013    0.000    0.025    0.000 inspect.py:2467(__init__)
     6038    0.013    0.000    0.015    0.000 util.py:1509(is_dask_array)
      912    0.013    0.000    0.381    0.000 dimension.py:491(__init__)
    20224    0.013    0.000    0.013    0.000 __init__.py:65(rgb_to_hex)
 2100/931    0.013    0.000    0.025    0.000 _utils.py:44(__str__)
     4151    0.013    0.000    0.048    0.000 util.py:677(allowable)
     1204    0.013    0.000    0.101    0.000 options.py:733(find)
     1776    0.013    0.000    0.043    0.000 nanfunctions.py:343(nanmax)
     6369    0.013    0.000    0.065    0.000 container.py:74(validate)
       53    0.012    0.000    0.615    0.012 model.py:55(collect_filtered_models)
    14559    0.012    0.000    0.021    0.000 bases.py:502(validate)
%%prun -s cumulative:
5428885 function calls (5332236 primitive calls) in 6.073 seconds

   Ordered by: cumulative time
   List reduced from 3666 to 100 due to restriction <100>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      4/1    0.000    0.000    6.073    6.073 {built-in method builtins.exec}
        1    0.000    0.000    6.073    6.073 <string>:2(<module>)
      2/1    0.000    0.000    5.889    5.889 display.py:141(display)
        1    0.000    0.000    5.874    5.874 formatters.py:91(format)
       12    0.000    0.000    5.873    0.489 formatters.py:213(catch_format_error)
        1    0.000    0.000    5.871    5.871 <decorator-gen-5>:1(__call__)
        1    0.000    0.000    5.871    5.871 formatters.py:946(__call__)
        1    0.000    0.000    5.871    5.871 dimension.py:1310(_repr_mimebundle_)
        1    0.000    0.000    5.871    5.871 options.py:1389(render)
        1    0.000    0.000    5.871    5.871 display_hooks.py:274(pprint_display)
        1    0.000    0.000    5.871    5.871 display_hooks.py:235(display)
        3    0.000    0.000    5.871    1.957 display_hooks.py:138(wrapped)
        1    0.000    0.000    5.871    5.871 display_hooks.py:195(map_display)
        1    0.000    0.000    5.871    5.871 display_hooks.py:52(render)
        1    0.000    0.000    5.871    5.871 renderer.py:379(components)
        1    0.000    0.000    5.741    5.741 viewable.py:422(_render_model)
        1    0.000    0.000    5.708    5.708 viewable.py:466(get_root)
        1    0.000    0.000    5.690    5.690 base.py:109(_get_model)
        1    0.000    0.000    5.689    5.689 base.py:82(_get_objects)
        1    0.000    0.000    5.686    5.686 holoviews.py:223(_get_model)
        1    0.000    0.000    5.659    5.659 holoviews.py:273(_render)
        1    0.000    0.000    5.658    5.658 renderer.py:66(get_plot)
        1    0.000    0.000    5.658    5.658 renderer.py:206(get_plot)
        9    0.000    0.000    4.131    0.459 plot.py:980(update)
    168/9    0.004    0.000    2.239    0.249 spaces.py:1280(__getitem__)
        8    0.000    0.000    2.233    0.279 streams.py:146(trigger)
        8    0.000    0.000    2.231    0.279 plot.py:208(refresh)
        8    0.000    0.000    2.217    0.277 plot.py:252(_trigger_refresh)
        8    0.000    0.000    2.216    0.277 plot.py:437(__getitem__)
        8    0.002    0.000    2.216    0.277 element.py:2385(update_frame)
    168/9    0.004    0.000    2.210    0.246 spaces.py:1087(_execute_callback)
    168/9    0.003    0.000    2.209    0.245 spaces.py:667(__call__)
    24/16    0.000    0.000    2.106    0.132 __init__.py:1041(dynamic_operation)
        1    0.000    0.000    1.914    1.914 element.py:2316(initialize_plot)
     63/9    0.001    0.000    1.831    0.203 spaces.py:206(dynamic_mul)
       24    0.001    0.000    1.660    0.069 __init__.py:1033(apply)
       24    0.000    0.000    1.660    0.069 __init__.py:1012(_process)
        9    0.001    0.000    1.635    0.182 element.py:1378(initialize_plot)
        9    0.001    0.000    1.557    0.173 element.py:1340(_init_glyphs)
       16    0.001    0.000    1.525    0.095 raster.py:83(get_data)
       16    0.001    0.000    1.487    0.093 element.py:1826(_get_colormapper)
       16    0.000    0.000    1.431    0.089 util.py:885(process_cmap)
       48    0.001    0.000    1.430    0.030 util.py:666(_list_cmaps)
        2    0.000    0.000    1.424    0.712 <frozen importlib._bootstrap>:978(_find_and_load)
        1    0.000    0.000    1.424    1.424 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
        1    0.000    0.000    1.424    1.424 <frozen importlib._bootstrap>:663(_load_unlocked)
        1    0.000    0.000    1.424    1.424 <frozen importlib._bootstrap_external>:722(exec_module)
        1    0.000    0.000    1.421    1.421 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        1    0.010    0.010    1.421    1.421 __init__.py:41(<module>)
      214    0.003    0.000    1.379    0.006 __init__.py:72(mpl_cm)
      214    0.104    0.000    1.371    0.006 colors.py:782(from_list)
       19    0.000    0.000    1.359    0.072 util.py:242(initialize_dynamic)
       88    0.002    0.000    1.297    0.015 operation.py:126(_apply)
       16    0.000    0.000    1.296    0.081 operation.py:176(process_element)
    54784    0.181    0.000    1.202    0.000 colors.py:157(to_rgba)
       16    0.003    0.000    1.142    0.071 datashader.py:885(_process)
    54784    0.343    0.000    0.988    0.000 colors.py:193(_to_rgba_no_colorcycle)
  682/594    0.023    0.000    0.975    0.002 __init__.py:196(pipelined_fn)
     5357    0.081    0.000    0.899    0.000 parameterized.py:2506(__init__)
       47    0.000    0.000    0.884    0.019 plot.py:1265(_get_frame)
       18    0.000    0.000    0.883    0.049 util.py:255(get_plot_frame)
16154/10797    0.031    0.000    0.762    0.000 parameterized.py:1060(override_initialization)
    24/16    0.000    0.000    0.622    0.039 __init__.py:1026(resolve)
       53    0.000    0.000    0.603    0.011 model.py:98(collect_models)
       53    0.012    0.000    0.602    0.011 model.py:55(collect_filtered_models)
       52    0.001    0.000    0.590    0.011 model.py:554(references)
215776/176135    0.087    0.000    0.582    0.000 {built-in method builtins.getattr}
     4866    0.023    0.000    0.580    0.000 model.py:808(_visit_immediate_value_references)
34264/33591    0.034    0.000    0.548    0.000 descriptors.py:676(_get)
38049/36753    0.017    0.000    0.542    0.000 descriptors.py:464(__get__)
     5357    0.089    0.000    0.533    0.000 parameterized.py:1277(_setup_params)
        9    0.000    0.000    0.530    0.059 element.py:749(_update_plot)
       45    0.001    0.000    0.524    0.012 plots.py:90(select)
        9    0.000    0.000    0.515    0.057 overlay.py:30(dynamic_mul)
       32    0.001    0.000    0.513    0.016 __init__.py:1179(clone)
21750/21077    0.041    0.000    0.503    0.000 descriptors.py:704(_get_default)
        9    0.001    0.000    0.493    0.055 element.py:803(_update_ranges)
       16    0.002    0.000    0.489    0.031 core.py:896(raster)
        9    0.000    0.000    0.476    0.053 element.py:759(_update_labels)
       16    0.000    0.000    0.462    0.029 raster.py:415(clone)
       10    0.001    0.000    0.449    0.045 plot.py:1838(get_extents)
21750/21077    0.027    0.000    0.448    0.000 descriptors.py:584(instance_default)
       20    0.001    0.000    0.441    0.022 plot.py:598(compute_ranges)
       80    0.008    0.000    0.440    0.006 __init__.py:304(__init__)
  819/387    0.004    0.000    0.439    0.001 overlay.py:139(__init__)
  819/387    0.008    0.000    0.437    0.001 dimension.py:1350(__init__)
        9    0.000    0.000    0.429    0.048 element.py:763(<dictcomp>)
       18    0.002    0.000    0.429    0.024 element.py:647(_axis_properties)
     1122    0.002    0.000    0.424    0.000 plot.py:290(lookup_options)
      125    0.002    0.000    0.423    0.003 dimension.py:539(clone)
     1122    0.013    0.000    0.422    0.000 options.py:82(lookup_options)
       31    0.009    0.000    0.419    0.014 plot.py:717(_compute_group_range)
       32    0.002    0.000    0.417    0.013 raster.py:266(__init__)
22065/21392    0.031    0.000    0.414    0.000 bases.py:182(themed_default)
     91/9    0.001    0.000    0.412    0.046 dimension.py:677(map)
      912    0.009    0.000    0.411    0.000 dimension.py:849(__init__)
       16    0.000    0.000    0.405    0.025 resampling.py:277(resample_2d)
       16    0.000    0.000    0.403    0.025 resampling.py:482(_resample_2d)
        1    0.000    0.000    0.402    0.402 dispatcher.py:337(_compile_for_args)
     23/1    0.000    0.000    0.402    0.402 compiler_lock.py:29(_acquire_compile_lock)

Theoretically itā€™s still possible that Iā€™ve made three separate mistakes in the way Iā€™m using in each of the three different hd functions (regrid here, datashade and aggregate in the waterpoint example), but that now seems a bit unlikely and to me the issue being in either holoviews.operation.datashader or datashader itself just became a more reasonable idea.

Iā€™m really curious what you think, @philippjfr

2 Likes

So we can keep optimizing internally but at a quick glance I donā€™t think you made any mistakes here. I reckon with some more effort we can squeeze out another second or two out of HoloViews and Param, and 1 second is spent on initializing matplotlib colormaps which only occurs on initial render and not subsequently. Beyond that I donā€™t think thereā€™s much we can do, so at best weā€™re going to get this down 2-3 seconds.

Well, I still think something is funky in holoviews.operation.datashader.

Consider this example:

def get_image():
    return hv.Image(xr.DataArray(np.zeros((2, 2))))

combined_dynmap = (
    hv.element.tiles.OSM()
    * hv.DynamicMap(get_image) * hv.DynamicMap(get_image)
    * hv.DynamicMap(get_image) * hv.DynamicMap(get_image)
    * hv.DynamicMap(get_image) * hv.DynamicMap(get_image)
    * hv.DynamicMap(get_image) * hv.DynamicMap(get_image)
)

# NEW CELL:
%%prun -l 100
display(combined_dynmap)

Nothing weird here, it only displays 8 times a 2x2 image filled with zeros. On my laptop, after resetting the kernel, display(combined_dynmap) runs for 1.8s. Itā€™s still unclear for me what the actual work is here that takes almost 2 seconds in processing, but I plan on reading through the package source code and figuring that out, for now letā€™s just accept that as a baseline. If I rerun the display(combined_dynmap) cell a few times without resetting the kernel, I get these runtimes: 0.87s, 0.82s, 0.86s, 0.83s etc, it stays fairly stable from this point. (This is not the point now, but in a live environment every time a user visits the website the whole application reruns, which means that there is a 1s long part that runs every time even though its output supposedly never changes, so that definitely should be cacheable in memory whatever that is, but I will investigate this later.)

Now letā€™s see the exact same example just with hd.regrid:

combined_regrid = (
    hv.element.tiles.OSM()
    * hd.regrid(hv.DynamicMap(get_image)) * hd.regrid(hv.DynamicMap(get_image))
    * hd.regrid(hv.DynamicMap(get_image)) * hd.regrid(hv.DynamicMap(get_image))
    * hd.regrid(hv.DynamicMap(get_image)) * hd.regrid(hv.DynamicMap(get_image))
    * hd.regrid(hv.DynamicMap(get_image)) * hd.regrid(hv.DynamicMap(get_image))
)

# NEW CELL:
%%prun -l 100
display(combined_regrid)

The same 2x2 image, 8 times, thatā€™s a total of 32 zeros being processed. Supposedly, this shouldnā€™t differ much, as upsampling is disabled by default, the data is already in xarray format, so thereā€™s not much to do for regrid. And yet, after resetting the kernel, display(combined_regrid) runs for 3.8s, so it does something for an extra 2s. And whatā€™s worse, that when I rerun the display(combined_regrid) cell a few times without resetting the kernel, I got these runtimes: 4.92s, 5.9s, 7.4s, 9.15s, 10.84s, 12.2s, 14.27s, 16.78s, and then I got bored, but it looks like the runtime keeps growing and growing and that doesnā€™t look normal to me.

The other thing is these runtimes. I come from a PHP background, eg wrote most of the code of a wordpress-like website management system. My former experience is that my system has to deliver everything under one second, including the interpreter parsing the scripts, connecting to database(s), reading the database(s), opening and reading template/data files from the filesystem, processing all of these based on the received user input and at least starting to deliver it. All of this of course concurrently for many users, on potato servers, while PHP not being a super optimized language and Laravel and similar frameworks only making things worse. So you can understand my predicament with having a script now that spends 7-10s in displaying stuff after I already loaded all the data. Thatā€™s just not going to be acceptable for the visitors, and it doesnā€™t matter much if I do the heavy lifting upfront or only onload. Especially not, if displaying basically nothing (2x2 images) takes a lot of time too, so there is not much to gain by preloading lowres images and replacing them with higres alternatives on the fly or something like that. And maybe my expectations were also too high after seeing the Datashader website stating that with Datashader ā€œmillions of taxi tripsā€ ā€œappear in milliseconds without trial and errorā€. Now I timed it and the second cell runs in 1.224s, displaying the result of the thirds cell runs in 1.5s. And while itā€™s technically true these can be both written in milliseconds, itā€™s a bit misleading. Truth to be told, itā€™s pretty impressive that even though the actual example uses 50k records, it runs only 20% longer for 2M points. Based on everything Iā€™ve seen so far, I have the sense that the internal datashader operations indeed run very fast, however there seems to be a huge overhead before that could happen, some stuff that take up about a second before each operation. Anyways, I donā€™t want to complain, and Iā€™m grateful for all your help, I just needed to went after this pretty long week with the project. I will fork everything and experiment with the source code to understand whatā€™s actually happening, then Iā€™ll see if I find a way to optimize some of it for this specific usecaseā€¦

1 Like

On my laptop, after resetting the kernel, display(combined_dynmap) runs for 1.8s

Datashader uses numba under the hood so the first time you render something numbaā€™s JIT compiles the functions, so subsequent renders will be a lot faster.

display(combined_regrid)

As I indicated above, I do not think this is a good test case. display creates a new plot each time and since it does not destroy the old plots you are now accumulating plots which are all being updated every time you rerender. I might be wrong about this but itā€™s the simplest explanation for me.

So you can understand my predicament with having a script now that spends 7-10s in displaying stuff after I already loaded all the data

I appreciate that this isnā€™t very performant and we can work on that but I also donā€™t think itā€™s actually this slow in practice. Youā€™re counting a number of things which arenā€™t actually a problem once the server is running and warmed up:

  • Matplotlib takes about 1 second to initialize the cmap module
  • Numbaā€™s JIT compiler probably spend 2-3 seconds on compiling the datashader codepaths
  • prun has significant overhead slowing things down by at least 1 second

So now weā€™re down to 2-5 seconds to initial render once the server is warmed up and I think with some more optimization of HoloViews and Param we can further reduce that by about 1 second. So yes, thatā€™s nowhere near the 1 second you might expect from some HTTP server but this also isnā€™t a simple HTTP server but a system that runs quite expensive computations dynamically and serves the results of that up to the user.

Anyways, I donā€™t want to complain, and Iā€™m grateful for all your help, I just needed to went after this pretty long week with the project.

Iā€™m actually very grateful for your detailed posts, profiling and responses here, itā€™s a kick in the butt that the project(s) needed so I can at least optimize some of the low hanging fruit.

1 Like

Hi @SteveAKopias

When and if you get to the FastListTemplate, the FastGridTemplate or the FastGalleryTemplate I would very much like a kick in the but and help improve the performance of these?

I just need some help to understand where the bottlenecks are and how they can be solved. :+1:

2 Likes