So far I managed to isolate the parts that load the images (tiffs to be precise) and the waterpoints.
As a recap, this is what combined looks like:
combined = (
    # PART 1
    hv.DynamicMap(get_tiles)
    # PART 2
    * hd.regrid(hv.DynamicMap(get_image_01)).apply.opts(cmap=pn_01_cmap, alpha=pn_01_alpha)
    * hd.regrid(hv.DynamicMap(get_image_02)).apply.opts(cmap=pn_02_cmap, alpha=pn_02_alpha)
    * hd.regrid(hv.DynamicMap(get_image_03)).apply.opts(cmap=pn_03_cmap, alpha=pn_03_alpha)
    * hd.regrid(hv.DynamicMap(get_image_04)).apply.opts(cmap=pn_04_cmap, alpha=pn_04_alpha)
    * hd.regrid(hv.DynamicMap(get_image_05)).apply.opts(cmap=pn_05_cmap, alpha=pn_05_alpha)
    * hd.regrid(hv.DynamicMap(get_image_06)).apply.opts(cmap=pn_06_cmap, alpha=pn_06_alpha)
    * hd.regrid(hv.DynamicMap(get_image_07)).apply.opts(cmap=pn_07_cmap, alpha=pn_07_alpha)
    * hd.regrid(hv.DynamicMap(get_image_08)).apply.opts(cmap=pn_08_cmap, alpha=pn_08_alpha)
    # PART 3
    * hd.regrid(hv.DynamicMap(get_white_bg_for_points)).options(cmap=['white']).apply.opts(alpha=pn_white_bg_for_points_alpha)
    * points_aggregated_categorical
    * points_aggregated_for_hover
    * hv_points
    # PART 4
    * hd.regrid(hv.DynamicMap(get_image_for_hover_tooltip))
)
- Part 1: Just returns getattr(hv.element.tiles, pn_tile)(), where pn_tile is the value of a Select widget.
- Part 2: Depending on the checkbox list of selectable layers, each callback either loads the image from disk (which image is decided by other widgets) and returns it, or returns an empty image as a placeholder.
- Part 3: Has a basic placeholder as background, a categorical hd.datashade, a default hd.aggregate and an hv.Points, all based on the same DataFrame loaded from a CSV.
- Part 4: Just a placeholder for displaying custom tooltips.
So there are only two possible types of data source: the images are loaded from the tiffs and the points are loaded from a CSV.
I added a few variables to control these:
# If load_dummy_images==True, it gives back generated dummy images instead of
# loading the tiffs.
# dummy_image_scale defines the size of the image, 100(%) meaning 100x250:
# hv.Image(np.ones((dummy_image_scale, int(dummy_image_scale * 2.5))),
#          bounds=(left, bottom, right, top))
load_dummy_images = True
dummy_image_scale = 10
# If load_dummy_wp==True, it generates a waterpoints DataFrame with the same
# columns but with filler data instead of loading the csv
load_dummy_wp = True
# If wp_dummy_limit>0, it limits the length of the waterpoints DataFrame (real or
# dummy)
# If wp_dummy_limit==0, it completely disables anything related to waterpoints,
# including loading/generating data, manipulating it, generating aggregates
# and even removes the whole PART3 from combined
wp_dummy_limit = 130000 # default: 130000, the size of the original CSV
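A simplified sketch of how switches like these might behave, using only the standard library. The helper names (make_dummy_grid, select_waterpoints, dummy_records, real_records) and the record columns are hypothetical; the real code wraps a NumPy array in hv.Image and works on a DataFrame, which is left out here so the snippet runs anywhere.

```python
def make_dummy_grid(scale):
    """Return a scale x int(scale * 2.5) grid of ones as nested lists."""
    rows, cols = scale, int(scale * 2.5)
    return [[1.0] * cols for _ in range(rows)]


def select_waterpoints(real_loader, dummy_loader, load_dummy_wp, wp_dummy_limit):
    """Pick the waterpoint records according to the two switches.

    Returns None when wp_dummy_limit == 0, meaning waterpoints are disabled.
    """
    if wp_dummy_limit == 0:
        return None
    records = dummy_loader() if load_dummy_wp else real_loader()
    return records[:wp_dummy_limit]


def dummy_records():
    # Filler rows standing in for the CSV; the column names are made up.
    return [{"id": i, "x": float(i), "y": float(i)} for i in range(1000)]


def real_records():
    # Stand-in for reading the real CSV; never called in this sketch.
    raise RuntimeError("the sketch never touches the real CSV")


grid = make_dummy_grid(10)
print(len(grid), len(grid[0]))  # 10 25

limited = select_waterpoints(real_records, dummy_records, True, 130)
disabled = select_waterpoints(real_records, dummy_records, True, 0)
print(len(limited), disabled)  # 130 None
```

Returning None for the disabled case is just one way to signal "skip PART 3 entirely"; the caller would then have to leave those layers out of the overlay.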
This means that I can now completely disable 1) touching the files and 2) dealing with large datasets. I then changed the settings gradually, again restarting the kernel before every run.
The code I ran:
%%prun -l 100
display(pn.Row(content, widgets))
(content and widgets are the pn containers, the former containing combined, the latter containing all the widgets, of course.)
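For anyone who wants to reproduce this kind of measurement outside a notebook, the following is a rough standard-library equivalent of %%prun -l 100, sorted by internal time. profile_call and busy_work are hypothetical names; busy_work only stands in for the display(...) call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=100, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the internal-time ordering; limit caps the printed rows.
    stats.sort_stats("tottime").print_stats(limit)
    return result, buffer.getvalue()


def busy_work(n):
    # Placeholder workload standing in for display(pn.Row(content, widgets)).
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10_000)
print(report)
```

The report starts with the same "function calls ... in ... seconds" summary line as the results quoted in this post.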
And the results with the relevant settings (bold if not default):
Original settings
{'load_dummy_images': False, 'dummy_image_scale': 100,
'load_dummy_wp': False, 'wp_dummy_limit': 130000}
19309648 function calls (18620261 primitive calls) in 13.693 seconds
Dummy Images, real WP
{'load_dummy_images': True, 'dummy_image_scale': 100,
'load_dummy_wp': False, 'wp_dummy_limit': 130000}
15689963 function calls (15303463 primitive calls) in 11.841 seconds
Dummy Images & Dummy WP
{'load_dummy_images': True, 'dummy_image_scale': 100,
'load_dummy_wp': True, 'wp_dummy_limit': 130000}
15691597 function calls (15305097 primitive calls) in 11.399 seconds
Dummy Images & Dummy WP, Dummy Image Scale: 10%
{'load_dummy_images': True, 'dummy_image_scale': 10,
'load_dummy_wp': True, 'wp_dummy_limit': 130000}
15728239 function calls (15341594 primitive calls) in 12.093 seconds
Dummy both, Image Scale: 10%, WP Limit: 130
{'load_dummy_images': True, 'dummy_image_scale': 10,
'load_dummy_wp': True, 'wp_dummy_limit': 130}
16473182 function calls (16176164 primitive calls) in 11.786 seconds
Dummy Images, Image Scale: 10%, Waterpoints disabled
{'load_dummy_images': True, 'dummy_image_scale': 10,
'load_dummy_wp': True, 'wp_dummy_limit': 0}
7451420 function calls (7295576 primitive calls) in 5.589 seconds
I've also run the same tests with hv.render(combined), which produced similar changes, but this time ranging from 31s (original settings) to 12s (dummy images downscaled, waterpoints disabled).
What I have learned from this:
- Disabling all the real image loading helped only about 7%. That's good, because I was worried that if loading big data files were causing the issue, there wouldn't be any solution. Decreasing the dummy image size didn't improve the results, although increasing it (not logged) made them worse, so that 7% is most probably due to the actual resolution difference between the real and dummy images.
- Replacing the waterpoints DataFrame with a faux version did not result in any change, and I wasn't expecting it to.
- What's surprising is that reducing the number of records in the waterpoints DataFrame to 0.1% (or even to a single record) didn't change anything either. This is, again, good news, as it's not the size of the dataset that is the problem, but it was unexpected.
- Completely eliminating everything that's related to the waterpoints DataFrame results in about a 60% time reduction, so there is definitely something going on there. However, there's still a significant amount of time remaining.
Also, @philippjfr: what makes you say this is a numba issue? This is what I've got for the last two tests above (everything reduced to the minimum, and waterpoints disabled):
Dummy both, Image Scale: 10%, WP Limit: 130
Ordered by: internal time
List reduced from 4748 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
940822 0.423 0.000 0.556 0.000 parameterized.py:837(__get__)
33054 0.357 0.000 1.814 0.000 parameterized.py:1277(_setup_params)
123543 0.347 0.000 0.583 0.000 parameterized.py:859(__set__)
1644391/1642463 0.342 0.000 0.427 0.000 {built-in method builtins.isinstance}
363350 0.259 0.000 0.355 0.000 parameterized.py:2544(param)
99332/66278 0.255 0.000 2.559 0.000 parameterized.py:1060(override_initialization)
1564530 0.236 0.000 0.236 0.000 {method 'get' of 'dict' objects}
33054 0.229 0.000 2.973 0.000 parameterized.py:2506(__init__)
813332/696487 0.220 0.000 1.621 0.000 {built-in method builtins.getattr}
67396 0.159 0.000 0.774 0.000 bases.py:328(prepare_value)
124347/123543 0.147 0.000 0.773 0.000 parameterized.py:313(_f)
20523 0.139 0.000 0.392 0.000 util.py:874(isfinite)
65311/55738 0.138 0.000 0.291 0.000 copy.py:128(deepcopy)
87988 0.132 0.000 0.259 0.000 parameterized.py:2278(get_param_descriptor)
207750 0.131 0.000 0.346 0.000 dimension.py:302(spec)
27590 0.116 0.000 0.116 0.000 {method 'reduce' of 'numpy.ufunc' objects}
157603 0.112 0.000 0.143 0.000 parameterized.py:160(classlist)
16645 0.111 0.000 2.044 0.000 options.py:466(__init__)
16455 0.108 0.000 0.277 0.000 dimension.py:606(matches)
540569/540479 0.107 0.000 0.120 0.000 {built-in method builtins.hasattr}
469554 0.104 0.000 0.105 0.000 {built-in method builtins.issubclass}
55738 0.102 0.000 0.411 0.000 parameterized.py:1346(_instantiate_param)
96114/81924 0.098 0.000 0.164 0.000 model.py:824(_visit_value_and_its_immediate_references)
363350 0.097 0.000 0.097 0.000 parameterized.py:1142(__init__)
60555 0.095 0.000 0.125 0.000 parameterized.py:1480(objects)
40435 0.095 0.000 0.279 0.000 parameterized.py:1745(get_value_generator)
106603 0.095 0.000 0.468 0.000 dimension.py:359(__eq__)
63680/62610 0.091 0.000 1.158 0.000 bases.py:182(themed_default)
33054 0.087 0.000 0.493 0.000 parameterized.py:1271(_generate_name)
62924/61854 0.085 0.000 1.359 0.000 descriptors.py:704(_get_default)
42713/40745 0.083 0.000 0.221 0.000 tree.py:216(__setattr__)
34497 0.081 0.000 0.136 0.000 container.py:178(validate)
10179 0.080 0.000 0.447 0.000 parameterized.py:1694(get_param_values)
212929/212850 0.078 0.000 0.920 0.000 {built-in method builtins.setattr}
95351/94281 0.078 0.000 1.454 0.000 descriptors.py:676(_get)
86578 0.077 0.000 0.110 0.000 util.py:374(tree_attribute)
74609 0.076 0.000 0.174 0.000 util.py:737(__call__)
13691 0.068 0.000 1.615 0.000 model.py:808(_visit_immediate_value_references)
64833 0.068 0.000 0.205 0.000 copy.py:66(copy)
61361/60129 0.067 0.000 0.495 0.000 {built-in method builtins.any}
7764 0.066 0.000 0.545 0.000 util.py:965(max_range)
19883/18013 0.065 0.000 0.091 0.000 {built-in method numpy.array}
11012/2753 0.063 0.000 1.122 0.000 options.py:772(options)
19271 0.063 0.000 0.120 0.000 options.py:745(<genexpr>)
71682/71551 0.062 0.000 0.262 0.000 {built-in method builtins.sorted}
62924/61854 0.061 0.000 1.236 0.000 descriptors.py:584(instance_default)
318373 0.060 0.000 0.060 0.000 {method 'endswith' of 'str' objects}
14750 0.055 0.000 0.093 0.000 functools.py:35(update_wrapper)
103428/101364 0.053 0.000 1.455 0.000 descriptors.py:464(__get__)
63680/62610 0.049 0.000 0.337 0.000 bases.py:161(_copy_default)
283289 0.047 0.000 0.047 0.000 {method 'items' of 'dict' objects}
32230 0.046 0.000 0.055 0.000 copy.py:242(_keep_alive)
71865 0.045 0.000 0.045 0.000 has_props.py:228(accumulate_dict_from_superclasses)
219785/198812 0.045 0.000 0.059 0.000 {built-in method builtins.len}
129963 0.043 0.000 0.063 0.000 has_props.py:664(themed_values)
114840 0.042 0.000 0.042 0.000 util.py:690(<genexpr>)
47314 0.042 0.000 0.042 0.000 wrappers.py:138(__init__)
3760 0.041 0.000 0.082 0.000 dataset.py:1300(_construct_dataarray)
80 0.041 0.001 2.659 0.033 plot.py:717(_compute_group_range)
Dummy Images, Image Scale: 10%, Waterpoints disabled
7451420 function calls (7295576 primitive calls) in 5.589 seconds
Ordered by: internal time
List reduced from 4039 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
17006 0.194 0.000 0.955 0.000 parameterized.py:1277(_setup_params)
62718 0.188 0.000 0.309 0.000 parameterized.py:859(__set__)
699794/698666 0.142 0.000 0.165 0.000 {built-in method builtins.isinstance}
262817 0.141 0.000 0.179 0.000 parameterized.py:837(__get__)
181577 0.139 0.000 0.194 0.000 parameterized.py:2544(param)
17006 0.124 0.000 1.515 0.000 parameterized.py:2506(__init__)
364326/303210 0.108 0.000 0.848 0.000 {built-in method builtins.getattr}
37325 0.091 0.000 0.449 0.000 bases.py:328(prepare_value)
576076 0.090 0.000 0.091 0.000 {method 'get' of 'dict' objects}
63129/62718 0.078 0.000 0.410 0.000 parameterized.py:313(_f)
44381 0.071 0.000 0.137 0.000 parameterized.py:2278(get_param_descriptor)
30948/26699 0.069 0.000 0.144 0.000 copy.py:128(deepcopy)
51120/34114 0.069 0.000 1.289 0.000 parameterized.py:1060(override_initialization)
8797 0.061 0.000 1.090 0.000 options.py:466(__init__)
80197 0.061 0.000 0.079 0.000 parameterized.py:160(classlist)
181578 0.055 0.000 0.055 0.000 parameterized.py:1142(__init__)
48314/42498 0.054 0.000 0.091 0.000 model.py:824(_visit_value_and_its_immediate_references)
283205/283134 0.054 0.000 0.061 0.000 {built-in method builtins.hasattr}
7371 0.053 0.000 0.152 0.000 util.py:874(isfinite)
26699 0.053 0.000 0.207 0.000 parameterized.py:1346(_instantiate_param)
223262 0.051 0.000 0.052 0.000 {built-in method builtins.issubclass}
34534/33650 0.050 0.000 0.650 0.000 bases.py:182(themed_default)
34149/33265 0.049 0.000 0.762 0.000 descriptors.py:704(_get_default)
19103 0.048 0.000 0.140 0.000 parameterized.py:1745(get_value_generator)
29078 0.048 0.000 0.063 0.000 parameterized.py:1480(objects)
17006 0.047 0.000 0.267 0.000 parameterized.py:1271(_generate_name)
10543 0.045 0.000 0.045 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6900 0.044 0.000 0.120 0.000 dimension.py:606(matches)
22482/21434 0.044 0.000 0.118 0.000 tree.py:216(__setattr__)
17035 0.044 0.000 0.070 0.000 container.py:178(validate)
52191/51307 0.043 0.000 0.814 0.000 descriptors.py:676(_get)
105391/105279 0.043 0.000 0.578 0.000 {built-in method builtins.setattr}
45548 0.042 0.000 0.059 0.000 util.py:374(tree_attribute)
38420/38289 0.038 0.000 0.180 0.000 {built-in method builtins.sorted}
4576 0.038 0.000 0.221 0.000 parameterized.py:1694(get_param_values)
7600 0.037 0.000 0.865 0.000 model.py:808(_visit_immediate_value_references)
34753 0.037 0.000 0.114 0.000 copy.py:66(copy)
5956/1489 0.036 0.000 0.642 0.000 options.py:772(options)
33029 0.035 0.000 0.082 0.000 util.py:737(__call__)
10423 0.035 0.000 0.066 0.000 options.py:745(<genexpr>)
458 0.035 0.000 0.044 0.000 ffi.py:149(__call__)
177366 0.034 0.000 0.034 0.000 {method 'endswith' of 'str' objects}
34149/33265 0.034 0.000 0.693 0.000 descriptors.py:584(instance_default)
27927/27382 0.031 0.000 0.260 0.000 {built-in method builtins.any}
58209/56505 0.029 0.000 0.801 0.000 descriptors.py:464(__get__)
34534/33650 0.028 0.000 0.217 0.000 bases.py:161(_copy_default)
40613 0.027 0.000 0.028 0.000 has_props.py:228(accumulate_dict_from_superclasses)
19023/18560 0.027 0.000 0.259 0.000 has_props.py:273(__setattr__)
6695 0.026 0.000 0.044 0.000 functools.py:35(update_wrapper)
2244 0.025 0.000 0.050 0.000 dataset.py:1300(_construct_dataarray)
26061 0.025 0.000 0.025 0.000 wrappers.py:138(__init__)
145772 0.024 0.000 0.024 0.000 {method 'items' of 'dict' objects}
71076 0.023 0.000 0.035 0.000 has_props.py:664(themed_values)
14894 0.023 0.000 0.027 0.000 copy.py:242(_keep_alive)
131821 0.022 0.000 0.029 0.000 parameterized.py:1229(__iter__)
9850 0.022 0.000 0.109 0.000 container.py:74(validate)
54740 0.021 0.000 0.021 0.000 util.py:690(<genexpr>)
34083 0.021 0.000 0.025 0.000 parameterized.py:1020(_validate)
2978 0.020 0.000 0.133 0.000 options.py:733(find)
So even when it runs in 5s, without touching any real data and handling only heavily downsized faux data, it still makes tens or hundreds of thousands of calls to parameterized.py functions. I don't use param directly, but I use a lot of panel widgets, so my best guess is that they somehow get tangled together and keep triggering each other.
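To put a number on how much of a run is spent in a given file such as parameterized.py, the per-function rows can be summed per source file. This is a minimal sketch based on the standard pstats module; totals_by_file and workload are hypothetical names, and workload is only a placeholder for the real display or render call.

```python
import cProfile
import os
import pstats
from collections import defaultdict


def totals_by_file(stats):
    """Return (file name, total calls, internal time) rows, slowest file first."""
    totals = defaultdict(lambda: [0, 0.0])
    # pstats keeps one entry per function:
    # (filename, lineno, funcname) -> (primitive calls, total calls, tottime, cumtime, callers)
    for (filename, _lineno, _name), (_prim, ncalls, tottime, _cum, _callers) in stats.stats.items():
        entry = totals[os.path.basename(filename)]
        entry[0] += ncalls
        entry[1] += tottime
    rows = [(name, calls, seconds) for name, (calls, seconds) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


def workload():
    # Placeholder for display(pn.Row(content, widgets)) or hv.render(combined).
    return sorted(str(i) for i in range(50_000))


profiler = cProfile.Profile()
profiler.runcall(workload)
stats = pstats.Stats(profiler)
rows = totals_by_file(stats)
for name, calls, seconds in rows[:10]:
    print(f"{name:30s} {calls:10d} calls {seconds:8.3f} s")
```

Grouping by base name merges files that share a name across packages (several util.py files, for instance); keying on the full path instead keeps them apart.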
Now that real data is no longer a requirement, I can move towards creating a reduced example, and I will also experiment with removing some of the widgets and the functions that depend on them.
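One cheap way to test the "callbacks keep triggering each other" guess is to count how often each callback actually runs. The sketch below is only an illustration with the standard library: count_calls and call_counts are hypothetical names, the get_image_01 body is a stand-in, and whether a callback wrapped like this still behaves identically inside hv.DynamicMap is an untested assumption (incrementing the counter directly inside the callback body avoids that question).

```python
import functools
from collections import Counter

# Tally of invocations per callback name.
call_counts = Counter()


def count_calls(func):
    """Wrap func so that every invocation is tallied in call_counts."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@count_calls
def get_image_01():
    # Stand-in for a DynamicMap callback that would return an image.
    return "image placeholder"


# Simulate the callback being triggered three times.
for _ in range(3):
    get_image_01()

print(call_counts.most_common())
```

If a single widget change makes a counter jump by more than one, that would point to callbacks being triggered repeatedly.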