Choropleth with polygons slow due to shapely coords.xy

dwr-psandhu · May 28, 2020, 11:12pm

I am using holoviews/geoviews to plot polygons.

   dflines=gpd.read_file('flowline-segments.shp').to_crs(epsg=3857)
   view=hv.Polygons(data=dflines,vdims=['data']).opts(line_alpha=0,cmap=cc.rainbow)
   display(view)

The third line above takes 1.15s to display just a few hundred polygons.

I ran just the last line under a profiler and found that 80% of the time is spent in the shapely coords.xy call.
This is a known issue in shapely with no workaround.

If I want to avoid this call, can I precompute the coordinates from the geometry into dataframe columns for Polygons, so that holoviews/geoviews/geopandas does not call it on the fly?

github.com

takluyver/shapely/blob/master/shapely/coords.py#L156


        'version': 3,
        'typestr': typestr,
        'data': self.ctypes,
        }
    ai.update({'shape': (len(self), self._ndim)})
    return ai


__array_interface__ = property(array_interface)


@property
def xy(self):
    """X and Y arrays"""
    self._update()
    m = self.__len__()
    x = array('d')
    y = array('d')
    temp = c_double()
    for i in xrange(m):
        lgeos.GEOSCoordSeq_getX(self._cseq, i, byref(temp))
        x.append(temp.value)
        lgeos.GEOSCoordSeq_getY(self._cseq, i, byref(temp))

Snippet of profile below.

2501632 function calls (2482131 primitive calls) in 1.567 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 2071    0.811    0.000    0.970    0.000 coords.py:143(xy)
   752904    0.087    0.000    0.087    0.000 {method 'append' of 'array.array' objects}
   766866    0.063    0.000    0.063    0.000 {built-in method _ctypes.byref}
   11    0.029    0.003    0.030    0.003 encoder.py:204(iterencode)
   126198    0.020    0.000    0.031    0.000 {built-in method builtins.isinstance}
 6204    0.019    0.000    0.049    0.000 polygon.py:246(exterior)
11380    0.016    0.000    0.023    0.000 predicates.py:23(__call__)
 1034    0.015    0.000    0.037    0.000 coords.py:164(__call__)
12070    0.012    0.000    0.012    0.000 {built-in method numpy.array}
11380    0.012    0.000    0.038    0.000 base.py:640(is_empty)
27782    0.011    0.000    0.014    0.000 parameterized.py:768(__get__)
57690/51046    0.010    0.000    0.049    0.000 {built-in method builtins.getattr}
 1046    0.008    0.000    0.125    0.000 series.py:183(__init__)
 1037    0.008    0.000    1.045    0.001 util.py:316(geom_to_array)
 1437    0.007    0.000    0.013    0.000 parameterized.py:790(__set__)
 7244    0.007    0.000    0.007    0.000 base.py:67(geometry_type_name)
 1042    0.007    0.000    0.009    0.000 {pandas._libs.lib.infer_dtype}
 5176    0.007    0.000    0.007    0.000 coords.py:44(_update)
19336/19332    0.007    0.000    0.034    0.000 {built-in method builtins.hasattr}
24490    0.006    0.000    0.009    0.000 generic.py:10(_check)
3889/2271    0.006    0.000    0.018    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
  365    0.006    0.000    0.028    0.000 parameterized.py:1166(_setup_params)
 1034    0.006    0.000    0.010    0.000 linestring.py:78(ctypes)
56386    0.006    0.000    0.006    0.000 base.py:253(_geom)
 1041    0.005    0.000    0.005    0.000 {built-in method binascii.b2a_base64}
1    0.005    0.005    0.005    0.005 decoder.py:343(raw_decode)
 2588    0.005    0.000    0.009    0.000 coords.py:48(__len__)
3    0.005    0.002    1.290    0.430 geopandas.py:348(values)
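
A profile like the one above can be produced with the standard library's cProfile and pstats modules, sorted by internal time. This is only a minimal sketch: `render_once` is a hypothetical stand-in for the `display(view)` call and is not code from this thread.

```python
import cProfile
import io
import pstats


def render_once():
    # Hypothetical stand-in for the call being profiled, e.g. display(view).
    return sum(i * i for i in range(10_000))


def profile_call(func, limit=10):
    """Run func under cProfile and return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the sort key behind the "Ordered by: internal time" header.
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_call(render_once)
print(report)
```

The `tottime` column counts time spent inside a function itself, while `cumtime` also includes the functions it calls, which is why `geom_to_array` shows a small `tottime` but a large `cumtime` in the listing above.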

philippjfr · May 30, 2020, 2:36pm

Yes, you could do that, the easiest way to do this is:

view = hv.Polygons(data=dflines,vdims=['data']).clone(datatype=['multitabular'])

This will convert the geopandas data into HoloViews’ internal dictionary format. Alternatively you could give spatialpandas a try:

import spatialpandas as spd
dflines = spd.GeoDataFrame(gpd.read_file('flowline-segments.shp').to_crs(epsg=3857))
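
As a rough illustration of the precomputing idea, the coordinates can be extracted once into plain per-polygon dictionaries and reused afterwards. The helper `rings_to_paths` below is hypothetical (it is not part of HoloViews or geopandas), and the sketch assumes simple polygons without holes, described only by their exterior rings.

```python
def rings_to_paths(rings, values, vdim="data"):
    """Convert exterior rings into a list of per-polygon dictionaries.

    rings  -- one sequence of (x, y) pairs per polygon
    values -- one value per polygon for the colour-mapped dimension
    """
    paths = []
    for ring, value in zip(rings, values):
        # Index instead of unpacking so that (x, y, z) points also work.
        xs = [float(point[0]) for point in ring]
        ys = [float(point[1]) for point in ring]
        paths.append({"x": xs, "y": ys, vdim: value})
    return paths


# Toy input: two squares standing in for the shapefile polygons.
rings = [
    [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    [(2, 0), (3, 0), (3, 1), (2, 1), (2, 0)],
]
paths = rings_to_paths(rings, values=[0.25, 0.75])
print(paths[0]["x"])
```

With the dataframe from the question, the rings could presumably be built once as `[list(geom.exterior.coords) for geom in dflines.geometry]`, and the resulting list passed as `hv.Polygons(paths, vdims=['data'])`. The shapely extraction cost is then paid a single time rather than on every display.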

dwr-psandhu · May 30, 2020, 4:07pm

@philippjfr Once I have the view displayed, I would like to update the ‘data’ dimension values and refresh the displayed choropleth map with those new values. Currently, I create a new view every time, as you have above, and display it.

Is there a more efficient way to update just the colors of the polygons without having to redraw them every time?

dwr-psandhu · May 31, 2020, 3:50pm

@philippjfr I hacked the performance for the ‘bokeh’ renderer by obtaining the handle to the rendered view

from bokeh.io import push_notebook, show

rendered_view=hv.render(view)
renderer=rendered_view.renderers[0]
target = show(rendered_view, notebook_handle=True)
# update renderer source data with the new values 
# (xs and ys have not changed) i.e. polygon shapes
renderer.data_source.data['data']=dflines['data'].values
push_notebook(handle=target)

The above code renders the updated view more than an order of magnitude faster: 50ms vs 900ms.

Is there a way to do this in holoviews, or do I have to keep this renderer-specific?

This use case could also help datashader rendering performance, especially for static meshes, i.e. where the points/polygons do not change and only the colors change due to data mapping.

philippjfr · May 31, 2020, 4:13pm

The above code renders the updated view more than an order of magnitude faster: 50ms vs 900ms.

If you’re modifying the same underlying data structure, HoloViews should optimize this automatically. The optimization works by checking whether the element’s .data has the same id. For example, in the code below updates are pretty quick, but if you were to make a copy of poly_data they would be about an order of magnitude slower:

import geopandas as gpd
import holoviews as hv
import numpy as np
import panel as pn

poly_data = gpd.read_file(gpd.datasets.get_path('nybb'))
slider = pn.widgets.FloatSlider()

@pn.depends(slider)
def change_poly_data(value):
    poly_data['color_column'] = np.random.randn(5)*value
    polys = hv.Polygons(poly_data, vdims=['color_column'])
    return polys.opts(color='color_column')

pn.Column(
    slider,
    hv.DynamicMap(change_poly_data)
)
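
The identity check described above can be pictured with a toy cache. This is not HoloViews' actual implementation, only a minimal sketch of the idea: `CachedPolygonPlot` and its attributes are hypothetical names, and the "geometry build" stands in for the expensive coordinate extraction.

```python
class CachedPolygonPlot:
    """Toy plot that rebuilds geometry only when a new data object arrives."""

    def __init__(self):
        self._last_data = None
        self._geometry = []
        self.geometry_builds = 0
        self.colors = []

    def update(self, data, color_column):
        # Identity, not equality: a copy with equal contents still counts as new.
        if data is not self._last_data:
            self._geometry = [list(ring) for ring in data["rings"]]
            self.geometry_builds += 1
            self._last_data = data
        # Colours are cheap to refresh, so they are re-read on every update.
        self.colors = list(data[color_column])


data = {"rings": [[(0, 0), (1, 0), (1, 1)]], "color_column": [0.1]}
plot = CachedPolygonPlot()
plot.update(data, "color_column")

data["color_column"] = [0.9]             # mutate in place: same object
plot.update(data, "color_column")
print(plot.geometry_builds)              # 1

plot.update(dict(data), "color_column")  # shallow copy: different object
print(plot.geometry_builds)              # 2
```

Mutating the same object keeps the cached geometry, while passing a copy forces a rebuild even though the contents are identical.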

That said we could perhaps consider some mechanism by which you can explicitly declare which columns in the dataframe to update.

As for datashader performance, this seems very unlikely to me: datashader will still have to iterate over the entire data to aggregate the values, so I don’t see much scope to optimize here. Where there is scope for that kind of optimization is trimesh rendering, because the data structure is first converted to one that datashader can iterate over easily, and that conversion is quite expensive.

dwr-psandhu · June 1, 2020, 12:36am

@philippjfr I created a cell with your code above and then another cell that runs the slider through 100 values in a loop:

for i in range(100): slider.value=i/100.

This took about 8.5 seconds
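
A timing like this can be taken with a small helper around `time.perf_counter`. The function `time_updates` is a hypothetical name used only for this sketch, and the `seen.append` callback stands in for whatever update is being measured.

```python
import time


def time_updates(update, values):
    """Call update(value) for each value and return the elapsed seconds."""
    start = time.perf_counter()
    for value in values:
        update(value)
    return time.perf_counter() - start


# Stand-in callback: record each value instead of driving a real slider.
seen = []
elapsed = time_updates(seen.append, [i / 100.0 for i in range(100)])
print(f"{len(seen)} updates in {elapsed * 1000:.1f} ms")
```

In the notebook, the callback could be something like `lambda v: setattr(slider, 'value', v)`. The helper only measures the time until each Python call returns.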

I then modified your example as shown below and ran the same loop cell again; that took 259 ms.

poly_data = gpd.read_file(gpd.datasets.get_path('nybb'))
slider = pn.widgets.FloatSlider()

from bokeh.io import push_notebook, show, output_notebook
def create_view(value):
    poly_data['color_column'] = np.random.randn(len(poly_data))*value
    polys = hv.Polygons(poly_data, vdims=['color_column'])
    return polys.opts(color='color_column')
# seed the view
view=create_view(0)
rendered_view=hv.render(view)
pr=rendered_view.renderers[0]
target = show(rendered_view, notebook_handle=True)

#@pn.depends(slider)
def change_view_data(value):
    poly_data['color_column'] = np.random.randn(len(poly_data))*value
    pr.data_source.data['color']=poly_data['color_column'].values
    push_notebook(handle=target)

pn.interact(change_view_data,value=slider)

philippjfr · June 1, 2020, 12:08pm

Huh, yes you’re right. When I tried it the first time it was significantly faster, but if I try it now it’s definitely slower. Will have to look at that properly.

dwr-psandhu · June 2, 2020, 12:28am

@philippjfr Would you like me to file an issue, or have you perhaps already done so?

philippjfr · June 2, 2020, 12:44am

An issue would be great.

dwr-psandhu · June 2, 2020, 8:00pm