Plotting raster timeseries for dashboard

ciskoh · October 26, 2022, 12:14pm

hi all,
new user here. I am trying to build a geo-dashboard using datashader, mainly for grid plotting. My first problem is that I need to plot a raster time series (geotiffs loaded with rioxarray into an xarray dataset of dims [time: 100, x: 5000, y: 7000]) as a sequence of rasters. Eventually a slider will let the user scroll to or select a specific timestep to visualize. While I can get up to the aggregation step with very fast code, the shading/colourmapping takes around 7 seconds per raster, which is far too slow for my needs. So I am looking for help on how to structure this.

Should I pre-compute all the plots for each timestep?

Is there any other way to parallelize the shading step by taking advantage of xarray/dask features? Do you have any examples of plotting (geo) raster time series I could look at?

ianthomas23 · October 26, 2022, 2:59pm

7 seconds is not an untypical amount of time to shade an xarray.DataArray of shape (7000, 5000). It takes about 5 seconds on my dev machine using a really simple colormap.

The only parallelisation that you can currently use in shade is to run it on a CUDA-based GPU. If you pass shade a cupy-backed DataArray then this will happen automatically. If I have a numpy-backed DataArray then I switch this to cupy-backed before passing it to shade as follows:

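import cupy
import datashader.transfer_functions as tf

# Move the aggregate's data to the GPU so that shade runs on CUDA
agg.data = cupy.asarray(agg.data)
im = tf.shade(agg, ...)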

and on my hardware it only takes 0.6 seconds.

shade could be modified to use dask for CPU-parallelisation but there has been no work started along these lines as far as I am aware.

ciskoh · October 26, 2022, 4:25pm

Thank you for the clarification. This was helpful. So it seems my only option is to precompute all the timesteps (aggregation + shading) and then select what to display. Now the problem is how to compute this in the backend and pass it to the frontend. Is there a suggested way to store objects such as aggregations (I guess they are arrays, right?) or plots (i.e. after shading)?

jbednar · October 26, 2022, 7:00pm

Do you really have a monitor with 5000x7000 resolution (8K!)? I’m jealous. If your monitor isn’t that large, then the resolution of what gets shaded should be less, and thus faster than 7 seconds; on my Mac Retina display it’s normally well under a second, and so we haven’t heavily optimized that step.

Assuming the displayed resolution really is that large, converting it to cupy as Ian suggests should be a good speedup, though it would require installing the CUDA drivers and cupy.

If you’re embedding the datashader plots in HoloViews or hvPlot, you can also consider using rasterize instead of datashade, which will use Bokeh to do the colormapping in JS instead. I have no idea which of those two approaches is faster, but you can use either one.

In any case, it should be easy to precompute the RGBs, for a fixed output resolution. Using Datashader’s API, just save the output of the shade() command, which is an xarray DataArray that has methods for rendering a PNG or displaying itself directly in Jupyter. Using hvPlot, you can capture the result of .hvplot(..., datashade=True, dynamic=False), which should have the rendering already done. Using HoloViews, it would be datashade(..., dynamic=False).

Or you could look at the shade() code and see if you can speed it up or add Dask support for it. Definitely worth someone doing at some point!

ciskoh · October 27, 2022, 9:29am

Thanks jbednar for the explanation. Your sentence

Do you really have a monitor with 5000x7000 resolution (8K!)? I’m jealous. If your monitor isn’t that large, then the resolution of what gets shaded should be less

I actually got it wrong in the OP: my geotiff resolution is ~9000 x 17000.
The comment above seems to suggest that I am creating my canvas wrong. At the moment I am creating the canvas as follows:

import datashader as dsh
# ds_height is my array dataset
rast = ds_height[9,:,:] # this takes timestep 9 2d array from the xarray dataset
canvas = dsh.Canvas(rast.shape[1], rast.shape[0])

So the canvas size is the same as the 2d array dims. Should I reduce the canvas to something closer to the size I actually want to display?

ianthomas23 · October 28, 2022, 3:02pm

If you are using Datashader to create an image to display, then there is no point in creating the image at a higher resolution than you are displaying at. It is unnecessary calculation that is going to be sub-sampled for display.

The shade function is approximately linear in the number of pixels it is calculating. So to calculate a 1000x1000 pixel image takes approximately 1/153 of the time it takes to calculate a 9000x17000 pixel image. So that takes your 7 second calculation down to 0.05 seconds.
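As a rough check of the arithmetic above, here is a minimal sketch that assumes shading time scales linearly with pixel count, as described. The helper name `estimated_shade_seconds` is made up for illustration and is not part of Datashader.

```python
def estimated_shade_seconds(measured_seconds, measured_shape, target_shape):
    """Scale a measured shading time by the ratio of pixel counts.

    Assumes shade() cost is linear in the number of pixels, which is
    only an approximation.
    """
    measured_pixels = measured_shape[0] * measured_shape[1]
    target_pixels = target_shape[0] * target_shape[1]
    return measured_seconds * target_pixels / measured_pixels


# Ratio of pixel counts between the full raster and a 1000x1000 image
ratio = (9000 * 17000) / (1000 * 1000)

# Estimated time for a 1000x1000 image, given 7 seconds at 9000x17000
estimate = estimated_shade_seconds(7.0, (9000, 17000), (1000, 1000))

print(ratio)               # 153.0
print(round(estimate, 3))  # 0.046
```

Real timings will include some fixed overhead, so treat the result as an order-of-magnitude guide rather than a prediction.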

ciskoh · November 3, 2022, 11:17am

SOLVED
Thanks all for the support. I managed to speed things up a lot with the following steps:
1- Set the canvas size to something close to the output image resolution (preserving as much as possible the original ratio)
2- Set chunks to auto while importing data with xarray
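
For anyone finding this later, step 1 could be sketched roughly as below. This is plain Python with no Datashader calls; the helper name `fit_canvas` and the 1200x800 display box are assumptions for illustration, and I am assuming the raster has 9000 rows and 17000 columns.

```python
def fit_canvas(raster_height, raster_width, max_width, max_height):
    """Return (plot_width, plot_height) that fits inside the display box
    while keeping the raster's aspect ratio. Never upscales."""
    scale = min(max_width / raster_width, max_height / raster_height, 1.0)
    plot_width = max(1, round(raster_width * scale))
    plot_height = max(1, round(raster_height * scale))
    return plot_width, plot_height


# Assumed raster shape (rows, columns) and an assumed display box in pixels
plot_width, plot_height = fit_canvas(9000, 17000, max_width=1200, max_height=800)

print(plot_width, plot_height)  # 1200 635
```

The resulting values are what you would pass as plot_width and plot_height when creating the datashader Canvas, instead of the full array shape.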