I am quite new to HoloViz but experienced with dask/xarray/numpy.
I'm trying to visualize a large GeoTIFF/Zarr geographical dataset using
hvplot, but I'm having trouble understanding how to control the dask side of things.

    import rioxarray
    import hvplot.xarray  # registers the .hvplot accessor on xarray objects

    data = rioxarray.open_rasterio(path, chunks='auto')
    test_data = data.isel(x=range(100_000), y=range(100_000)).persist()
    # note that chunking is untouched, here being (272, 100_000)

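For reference, here is a minimal sketch of how the chunk layout mentioned in the comment above can be inspected. It uses a small synthetic array as a hypothetical stand-in for `test_data` (the `demo` name, the scaled-down sizes, and the `float32` dtype are all assumptions made for illustration, not taken from the real dataset).

```python
import math

import dask.array as da
import xarray as xr

# Hypothetical stand-in for test_data, scaled down so it runs anywhere:
# a 1_000 x 1_000 float32 array stored as row strips of shape (3, 1_000).
arr = da.zeros((1_000, 1_000), chunks=(3, 1_000), dtype="float32")
demo = xr.DataArray(arr, dims=("y", "x"))

chunk_shape = demo.data.chunksize   # shape of the largest chunk
n_chunks = demo.data.npartitions    # total number of chunks
chunk_bytes = math.prod(chunk_shape) * demo.dtype.itemsize

print("largest chunk:", chunk_shape)
print("number of chunks:", n_chunks)
print("bytes per full chunk:", chunk_bytes)
```

The same three attributes can be read from the real array (`test_data.data.chunksize` and so on) to confirm what `chunks='auto'` actually chose.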
I noticed that if I call the following:

    img = test_data.hvplot.image(
        x='x',
        y='y',
        crs=data.rio.crs.to_proj4(),
        rasterize=True,
        width=700,
        height=500,
        cmap='magma',
        tiles=True,
        aggregator='mode',
    )
    img

I see the cluster do some initial work that includes two
finalize_from_value tasks (half the data each), and then some extra work as I interact with the plot.
However, if I rechunk to something that I believe should make more sense for this visualization (let's say
dict(x=10_000, y=10_000)), the two
finalize_from_value tasks become a single one, which fails because it hits the memory limit of a single worker.
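To make the two layouts concrete, here is a rough back-of-the-envelope sketch comparing the number of chunks and the size of one full chunk for each. The helper `chunk_stats` is hypothetical, the 4-byte item size is an assumption (the actual dtype is not stated above), and (272, 100_000) is assumed to be in (y, x) order.

```python
import math


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, bytes in one full chunk) for a given layout."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes


shape = (100_000, 100_000)  # (y, x) extent of test_data
itemsize = 4                # ASSUMPTION: a 4-byte dtype such as float32

# Original layout: full-width row strips, assumed to be (y, x) = (272, 100_000)
strips = chunk_stats(shape, (272, 100_000), itemsize)
# Rechunked layout: square tiles from dict(x=10_000, y=10_000)
tiles = chunk_stats(shape, (10_000, 10_000), itemsize)

print("strips:", strips)
print("tiles: ", tiles)
```

Under these assumptions, each square tile holds roughly 3.7 times as many bytes as one row strip, so it may be worth comparing the per-chunk size against the worker memory limit. This only describes the input chunks; it does not by itself explain how the finalize_from_value tasks are partitioned.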
That is quite the opposite of what I was expecting. How would you control the partitioning of that part of the workload (I guess it is the rasterization)?