Hi!
I am quite new to HoloViz but “seasoned” with dask/xarray/numpy.
I’m trying to visualize a large GeoTIFF/Zarr geographical dataset using hvplot, but I’m having trouble understanding how to control the dask side of things.
Given that:
data = rioxarray.open_rasterio(path, chunks='auto')
test_data = data.isel(x=range(100_000), y=range(100_000)).persist()
# note that chunking is untouched, here being (272, 100_000)
I noticed that if I call the following:
img = test_data.hvplot.image(
    x='x',
    y='y',
    crs=data.rio.crs.to_proj4(),
    rasterize=True,
    width=700,
    height=500,
    cmap='magma',
    tiles=True,
    aggregator='mode',
)
img
the cluster does some initial work that includes two finalize_from_value tasks (half the data each), then performs some extra work as I interact with the plot.
However, if I rechunk to something that I believe should make more sense for this visualization (let’s say dict(x=10_000, y=10_000)), the two finalize_from_value tasks become a single one, which fails because it hits the memory limit on a single worker.
Now, that is quite the opposite of what I was expecting, so I was wondering how you would control the partitioning of that part of the workload (I guess it is the rasterization?).
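
For scale, here is a rough back-of-the-envelope sketch comparing the two chunkings. The helper `chunk_stats` is hypothetical and only for illustration, the 1-byte pixel size is an assumption (the real dtype may well be larger), and the band dimension is ignored.

```python
import math


def chunk_stats(shape, chunks, itemsize):
    """Return (number of chunks, bytes in one full-size chunk) for an array."""
    # Number of chunks along each axis, rounding up for a partial last chunk.
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    # A chunk can never be larger than the array itself along any axis.
    chunk_bytes = math.prod(min(s, c) for s, c in zip(shape, chunks)) * itemsize
    return n_chunks, chunk_bytes


SHAPE = (100_000, 100_000)  # (y, x) extent of test_data
ITEMSIZE = 1  # ASSUMPTION: 1 byte per pixel (e.g. uint8); adjust to the real dtype

total_bytes = math.prod(SHAPE) * ITEMSIZE
stats = {
    "auto (272, 100_000)": chunk_stats(SHAPE, (272, 100_000), ITEMSIZE),
    "rechunked (10_000, 10_000)": chunk_stats(SHAPE, (10_000, 10_000), ITEMSIZE),
}

for name, (n, nbytes) in stats.items():
    print(f"{name}: {n} chunks, {nbytes / 1e6:.1f} MB per chunk")
print(f"whole array: {total_bytes / 1e9:.1f} GB")
```

Under that assumption neither chunk size looks problematic on its own (roughly 27 MB versus 100 MB per chunk), but a single task that gathers the whole array would need about 10 GB, which might explain the memory failure if one worker ends up holding it.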