@ahuang11, I was intrigued by your X post showing that GOES data loads faster after being converted to Zarr because of the chunking.
After downloading the data locally:
I tried running your code on the ESIP JupyterHub and I didn’t see this speedup:
Seems like a nice simple use case to help understand better how datashader works and why my results were different than yours!
I thought datashader loaded the entire image first, then rendered and delivered just the pixels needed to fill the window. Is that correct?
I may have left out shared_axes=False. Without it, the first (left) plot has to run first, which blocks the zarr plot on the right.
You can also try swapping the positions of the tif and zarr plots, e.g. put the zarr plot first.
Ah, I didn’t realize the first plot was blocking! Indeed, swapping to have zarr plot first shows the speedup as you indicated:
I do wonder if it’s technically possible to update both plots simultaneously.
Strangely enough I don’t see any difference whatsoever. Both are fast. What versions of the libraries are you using?
$ pip freeze | ag '(holoviews|geoviews|zarr|netcdf|xarray|rasterio|datashader)' | sort
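If `ag` isn't installed, roughly the same list can be pulled from inside Python with the standard library. This is just a sketch: the helper name `matching_versions` is made up, and the pattern is copied from the command above (matched case-insensitively).

```python
import re
from importlib.metadata import distributions

# Same package-name pattern as in the shell command above.
PATTERN = re.compile(
    r"(holoviews|geoviews|zarr|netcdf|xarray|rasterio|datashader)",
    re.IGNORECASE,
)


def matching_versions(pattern=PATTERN):
    """Return {name: version} for installed packages whose name matches."""
    found = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing or broken metadata.
        if name and pattern.search(name):
            found[name] = dist.version
    # Sort by name, ignoring case, similar to piping through `sort`.
    return dict(sorted(found.items(), key=lambda item: item[0].lower()))


for name, version in matching_versions().items():
    print(f"{name}=={version}")
```

It only reports what is installed in the environment the interpreter is running in, so run it from the same kernel that produced the plots.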
Thanks for sharing! Are you using the same dataset / same code?
Yep. I think the actual tif file changes every day or so, so we probably ran with different input files, but yes, the code I ran is the one you have on GitHub, i.e. the one with
TBH, I was surprised by how big the difference was in your video. I have measured some difference between NetCDF and Zarr (in the range of 100 ms vs 200 ms), but if memory serves it mostly had to do with different chunking schemes and/or compression algorithms. The question I was trying to answer at the time was whether it was worth it to rechunk and/or convert from NetCDF to Zarr. Arguably, tif is a different file format, but these files are quite small.
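In case it helps compare numbers across machines, this is the kind of timing harness I have in mind. It is only a sketch: `time_call` is a made-up helper, and the two loader functions are dummy placeholders to be swapped for the real tif and zarr reads. Taking the median over a few repeats is meant to damp the effect of a slow first read when the file is not yet cached.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Return the median wall-clock time of func() in milliseconds."""
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations_ms)


# Hypothetical placeholders: replace the bodies with the actual reads,
# e.g. opening the tif and the zarr store and forcing the data to load.
def load_tif_placeholder():
    return sum(range(200_000))


def load_zarr_placeholder():
    return sum(range(100_000))


for label, loader in [("tif", load_tif_placeholder), ("zarr", load_zarr_placeholder)]:
    print(f"{label}: {time_call(loader):.1f} ms (median of 5 runs)")
```

With lazily loaded arrays, the loader has to actually force the read, otherwise only the time to open the file is measured.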