Speeding up visualization of GOES with Zarr?

@ahuang11, I was intrigued by your X post showing that GOES data loads faster after being converted to Zarr, because of the chunking.
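Here is a rough sketch of why chunking could matter, assuming the viewer only requests the region currently on screen. With a chunked store, a read only has to touch the chunks that overlap that region. The 1024 x 1024 chunk shape is a made-up example, not necessarily what was used in the X post.

```python
import math

def chunks_touched(rows, cols, chunk_shape):
    """Return the (chunk_row, chunk_col) indices overlapped by a half-open
    pixel window rows=(r0, r1), cols=(c0, c1) on a regular chunk grid."""
    (r0, r1), (c0, c1) = rows, cols
    ch, cw = chunk_shape
    chunk_rows = range(r0 // ch, math.ceil(r1 / ch))
    chunk_cols = range(c0 // cw, math.ceil(c1 / cw))
    return [(r, c) for r in chunk_rows for c in chunk_cols]

CHUNKS = (1024, 1024)  # assumed chunk shape, for illustration only

full_disk = chunks_touched((0, 10848), (0, 10848), CHUNKS)
zoomed = chunks_touched((5000, 5800), (5000, 5800), CHUNKS)
print(len(full_disk), len(zoomed))  # 121 4
```

For the full 10848 x 10848 disk that is 121 chunks. An 800 x 800 pixel zoom near the center touches only 4.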

After downloading the data locally:

wget https://cdn.star.nesdis.noaa.gov/GOES18/ABI/FD/GEOCOLOR/GOES18-ABI-FD-GEOCOLOR-10848x10848.tif

I tried running your code on the ESIP JupyterHub, and I didn’t see this speedup.

It seems like a nice, simple use case for understanding better how datashader works and why my results differed from yours!

I thought datashader loaded the entire image first, then rendered and delivered only the pixels needed to fill the window. Is that correct?
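The "deliver only the pixels" part can be illustrated with a simplified numpy stand-in. Each screen pixel takes the mean of the source pixels that fall inside it, for a single band. This is not datashader's actual implementation, and it says nothing about how much of the file gets read beforehand.

```python
import numpy as np

def downsample_to_canvas(img, width, height):
    """Mean-aggregate a 2-D array onto a (height, width) screen grid.

    Simplified stand-in for datashader's raster aggregation, not its real code.
    Assumes the canvas is no larger than the image along either axis.
    """
    rows, cols = img.shape
    if height > rows or width > cols:
        raise ValueError("canvas must not be larger than the image")
    # Start index of each output bin; bins may differ in size by one pixel.
    r_starts = np.linspace(0, rows, height + 1).astype(int)[:-1]
    c_starts = np.linspace(0, cols, width + 1).astype(int)[:-1]
    # Sum source pixels within each bin, first along rows, then along columns.
    sums = np.add.reduceat(np.add.reduceat(img, r_starts, axis=0), c_starts, axis=1)
    # Number of source pixels contributing to each output pixel.
    r_counts = np.diff(np.append(r_starts, rows))
    c_counts = np.diff(np.append(c_starts, cols))
    return sums / np.outer(r_counts, c_counts)

demo = np.arange(16, dtype=float).reshape(4, 4)
print(downsample_to_canvas(demo, width=2, height=2))
# [[ 2.5  4.5]
#  [10.5 12.5]]
```

Take a 10848 x 10848 band shown in a window about 800 px wide. Each screen pixel then summarizes roughly 13 to 14 source pixels per side, so the browser receives far less data than the file contains.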

I may have left out shared_axes=False. Without it, the first (left) plot has to render first, which blocks the right (Zarr) plot.

You can also try swapping the positions of the TIF and Zarr plots, e.g. have the Zarr plot render first.
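The toy model below shows why a slow left plot can hide the Zarr speedup when panels render one after another. It is pure Python, with time.sleep standing in for loading and rendering, and the per-panel costs are made-up numbers. It is not how HoloViews or Panel actually schedule work.

```python
import time

def render_sequentially(panels):
    """Render (name, seconds) panels one after another and record when
    each finished, in seconds since the first render started."""
    start = time.perf_counter()
    finished = {}
    for name, cost in panels:
        time.sleep(cost)  # stand-in for loading + datashading one panel
        finished[name] = time.perf_counter() - start
    return finished

# Made-up costs: a slow TIF panel and a fast Zarr panel.
tif_first = render_sequentially([("tif", 0.3), ("zarr", 0.1)])
zarr_first = render_sequentially([("zarr", 0.1), ("tif", 0.3)])
print(f"zarr ready after {tif_first['zarr']:.1f}s (tif first)")
print(f"zarr ready after {zarr_first['zarr']:.1f}s (zarr first)")
```

With these costs, the Zarr panel appears after about 0.4 s when it is second and about 0.1 s when it is first. That is the same effect as swapping the panels.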

Ah, I didn’t realize the first plot was blocking! Indeed, swapping the order so the Zarr plot comes first shows the speedup you described.


I do wonder whether it’s technically possible to update the plots simultaneously.
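In principle, independent panels can render concurrently. The minimal thread-based sketch below uses made-up per-panel costs. It does not show whether HoloViews or Panel can schedule linked plots this way.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def render_concurrently(panels):
    """Render (name, seconds) panels in parallel threads and record when
    each finished, in seconds since rendering started."""
    start = time.perf_counter()

    def render(panel):
        name, cost = panel
        time.sleep(cost)  # stand-in for I/O-bound work; sleep releases the GIL
        return name, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=len(panels)) as pool:
        return dict(pool.map(render, panels))

finished = render_concurrently([("tif", 0.3), ("zarr", 0.1)])
print({name: round(t, 1) for name, t in finished.items()})
```

Threads only help when the work waits on I/O or releases the GIL. Pure-Python CPU-bound rendering would still run one panel at a time.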


Strangely enough, I don’t see any difference whatsoever. Both are fast. Which versions of the libraries are you using?

$ pip freeze | ag '(holoviews|geoviews|zarr|netcdf|xarray|rasterio|datashader)' | sort
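If `ag` (the_silver_searcher) isn't installed, the same versions can be collected with a stdlib-only Python alternative like the one below. The package list mirrors the regex above, with `netCDF4` assumed as the NetCDF distribution name.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["holoviews", "geoviews", "zarr", "netCDF4", "xarray", "rasterio", "datashader"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

if __name__ == "__main__":
    for name, ver in sorted(installed_versions(PACKAGES).items()):
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```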

Thanks for sharing! Are you using the same dataset / same code?

Yep. I think the actual tif file changes every day or so, so we probably ran with different input files. But yes, the code I ran is the one you have on GitHub, i.e. the one with shared_axes=False.

TBH, I was surprised by how big the difference was in your video. I have measured some difference between NetCDF and Zarr (roughly 100 ms vs 200 ms), but if memory serves, it mostly came down to different chunking schemes and/or compression algorithms. The question I was trying to answer at the time was whether it was worth rechunking and/or converting from NetCDF to Zarr. Granted, TIF is a different file format, but these files are quite small.
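When comparing formats, a single timing can be dominated by a cold cache or a one-off hiccup. The small stdlib helper below reports the median over several runs instead. You supply the loader to be timed.

```python
import statistics
import time

def median_runtime(load, repeats=5):
    """Median wall-clock seconds of calling load() over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

For example, you could compare `median_runtime(lambda: open_zarr_subset(path))` with the same call for the NetCDF file. Here `open_zarr_subset` is a hypothetical function that reads the region being plotted. Note that the first run may warm the OS cache, so measuring cold-cache behaviour needs a different setup.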