Speeding up visualization of GOES with Zarr?

rsignell · September 22, 2023, 12:22pm

@ahuang11, I was intrigued by your X post where you show that GOES data loads faster after being converted to Zarr because of the chunking.

After downloading the data locally:

wget https://cdn.star.nesdis.noaa.gov/GOES18/ABI/FD/GEOCOLOR/GOES18-ABI-FD-GEOCOLOR-10848x10848.tif

I tried running your code on the ESIP JupyterHub and I didn’t see this speedup:
[screenshot: 2023-09-22_08-13-41]

Seems like a nice simple use case to help understand better how datashader works and why my results were different than yours!

I thought datashader loaded the entire image first, then renders and delivers just the pixels to fill the window. Is that correct?

ahuang11 · September 22, 2023, 4:26pm

I may have left out shared_axes=False. Without it, the first plot (left) has to finish rendering first, which blocks the Zarr plot on the right.

You can also try swapping the positions of the tif and Zarr plots, e.g. putting the Zarr plot first.
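Another way to take plot order out of the comparison is to time each load on its own, outside the layout. The snippet below is only a minimal sketch: `time_each`, `load_tif`, and `load_zarr` are hypothetical names, and the two loaders are `time.sleep` stand-ins, not the real read-and-render calls from the notebook.

```python
import time


def time_each(loaders, repeats=3):
    """Time every loader in isolation and return {name: best_seconds}."""
    results = {}
    for name, load in loaders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            load()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Hypothetical stand-ins: replace the bodies with the real tif / Zarr
# read-and-render calls when trying this on actual data.
def load_tif():
    time.sleep(0.05)


def load_zarr():
    time.sleep(0.005)


timings = time_each({"tif": load_tif, "zarr": load_zarr})
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Because each loader is timed separately, a slow first plot cannot hide a fast second one.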

rsignell · September 22, 2023, 6:29pm

Ah, I didn’t realize the first plot was blocking! Indeed, swapping to have zarr plot first shows the speedup as you indicated:
[screenshot: 2023-09-22_14-27-54]

ahuang11 · September 23, 2023, 12:19am

I do wonder if it’s technically possible to run the plot updates simultaneously.

pmav99 · September 28, 2023, 3:15pm

Strangely enough I don’t see any difference whatsoever. Both are fast. What versions of the libraries are you using?

$ pip freeze | ag '(holoviews|geoviews|zarr|netcdf|xarray|rasterio|datashader)' | sort
datashader==0.15.2
geoviews==1.10.1
h5netcdf==1.2.0
holoviews==1.17.1
netCDF4==1.6.4
rasterio==1.3.8
rioxarray==0.15.0
xarray==2023.9.0
zarr==2.16.1
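For anyone without `ag` installed, roughly the same list can be produced from Python using only the standard library. This is a sketch: `installed_versions` is a hypothetical helper name, and the package list is simply copied from the output above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip freeze output above.
PACKAGES = [
    "datashader", "geoviews", "h5netcdf", "holoviews", "netCDF4",
    "rasterio", "rioxarray", "xarray", "zarr",
]


def installed_versions(names):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in sorted(installed_versions(PACKAGES).items()):
    print(f"{name}=={ver}" if ver else f"{name} (not installed)")
```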

ahuang11 · September 30, 2023, 9:27pm

Thanks for sharing! Are you using the same dataset / same code?

pmav99 · September 30, 2023, 10:05pm

Yep. I think the actual tif file changes every day or so, so we probably ran with different input files, but yes, the code I ran is the one you have on GitHub, i.e. the one with shared_axes=False.

TBH, I was surprised by how big the difference was in your video. I have measured some difference between netCDF and Zarr (in the range of 100 ms vs 200 ms), but if memory serves it mostly had to do with different chunking schemes and/or compression algorithms. The question I was trying to answer at the time was whether it was worth it to rechunk and/or convert from netCDF to Zarr. Arguably, tif is a different file format, but these files are quite small.
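To make the chunking point concrete, here is a rough back-of-the-envelope sketch of how many pixels have to be read to fill a zoomed-in window under two chunk layouts. Both layouts are hypothetical; I have not checked how the tif or the Zarr store in this thread is actually chunked. Only the 10848x10848 size comes from the file name. `chunks_touched` and `pixels_read` are illustrative helpers, and the calculation ignores compression and the smaller chunks at the image edges.

```python
def chunks_touched(window, chunk_shape):
    """Count chunks overlapping a half-open (row0, row1, col0, col1) window."""
    row0, row1, col0, col1 = window
    chunk_rows, chunk_cols = chunk_shape
    n_rows = (row1 - 1) // chunk_rows - row0 // chunk_rows + 1
    n_cols = (col1 - 1) // chunk_cols - col0 // chunk_cols + 1
    return n_rows * n_cols


def pixels_read(window, chunk_shape):
    """Pixels decoded when every touched chunk must be read in full."""
    return chunks_touched(window, chunk_shape) * chunk_shape[0] * chunk_shape[1]


SIZE = 10848                        # image is SIZE x SIZE pixels
window = (5000, 6000, 5000, 6000)   # a 1000 x 1000 zoomed-in view

strips = (16, SIZE)   # hypothetical layout: full-width strips of 16 rows
tiles = (512, 512)    # hypothetical layout: square tiles

for label, shape in [("strips", strips), ("tiles", tiles)]:
    print(label, chunks_touched(window, shape), pixels_read(window, shape))
# strips 63 10934784
# tiles 9 2359296
```

With these made-up layouts, the square tiles read about a fifth of the pixels that the strips do for the same window, which is the kind of gap a different chunking scheme can produce regardless of file format.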