Larger-than-memory plotting: how to set the initial slice to plot

This seems like it should be really straightforward, but I can’t find any examples of it, and my own code doesn’t work.

I’m trying to plot what is effectively a giant image (a spectrogram that is larger than what can be loaded into memory). I can bring it in as a dask array and create a DynamicMap of an Image, but when I go to rasterize it, it hangs. I’m assuming this is because it’s trying to rasterize the entire dataset, which will take ages even if it can do so without running out of memory. Ideally, I’d set an initial slice for it to plot and then let the user pan/zoom to other regions. I think I could do this with kdims, but I don’t know how to link kdims to panning/zooming in Bokeh plots.
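Whatever ends up reporting the current pan/zoom range (in HoloViews, a stream such as `hv.streams.RangeXY`), the callback still has to turn floating-point axis bounds into integer index slices so that only that window of the array is read. A minimal sketch in plain Python/NumPy, assuming the axes are just row/column indices with no physical coordinates; `range_to_slice` is a hypothetical helper, not part of HoloViews:

```python
import math

import numpy as np


def range_to_slice(lo, hi, n):
    """Map a floating-point axis range [lo, hi] onto integer indices 0..n-1.

    Index i is kept when lo <= i <= hi (inclusive at both ends, like xarray's
    .sel with a slice on integer coordinates), clipped to the array bounds.
    """
    start = max(0, math.ceil(lo))
    stop = min(n, math.floor(hi) + 1)
    if stop <= start:
        return slice(0, 0)  # the requested view does not overlap the data
    return slice(start, stop)


# Small in-memory stand-in for the big array
data = np.arange(20).reshape(4, 5)
window = data[range_to_slice(0.5, 2.0, 4), range_to_slice(1, 3, 5)]
print(window)
```

Because only integer slices reach the array, a lazy backend (dask, h5py) would only need to read the chunks overlapping that window.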

Are there any examples of plotting datasets which don’t fit in memory? Is there a way to set the initial slice which is plotted?


What if you set the dask chunksize to be smaller?

I tried that and got more or less the same result. I tried using just a subset of the data that fits in memory, and it took a couple of minutes but eventually plotted. Because of the nature of my data, it will never really make sense to plot more than a fairly small subsection of the data at a time, so I’m really just looking for a way to set the initial zoom level/slice of data to be plotted.
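Some back-of-the-envelope arithmetic helps explain why a smaller chunk size alone may not change much. This sketch assumes the dimensions of the demo file posted below (100,000 appends of 10 rows, 2401 `int16` columns); `chunk_bytes` and `n_chunks` are hypothetical helpers:

```python
def chunk_bytes(chunk_rows, n_cols, itemsize):
    """Size in bytes of one (chunk_rows x n_cols) chunk."""
    return chunk_rows * n_cols * itemsize


def n_chunks(total_rows, chunk_rows):
    """Number of row-chunks needed to cover total_rows (the last one may be partial)."""
    return -(-total_rows // chunk_rows)  # ceiling division on integers


# Dimensions of the demo file: 100,000 appends of 10 rows, 2401 int16 columns
TOTAL_ROWS, N_COLS, ITEMSIZE = 100_000 * 10, 2401, 2

for chunk_rows in (60 * 60, 360):
    size = chunk_bytes(chunk_rows, N_COLS, ITEMSIZE)
    count = n_chunks(TOTAL_ROWS, chunk_rows)
    print(f"{chunk_rows:>5} rows/chunk: {size / 1e6:.1f} MB each, {count} chunks")

print(f"whole array: {TOTAL_ROWS * N_COLS * ITEMSIZE / 1e9:.1f} GB")
```

Shrinking the chunks tenfold makes each chunk about ten times smaller but multiplies the chunk count by about ten; the total stays around 4.8 GB. If the rasterize step ends up touching the whole array, a smaller chunk size would not be expected to help much, which fits the result above.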

Can you share a minimal example? Also, maybe submit an issue on the HoloViews or Datashader GitHub if it still doesn’t work.

Here’s a fairly minimal example of what I’m trying to do:

Create demo data:
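import h5py
import numpy as np

path = 'data.h5'
# Create an empty, resizable int16 dataset with 2401 columns, chunked in blocks of 3600 rows
with h5py.File(path, 'w') as f:
    f.create_dataset('data', (0, 2401), maxshape=(None, 2401), dtype='int16', chunks=(60*60, 2401))

def create_data():
    """Return a (10, 2401) int16 block of synthetic spectrogram-like data."""
    nx, ny = 2401, 10
    x = np.linspace(0, 1, nx)
    y = np.linspace(0, 1, ny)
    xx, yy = np.meshgrid(x, y)
    return ((2**15 - 1) * np.sin(xx*2*np.pi*50) * np.cos(yy*2*np.pi/4)).astype('int16')

data = create_data()
# This creates a ~5GB file! (1,000,000 rows x 2401 columns x 2 bytes)
for i in range(100000):
    # Grow the dataset by one block and write the block into the new rows
    with h5py.File(path, 'a') as f:
        f['data'].resize(f['data'].shape[0] + data.shape[0], axis=0)
        f['data'][-data.shape[0]:] = data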

App:
import h5py
import dask.array as da
import holoviews as hv
from holoviews.operation.datashader import rasterize, shade
import panel as pn


hv.extension('bokeh', logo=False)

# Open the HDF5 file read-only and wrap the dataset lazily in dask
f = h5py.File('data.h5', 'r')
full_data = da.from_array(f['data'], chunks=(60*60, 2401), name=False)

class BigSpectrogram():
    def points(self):
        # Only the first 100 rows are sliced out for the image
        data = full_data[:100]
        image = hv.Image(hv.Dataset((range(data.shape[1]), range(data.shape[0]), data), ['x', 'y'], 'z'))
        return image

    def view(self,**kwargs):
        points = hv.DynamicMap(self.points)
        agg = rasterize(points, x_sampling=1, y_sampling=1, width=1200, height=800)
        return shade(agg).opts(height=800, responsive=True)

spectrogram = BigSpectrogram()
pn.Row(spectrogram.view()).servable()

However, I run into a different problem with this minimal example:

ValueError: 'list' argument must have no negative elements

I’ve googled around and have no idea what’s causing it.

To clarify, the primary question isn’t about getting this specific example to work (there are some bugs I can troubleshoot), but about how to set an initial slice to be plotted. Even if I get the above example working, I will never want to plot all the data at once (the actual dataset is ~10 TB); I just want to plot an initial portion.

I haven’t had time to try it out yet, but how is your data stored? Is it a Zarr or NetCDF file?

I think it may also depend on the underlying chunks in the file. Are you able to load a single chunk as a dask array? If not, it may be trying to load the entire file in order to rechunk it, which would lead to a memory error (just a guess).
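One way to reason about the chunk question: reading a window of rows only requires the chunks that window overlaps. The sketch below assumes row-only chunking, as with `chunks=(60*60, 2401)` in the demo code; `chunks_touched` is a hypothetical helper. With dask itself, `full_data.blocks[0]` (dask’s `Array.blocks` accessor) pulls out a single chunk, which is a quick way to check that loading one chunk works on its own.

```python
def chunks_touched(start, stop, chunk_rows):
    """Indices of the row-chunks that overlap rows [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk_rows, (stop - 1) // chunk_rows + 1)


CHUNK_ROWS = 60 * 60  # same as chunks=(60*60, 2401) in the demo code

print(list(chunks_touched(0, 100, CHUNK_ROWS)))       # [0]
print(list(chunks_touched(3500, 3700, CHUNK_ROWS)))   # [0, 1]
print(len(chunks_touched(0, 1_000_000, CHUNK_ROWS)))  # 278
```

A small initial window costs one or two chunk reads, while anything that spans the whole array touches every chunk.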

Okay, I think this works:

import h5py
import numpy as np
import panel as pn
import xarray as xr
import holoviews as hv
import dask.array as da
from holoviews.operation.datashader import rasterize
hv.extension('bokeh', logo=False)

# Open the HDF5 file read-only and wrap it lazily in dask; the data is not read yet
f = h5py.File('data.h5', "r")
full_data = da.from_array(f['data'], chunks=(60*60, 2401), name=False)
# Wrap in an xarray DataArray with integer coordinates so windows can be selected by value.
# Named `arr` rather than `da` so it does not shadow the dask.array module.
arr = xr.DataArray(
    full_data,
    dims=("x", "y"),
    coords={"x": np.arange(full_data.shape[0]), "y": np.arange(full_data.shape[1])},
)

class BigSpectrogram():
    def points(self, x_range, y_range):
        # Called with the current plot ranges whenever the RangeXY stream updates
        print(x_range, y_range)
        # Load only the visible window into memory
        data = arr.sel(x=slice(*x_range), y=slice(*y_range)).compute()
        image = hv.Image(data)
        return image

    def view(self, **kwargs):
        # The initial ranges define the first slice that gets plotted
        range_xy = hv.streams.RangeXY(x_range=(0, 3600), y_range=(0, 2401))
        points = hv.DynamicMap(self.points, streams=[range_xy])
        agg = rasterize(points)
        return agg

spectrogram = BigSpectrogram()
pn.Row(spectrogram.view()).servable()
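One caveat with the approach above: if the user zooms all the way out, the callback will try to `.compute()` the whole array, which for a ~10 TB dataset brings back the original problem. A possible guard, sketched under the assumption that showing only a capped window when zoomed far out is acceptable; `clamp_window` and the 3600-row cap are hypothetical choices, not a HoloViews feature:

```python
def clamp_window(lo, hi, max_span, n):
    """Limit a requested index range [lo, hi] to at most max_span samples.

    The range is first clipped to [0, n - 1]; if it is still wider than
    max_span, a window of exactly max_span is kept around its centre.
    """
    lo, hi = max(0, lo), min(n - 1, hi)
    if hi - lo > max_span:
        centre = (lo + hi) / 2
        lo, hi = centre - max_span / 2, centre + max_span / 2
    return lo, hi


N_ROWS = 1_000_000  # demo file size; the real dataset is far larger
print(clamp_window(100, 200, 3600, N_ROWS))         # small request: unchanged
print(clamp_window(-500, 2_000_000, 3600, N_ROWS))  # full zoom-out: capped
```

In the `points` callback, something like `x_range = clamp_window(*x_range, 3600, full_data.shape[0])` before the `.sel(...)` call would keep a full zoom-out from loading everything. The trade-off is that, when zoomed far out, the image only fills a strip of the plot.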