Datashader Rasterize with DASK

Hi,

I am trying to visualize time series data in parallel using Dask and HoloViews. When I call rasterize, the data is pulled out of distributed memory and returned to a single node.

import datashader as ds
import holoviews as hv
import xarray as xr

from dask import array as da
from dask.distributed import Client, progress
from dask_jobqueue import PBSCluster

from holoviews import opts

from holoviews.operation.datashader import rasterize
from bokeh.models import HoverTool

from natsort import natsorted
from pathlib import Path
import numpy as np
import shutil
import glob
import os

hv.extension('bokeh', config=dict(image_rtol=1000), logo=False)
hover = HoverTool(mode='vline')

# schedule compute resources

N_channels = 1_200
N_steps_per_file = 25_000
N_files = 500
data = da.random.normal(
    10, 0.1,
    size=(N_channels, N_files * N_steps_per_file),
    chunks=(N_channels, N_steps_per_file),
)

dset = xr.Dataset({"data": (("Channel","time"), data)},
    coords={
        "Channel": np.arange(N_channels),
        "time": np.arange(N_steps_per_file*N_files)
    })
dset.time.attrs["units"] = 'seconds'

hv_ds = hv.Dataset(dset)
image = hv.Image(hv_ds, kdims=["time", "Channel"]).opts(
    colorbar=True, width=1200, height=500, cmap='jet',
    title='Data Field', invert_yaxis=True,
)

image = image.persist()

rasterImage = rasterize(image)
rasterImage

If anyone could provide some insight as to why this is happening, it would be greatly appreciated.

When you call .persist() it will load the entire dataset into memory. Can you try it without the persist call?
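
To see what persist does to the underlying array, here is a minimal sketch on a small stand-in array (the sizes and the name `small` are assumptions for illustration, far smaller than the array in the question). As far as I know, persist() computes every chunk and keeps it in memory (on the workers when a distributed Client is active, locally otherwise) but still returns a chunked Dask array, whereas compute() gathers everything into one NumPy array in the calling process.

```python
import numpy as np
from dask import array as da

# Small stand-in for the array in the question (sizes are illustrative only).
small = da.random.normal(10, 0.1, size=(12, 1000), chunks=(12, 250))

# persist(): chunks are computed and held in memory, but the result stays chunked.
persisted = small.persist()

# compute(): all chunks are gathered into a single NumPy array.
computed = small.compute()

print(type(persisted).__name__, persisted.numblocks)
print(type(computed).__name__, computed.shape)
```

If the object handed to rasterize is still Dask-backed after dropping persist, the gathering is probably happening later in the pipeline rather than at this step.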

I am getting the same result. The documentation says raster should be supported with Xarray + DaskArray, but for whatever reason the process is being serialized at the rasterize() call.
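
While the rasterize() behaviour is unresolved, one possible workaround is to reduce the array chunk by chunk with Dask before anything is plotted, so that only a small result is gathered. This is a sketch under assumptions, not a fix for rasterize itself and not necessarily equivalent to what rasterize computes: the sizes are scaled down, and `factor` is a hypothetical downsampling factor that must divide the chunk length along the time axis.

```python
import numpy as np
from dask import array as da

# Illustrative sizes only; the array in the question is far larger.
n_channels, n_steps = 12, 10_000
data = da.random.normal(
    10, 0.1, size=(n_channels, n_steps), chunks=(n_channels, 1_000)
)

# Hypothetical downsampling factor along the time axis (axis 1).
factor = 100

# Average every `factor` consecutive time steps. The reduction runs per chunk,
# so only the much smaller reduced array has to be gathered afterwards.
reduced = da.coarsen(np.mean, data, {1: factor})

result = reduced.compute()
print(result.shape)
```

The reduced array could then be wrapped in xarray and passed to hv.Image as in the original code, at the cost of a fixed resolution that no longer adapts to zooming.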

For those curious, here are the relevant library versions we are working with:

bokeh 2.3.3
dask 2022.9.1
dask-jobqueue 0.8.0
datashader 0.13.0
holoviews 1.14.3
xarray 0.16.2
numpy 1.19.5
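
For anyone comparing against their own environment, a list like the one above can be produced with the standard library alone. This is a small sketch; `installed_versions` is a hypothetical helper name, and the package names are the ones listed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the list in this thread.
packages = ["bokeh", "dask", "dask-jobqueue", "datashader",
            "holoviews", "xarray", "numpy"]


def installed_versions(names):
    """Map each distribution name to its version string, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(packages).items():
    print(name, ver if ver is not None else "not installed")
```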
