[DataShader+Bokeh] Showing huge images

rynkk · April 17, 2021, 4:23pm

Hi there,

After realizing that it is naive to render an image of ~1,000,000+ data points directly in Bokeh, I stumbled upon Datashader.

I did a bit of digging and tried out some examples, but I’m not entirely sure whether my use case is within the scope of DS. If it is, I could use a nudge in the right direction.

I am creating a Bokeh image like in the example below. However, I want it to be downsampled by Datashader and, when zoomed in, to load the more fine-grained image details (the same when panning while zoomed in).

Would that work? What would be the starting points to achieve this? Can I still use, e.g., a HoverTool to show the actual underlying data rather than the downsampled values, or modify the palette directly with Bokeh?

Thanks in advance

Example

import numpy as np
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource, DataRange1d, LinearColorMapper
from bokeh.palettes import turbo
from bokeh.plotting import figure

p_y = 2000
p_x = 2000
# NumPy arrays are indexed (rows, columns), i.e. (y, x)
img_data = np.random.rand(p_y, p_x)
source = ColumnDataSource(data=dict(image=[img_data]))
# use level for hierarchy
cm = LinearColorMapper(palette=turbo(256), low=0, high=1)
x_range = DataRange1d(start=0, end=p_x, bounds=(0, p_x), min_interval=1, range_padding=0)
y_range = DataRange1d(start=0, end=p_y, bounds=(0, p_y), min_interval=1, range_padding=0)
plot = figure(toolbar_location="above", tools=["wheel_zoom", "pan"],
              x_range=x_range, y_range=y_range, aspect_ratio=1,
              width=500, height=500)

img = plot.image(source=source, image="image", x=0, y=0,
                 dw=p_x, dh=p_y,
                 color_mapper=cm)

# Run with `bokeh serve`; curdoc() is the document being served
curdoc().add_root(plot)
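Conceptually, the zoom behavior asked about here comes down to recomputing, on every pan or zoom, a grid no larger than the plot in pixels from only the visible part of the array. The sketch below illustrates that idea in plain NumPy. `regrid_viewport` is a hypothetical helper, not part of Datashader or Bokeh (Datashader's own raster regridding offers more interpolation and aggregation options), and it assumes, as in the example above, that one array cell spans one data unit starting at (0, 0).

```python
import math
import numpy as np

def regrid_viewport(img, x_range, y_range, width, height):
    """Hypothetical helper: crop img to the visible window and block-average
    it so the result has at most height x width cells."""
    ny, nx = img.shape
    # Convert data coordinates to index bounds (assumes 1 data unit == 1 cell)
    x0 = max(math.floor(x_range[0]), 0)
    x1 = min(math.ceil(x_range[1]), nx)
    y0 = max(math.floor(y_range[0]), 0)
    y1 = min(math.ceil(y_range[1]), ny)
    window = img[y0:y1, x0:x1].astype(float)

    # Integer block sizes; 1 means the window already fits and is returned as-is
    fy = max(1, math.ceil(window.shape[0] / height))
    fx = max(1, math.ceil(window.shape[1] / width))

    # Pad with NaN so both dimensions divide evenly, then average each block
    pad_y = (-window.shape[0]) % fy
    pad_x = (-window.shape[1]) % fx
    padded = np.pad(window, ((0, pad_y), (0, pad_x)), constant_values=np.nan)
    blocks = padded.reshape(padded.shape[0] // fy, fy, padded.shape[1] // fx, fx)
    return np.nanmean(blocks, axis=(1, 3))

full = np.random.rand(2000, 2000)
print(regrid_viewport(full, (0, 2000), (0, 2000), 500, 500).shape)   # (500, 500)
print(regrid_viewport(full, (100, 150), (200, 260), 500, 500).shape)  # (60, 50)
```

Zoomed out, 4,000,000 values shrink to 250,000 block means; zoomed in far enough, the original values come back unchanged. In a Bokeh server app, a function like this could be called from a range-change callback to refresh `source.data`.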

SteveAKopias · April 19, 2021, 12:41am

Hi,

I just started to experiment with Datashader myself, but I’m pretty sure this is exactly the use case it was created for. I’ve just read through pretty much all of the documentation, since I’ve been trying to use it for something completely different (displaying gridded data on a map), and my experience was that most examples are about something similar to what you describe.

I think the main question isn’t whether DS is the right tool, but what exactly you want to display. Usually, when you want to display 1M+ data points, you don’t actually want to display every one of them, as they would either have to be much smaller than a pixel or would merge into an overlapping, blurry blob. So you should most probably display some kind of aggregate instead (at least when zoomed out), such as the count of points at each location, or some kind of heatmap.
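To make "the count of points at each location" concrete: for a fixed viewport and plot size, that aggregate is a 2D histogram with one bin per pixel. The NumPy sketch below uses `np.histogram2d` only to illustrate the kind of grid a count aggregation produces; the helper name and the uniform random data are illustrative assumptions.

```python
import numpy as np

def count_per_pixel(x, y, x_range, y_range, width, height):
    """Hypothetical helper: number of points falling into each screen pixel."""
    counts, _, _ = np.histogram2d(x, y, bins=(width, height), range=(x_range, y_range))
    # histogram2d indexes the result as [x_bin, y_bin]; transpose to image order [row=y, col=x]
    return counts.T

rng = np.random.default_rng(0)
x = rng.random(1_000_000)
y = rng.random(1_000_000)
grid = count_per_pixel(x, y, (0.0, 1.0), (0.0, 1.0), width=500, height=400)
print(grid.shape, int(grid.sum()))  # (400, 500) 1000000
```

A million points collapse into a 400 × 500 grid of counts, which is small enough to send to the browser and colormap there.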

For me, this was one of the most important articles that made me understand how Datashader can solve this issue in multiple ways depending on your needs:
https://datashader.org/user_guide/Plotting_Pitfalls.html

There are other good examples here:
https://datashader.org/getting_started/Pipeline.html
Don’t get scared if it looks way more technical than you expected. You don’t need to understand every possible way to use it, but if you see something you like, you can dig down there and figure out how that specific thing works.

I would, however, recommend using HoloViews for your plotting. It’s a high-level tool that lets you create charts that would be very difficult to build otherwise, and it integrates well with both Bokeh and Datashader, practically holding them together as a team leader. This guide literally starts by showcasing millions of random data points, so after reading a bit of the theory above, it could be a good starting point for the actual work:
http://holoviews.org/user_guide/Large_Data.html

When it comes to the hover tool, it’s a similar question. Do you really want a separate hover tooltip for every one of the millions of points on a single chart? Most probably not, as it would be practically impossible for the user to target any single point. If you aggregate the data, for example into a count, you can have a tooltip that tells you the number of points at each pixel (or something more detailed, such as the sum of some value over all points at that location), as you can see in some of the examples. As far as I understand it, when reducing your points into an image with Datashader you have two main options:
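The "sum of some value over all points at that location" variant can be sketched the same way: a weighted histogram gives a per-pixel sum, and dividing by the per-pixel count gives a mean, which is the kind of number a tooltip over an aggregate would show. This is a NumPy illustration with assumed inputs (`v` is a hypothetical per-point value), not Datashader’s implementation; Datashader provides its own reductions such as count, sum and mean for this.

```python
import numpy as np

def aggregate_values(x, y, v, x_range, y_range, width, height):
    """Hypothetical helper: per-pixel count, sum and mean of the values v."""
    bins = (width, height)
    extent = (x_range, y_range)
    count, _, _ = np.histogram2d(x, y, bins=bins, range=extent)
    total, _, _ = np.histogram2d(x, y, bins=bins, range=extent, weights=v)
    # Empty pixels give 0/0; let them become NaN without emitting warnings
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
    # Transpose from [x_bin, y_bin] to image order [row=y, col=x]
    return count.T, total.T, mean.T

x = np.array([0.1, 0.2, 0.9])
y = np.array([0.1, 0.1, 0.9])
v = np.array([1.0, 3.0, 10.0])
count, total, mean = aggregate_values(x, y, v, (0.0, 1.0), (0.0, 1.0), 2, 2)
print(mean[0, 0])  # 2.0 (two points in the lowest x/y pixel, values 1 and 3)
```

Looking up `mean[row, col]` for the pixel under the cursor is essentially what a hover tooltip on such a grid amounts to; the individual points are no longer part of it.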

1. You reduce it with rasterize(). You get a data grid containing a value for every pixel (such as the number of points or the sum of some values, whatever aggregation you chose), and you let Bokeh display that grid with the help of a colormap. In this case, your tooltip has access to the actual aggregated values.
2. You reduce it with datashade() (rasterize() followed by shade()) and a colormap. You immediately get an RGB image to display, but since it is just an image, you cannot show any information about the underlying data in the tooltip.
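The difference between these two options can be illustrated without any plotting library: an aggregate is a numeric array, whereas shading maps it to RGBA pixels, after which the original numbers can no longer be read back. The sketch assumes plain linear scaling and a palette of hex strings; `shade_linear` is a hypothetical helper, and Datashader’s `shade` uses histogram equalization by default and has more options, so this only illustrates the principle.

```python
import numpy as np

def shade_linear(agg, palette):
    """Hypothetical helper: map a 2D aggregate to an RGBA uint8 image."""
    # Parse "#rrggbb" strings into an (n, 3) array of 0-255 channels
    colors = np.array([[int(c[i:i + 2], 16) for i in (1, 3, 5)] for c in palette], dtype=np.uint8)
    valid = np.isfinite(agg)
    lo, hi = np.nanmin(agg), np.nanmax(agg)
    span = (hi - lo) if hi > lo else 1.0
    # Linearly scale valid values to [0, 1]; missing cells stay 0 and become transparent
    norm = np.zeros(agg.shape)
    norm[valid] = (agg[valid] - lo) / span
    idx = np.rint(norm * (len(palette) - 1)).astype(int)
    rgba = np.zeros(agg.shape + (4,), dtype=np.uint8)
    rgba[..., :3] = colors[idx]
    rgba[..., 3] = np.where(valid, 255, 0)
    return rgba

agg = np.array([[0.0, 4.0], [np.nan, 2.0]])
img = shade_linear(agg, ["#000000", "#808080", "#ffffff"])
print(img.shape, img.dtype)  # (2, 2, 4) uint8
```

The gray pixel only tells you its value was "in the middle"; to show 2.0 in a tooltip you need the aggregate itself, which is why the rasterize route keeps hover information and the shaded route does not.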
I hope I understood this correctly, or somebody else will correct me if not.

So, long story short: depending on your real data, I would first figure out what it is that I actually want to display, and then use HoloViews to drive Datashader and Bokeh, as that could cover all the zooming and tooltip issues too. Also, I’m pretty sure it’s possible to show a Datashader aggregate when zoomed out and individual Points (with detailed tooltips just for that one Point) when zoomed in. I hope so at least, because I will have to do something like that too in the next few days…
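One way to get that switching behavior is to decide per viewport: if few enough points are visible, hand them over individually (so each can carry its own tooltip); otherwise return a count aggregate. The sketch below is a plain NumPy illustration of that decision; the function name and the `max_points` threshold of 5000 are arbitrary assumptions, not a HoloViews or Datashader default.

```python
import numpy as np

def points_or_aggregate(x, y, x_range, y_range, width, height, max_points=5000):
    """Hypothetical helper: raw points when few are visible, else a count grid."""
    inside = ((x >= x_range[0]) & (x <= x_range[1]) &
              (y >= y_range[0]) & (y <= y_range[1]))
    if inside.sum() <= max_points:
        # Zoomed in: few enough points to draw (and hover) individually
        return "points", (x[inside], y[inside])
    # Zoomed out: aggregate the visible points into one count per pixel
    counts, _, _ = np.histogram2d(x[inside], y[inside], bins=(width, height),
                                  range=(x_range, y_range))
    return "aggregate", counts.T

rng = np.random.default_rng(1)
x, y = rng.random(1_000_000), rng.random(1_000_000)
kind_out, _ = points_or_aggregate(x, y, (0.0, 1.0), (0.0, 1.0), 500, 500)
kind_in, (px, py) = points_or_aggregate(x, y, (0.5, 0.51), (0.5, 0.51), 500, 500)
print(kind_out, kind_in)  # aggregate points
```

In an interactive app, the same check would have to run again on every range change, so that zooming in eventually swaps the aggregate for the individual points.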

rynkk · April 20, 2021, 7:50pm

Hey,

thanks for your little introduction to Datashader, I appreciate it. I will definitely check out rasterize(). That should allow precise hovering as well.

I’m going to work my way through the early difficulties and hope it works out nicely. If I make decent progress, I’ll be sure to let you know, since you seem to have the same problem.

Good luck and thanks again