Hi,
I just started to experiment with Datashader myself, but I’m pretty sure this is exactly the use case it was created for. I’ve just read through every piece of documentation because I was trying to use it for something completely different (displaying gridded data on a map), and my experience was that most examples are about something similar to what you describe.
I think the main question isn’t whether Datashader is the right tool, but what exactly you want to display. Usually, when you have 1M+ data points, you don’t actually want to draw every one of them: they would either have to be far smaller than a pixel, or they would overlap into a blurry blob. So most probably you should display some kind of aggregate instead (at least when zoomed out), like the count of the points at each location, or some kind of heatmap.
For me, this was one of the most important articles that made me understand how Datashader can solve this issue in multiple ways depending on your needs:
https://datashader.org/user_guide/Plotting_Pitfalls.html
There are other good examples here:
https://datashader.org/getting_started/Pipeline.html
Don’t get scared if it looks way more technical than you expected. You don’t need to understand every possible way to use it, but if you see something you like, you can dig down there and figure out how that specific thing works.
I would, however, recommend using Holoviews for your plotting. It’s a high-level tool that lets you create charts that would be very difficult otherwise, and it integrates well with both Bokeh and Datashader, practically holding them together like a team leader. This guide literally starts by showcasing millions of random data points, so after reading a bit about the theory above, this could be a good starting point for the actual work:
http://holoviews.org/user_guide/Large_Data.html
When it comes to the hover tool… It’s a similar question. Do you really want a separate hover tooltip for every one of the millions of points on a single chart? Most probably not, as it would be practically impossible for the user to target any single point. If you aggregate the data, for example by count, you can have a tooltip that tells you the number of points at each pixel (or you can put together something more detailed, like the sum of some value over the points at that location), as you can see in some of the examples. As far as I understand it, when reducing your points into an image with Datashader you have two main options:
- You reduce it with rasterize() (that is the name of the Holoviews operation). You get a data grid containing some value for every pixel (like the number of points or the sum of some values, whatever you asked for) and you let Bokeh display that data grid with the help of a colormap. In this case, your tooltip has access to the actual data.
- You reduce it with shade() and a colormap. You immediately get an RGB image to display, but as it is just an image, you cannot show any information about the underlying data in the tooltip.
I hope I understood this correctly, or somebody else will correct me if not.
So long story short: depending on your real data, I would figure out what it is that I actually want to display, and I would use Holoviews to drive Datashader and Bokeh to do that, as that could cover all the zooming and tooltip issues too. Also, I’m pretty sure it’s possible to show a Datashader aggregate when zoomed out and to show individual points (with a detailed tooltip for each one) when zoomed in. I hope so at least, because I will have to do something like that too in the next few days…