Hover tool with categorical aggregates

Theom · April 15, 2021, 11:28am

Hello,

I have successfully used holoviews.util.Dynamic to retain the ability to have a dynamic hover tool on coarsely aggregated data with datashader, as explained on the Interactivity page. That’s really great.

Building on this I am currently trying to display a tooltip that would show information based on categorical data, but I don’t know if it’s currently possible.

More precisely I’d like to show the category or categories of the points located in the hovered bin. Below is a snippet where I attempt to do so using the example provided in the datashader docs. Aggregates can only work on numerical data so I have an additional column which contains category numbers as floats. I use ds.by to specify that I want the aggregate to work separately based on df['cat'] values. Then the reduction here is ds.min but I could as well use ds.max since the value is the same (ds.first would be faster I guess but it yields an error saying it is only implemented for rasters).

import pandas as pd
import numpy as np
import datashader as ds
import holoviews as hv
import holoviews.operation.datashader as hd
from holoviews.streams import RangeXY
from collections import OrderedDict as odict

hd.shade.cmap=["lightblue", "darkblue"]
hv.extension("bokeh", "matplotlib") 

num=100000
np.random.seed(1)

dists = {cat: pd.DataFrame(odict([('x',np.random.normal(x,s,num)), 
                                  ('y',np.random.normal(y,s,num)), 
                                  ('val',val), 
                                  ('cat',cat),
                                  ('cat_number', cat_number)]))      
         for x,  y,  s,  val, cat, cat_number in 
         [(  2,  2, 0.03, 10, "d1", 1), 
          (  2, -2, 0.10, 20, "d2", 2), 
          ( -2, -2, 0.50, 30, "d3", 3), 
          ( -2,  2, 1.00, 40, "d4", 4), 
          (  0,  0, 3.00, 50, "d5", 5)] }

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")

points = hv.Points(df.sample(10000))
pts = hd.datashade(points, width=400, height=400)
dynamic = hv.util.Dynamic(hd.aggregate(points, aggregator=ds.by('cat', ds.min('cat_number')),
                                       width=24, height=24, streams=[RangeXY]), operation=hv.QuadMesh) \
          .opts(tools=['hover'], alpha=0, hover_alpha=0.2)

(pts * dynamic).relabel("Dynamic square hover")

The error I get is the following: DataError: None of the available storage backends were able to support the supplied data format. ds.min by itself works; it’s the categorical part that doesn’t.

One solution would be to overlay as many aggregates as categories (like advised in this issue), each with its own hover tool, but in my experience the performance takes a hit when I superimpose several datashades. Plus the code would be more convoluted in my case.

I don’t know if this use case has been anticipated, or maybe I’m doing something wrong. Thanks in advance for the help.

As a side note, I have a custom tooltip already showing the local average of some value. I’d like to add the category field to this tooltip, so I intend to use ds.summary to create several reductions, but I can’t get it to work, even with a reduction that works by itself (e.g. ds.summary(min=ds.min('cat_number'))).

Software info:

datashader 0.11.1 or 0.12.1
holoviews 1.13.4 or 1.14.3
bokeh 2.2.3 or 2.3.0
context: within notebook or bokeh server (panel 0.10.3 or 0.11.2)

jbednar · April 16, 2021, 3:20am

Some notes:

That’s correct; ds.first is definitely only implemented for rasters.
For categorical values, what I would love to see in a hover is a histogram plot by category. Bokeh will soon have the ability to do this, as you can see in https://github.com/bokeh/bokeh/pull/11165 . But that’s not yet been merged and it may be some time before it appears in a release.
Here, you probably don’t need to use QuadMesh anymore; Bokeh now supports hover for an overlaid transparent hv.Image instead, which will give you pixel-by-pixel information.
I’m not quite sure what you mean for the aggregator to return; did you want the smallest numerical category value present in that pixel? That doesn’t sound very useful; it seems like you’d want the most common category in that pixel, if you can’t have the full histogram. In any case I’m not quite sure how you’d achieve either one, though I do think it should be possible.
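As an illustration of the "most common category per pixel" idea, here is a minimal NumPy sketch. It assumes a stack of per-category counts with shape (ny, nx, ncat) is already available; the function name and the sample numbers are hypothetical, and nothing here is part of the Datashader or HoloViews API.

```python
import numpy as np

def most_common_category(counts, categories):
    """Return an (ny, nx) object array with the label of the largest count
    in each pixel, or None where the pixel holds no points at all."""
    counts = np.asarray(counts)
    # argmax along the category axis picks the first category on ties
    winners = np.asarray(categories, dtype=object)[counts.argmax(axis=-1)]
    winners[counts.sum(axis=-1) == 0] = None
    return winners

# Made-up counts for a 2x2 grid and three categories
counts = np.array([[[5, 1, 0], [0, 0, 0]],
                   [[2, 2, 7], [0, 3, 1]]])
labels = most_common_category(counts, ["d1", "d2", "d3"])
print(labels)
```

Ties are resolved in favour of the first category in the list, which may or may not be what a given plot needs.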

Sorry I couldn’t be of more help!

Theom · April 16, 2021, 8:46am

Thanks for your response.

I was hoping to see the list of categories located in the hovered bin, as the datashader docs for the ds.by aggregator say “Resulting aggregate has an outer dimension axis along the categories present.” Then I would have combined it with a ds.min as every point within a category has the same cat_number value, and ds.first, ds.last and also ds.mode are only available for rasters.

Ok I did not know about the change regarding Bokeh. It works well indeed with the QuadMesh removed. Maybe it would be worth updating the Interactivity page to account for this new solution!

As for the fact that ds.summary is not supported in hd.aggregate, is that expected, or could it be supported? When trying the following (for instance with only one reduction), I get AttributeError: 'summary' object has no attribute 'column'.

```python
hover = hv.util.Dynamic(hd.aggregate(points,
                                     aggregator=ds.summary(mean=ds.mean('var')),
                                     streams=[hv.streams.RangeXY])).opts(tools=['hover'], alpha=0)
```

jbednar · April 18, 2021, 5:16pm

Yes, please do update the Interactivity page to mention that the QuadMesh approach is no longer required.

I still don’t quite get what ds.min would do for you here, because very often in a dense region of the plot every category will be represented in every pixel, and thus the min will be the same for every such pixel. E.g. for the Census datashader example, you’d get the same min value for a pixel with 1000 Whites, 500 Blacks, and 10 Hispanics as for one with 1 White, 2 Blacks, and 10000 Hispanics, right? I may just be misunderstanding what you’re after, though.
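The Census comparison can be worked through in a few lines of plain Python. The numeric codes assigned to the categories are hypothetical; the pixel counts are the ones from the paragraph above.

```python
# Hypothetical numeric codes for the categories in the Census example
codes = {"White": 1, "Black": 2, "Hispanic": 3}

# Per-category counts for the two pixels described above
pixel_a = {"White": 1000, "Black": 500, "Hispanic": 10}
pixel_b = {"White": 1, "Black": 2, "Hispanic": 10000}

def min_code(pixel):
    """Smallest category code among the categories present in the pixel."""
    return min(codes[c] for c, n in pixel.items() if n > 0)

def dominant(pixel):
    """Category with the largest count in the pixel."""
    return max(pixel, key=pixel.get)

print(min_code(pixel_a), min_code(pixel_b))  # identical for both pixels
print(dominant(pixel_a), dominant(pixel_b))  # differs between the pixels
```

The minimum code is 1 in both pixels, while the dominant category is White in the first and Hispanic in the second, which is the information the minimum throws away.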

I’m pretty sure no one has thought much about summary aggregates for holoviews at all, so it is very likely we would need to do some work to support it. We also haven’t done anything about supporting hover for categorical values, because Bokeh doesn’t currently support Datashader-style categorical colormapping, and so we have to use hv.RGB output that wouldn’t have any meaningful hoverability. So there’s definitely important work to do here, but I don’t think any of the primary HoloViews maintainers are currently planning to look into that.

Theom · April 19, 2021, 8:59am

Ok sure, I can submit a PR for the interactivity notebook regarding hover tools and QuadMesh. Note however that when not using QuadMesh, it looks like I lose the opportunity to get a working custom tooltip (I’m getting ??? instead of the value, which is displayed correctly with QuadMesh).

As I said, by using ds.by coupled with the ds.min (or ds.max) reduction I was hoping to get some kind of list of all categories represented in the bin, since the docs say the ds.by aggregator has a dimension along the categories. In my use case, the data is distributed so that only a very small number of categories would be present in any area of the plot, typically fewer than three out of a dozen categories. But unfortunately the ds.by aggregator does not seem to work with holoviews.operation.datashader.aggregate in any case.

Ok, thank you for the insight on ds.summary. As a consequence it seems it is not possible to have a tooltip containing information on several aggregates (e.g. mean and std). I’ve tried overlaying two transparent aggregates, each with its own hover tool, but I only get the tooltip for the last element overlaid.

I really hope these features will be available in the future although I understand this may entail significant work and may not necessarily be high-priority. Datashader is really helpful in my case and it’s nice to see increasing integration with interactive tools.

jbednar · April 19, 2021, 2:43pm

To break this down, what if you specified aggregator=ds.min('cat_number')? If you ran that, it should calculate the minimum cat_number value encountered in each pixel, i.e. 1 if that pixel contains something from category “d1”, else 2 if it has something from “d2”, etc. That’s the minimum cat_number in the pixel as requested, but it doesn’t seem useful, right?

But it’s even less useful if you do the same thing but tell it to aggregate those min values categorically, with your code aggregator=ds.by('cat', ds.min('cat_number')) . You’re then finding the minimum cat_number for all values from category “d1” (which will always be 1 if it’s not NaN due to having no value of that type in that pixel) and the same separately for each category. Thus for each pixel, you’ll have a list with one value per category, with each value being either the cat_number of that category, or NaN if nothing from that category was in that pixel. It’s hard to see how that calculation could be useful.
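To make the described result concrete, the following NumPy sketch emulates a per-category minimum on points that have already been assigned to bins. It does not call Datashader; the helper name, the bin indices and the tiny data set are assumptions made for illustration.

```python
import numpy as np

def by_min(x_bin, y_bin, cat_code, value, shape, ncat):
    """Emulate a per-category minimum: out[y, x, c] is the smallest `value`
    among points of category c in bin (y, x), or NaN if there are none."""
    out = np.full(shape + (ncat,), np.inf)
    np.minimum.at(out, (y_bin, x_bin, cat_code), value)
    out[np.isinf(out)] = np.nan
    return out

# Five points on a 2x2 grid; categories d1..d3 carry cat_number 1..3
x_bin = np.array([0, 0, 1, 1, 1])
y_bin = np.array([0, 0, 0, 1, 1])
cat_code = np.array([0, 2, 1, 1, 1])
cat_number = cat_code + 1.0

stack = by_min(x_bin, y_bin, cat_code, cat_number, (2, 2), 3)

# The non-NaN entries of one pixel name the categories present in it
categories = np.array(["d1", "d2", "d3"])
present = categories[~np.isnan(stack[0, 0])]
print(stack[0, 0])
print(present.tolist())
```

Each pixel ends up with one slot per category that is either that category's own number or NaN, so the values carry no more information than a presence mask.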

Here I would have assumed you wanted just the counts per category, per pixel, which is just aggregator=ds.by('cat'), and then you wanted to reveal those counts (as a histogram or as a list) when you hover over the pixel. That last bit is what I don’t know how to do in Bokeh, but presumably it can be done with a custom HoverTool that inspects the aggregate and reports it as a list.

Theom · April 19, 2021, 4:11pm

It’s hard to see how that calculation could be useful.

Well in my use case, it would be! I do want to know which categories I have in my pixel or bin, and maybe display other information based on these categories.

Some details about what I’m doing might help here. I use datashader to plot a large number of satellite data points covering the whole planet. I understand this may not be a typical use of datashader but it’s one where it really helps. Its ability to dynamically update the image when exploring different geographical scales while plotting every single point is very valuable. The thing is there are areas where data points from different satellite orbits overlap, therefore I want to know for any area on the map which orbits cover it, so that the user can if needed dynamically filter out unwanted orbits. Here my categories are the orbit numbers.

I have tried setting aggregator to ds.by('cat'), but it does not work (it says the reduction is missing), and with ds.by('cat', ds.count()) or ds.count_cat('cat') you get the error reported in my first post. Therefore it seems there might be something strange happening in holoviews.operation.datashader.aggregate with respect to categorical aggregators.

In any case, if there’s a way to investigate this I would be happy to help.

jbednar · April 20, 2021, 8:20pm

Well in my use case, it would be! I do want to know which categories I have in my pixel or bin, and maybe display other information based on these categories.

Sure. But ds.min isn’t going to give you that. I think it’s clear from your responses that you want to know counts per pixel (some of which may be zero), not a minimum category number or some other value that min could provide.

Ah, I would have expected count to be the default in that case; we should make it so!

Here ds.by('cat', ds.count()) is what I’d suggest, and yes, at this point something has to be done with the fact that it’s a stack of aggregates, one per category. So the task to do here is to take this stack of aggregates, pull out a list with the counts by category, and display that. That’s the missing code here, which doesn’t seem difficult in principle but seems difficult enough to figure out that I’m not adding it to my plate at the moment; too many other undone things on it! Good chance for someone else to jump in and figure out how to hover that, particularly now that Bokeh will be able to display it reasonably (though for now a list of category:count pairs is surely already useful).
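The "pull out a list with the counts by category" step could look roughly like the sketch below. It assumes the stack of counts is available as an array of shape (ny, nx, ncat) and that the hovered pixel indices are known; both function names are hypothetical, and wiring the resulting text into a Bokeh HoverTool is deliberately left out because the thread does not establish how to do that.

```python
import numpy as np

def category_counts(stack, categories, iy, ix, skip_empty=True):
    """List of (category, count) pairs for the pixel at row iy, column ix."""
    counts = np.asarray(stack)[iy, ix]
    return [(cat, int(n)) for cat, n in zip(categories, counts)
            if n > 0 or not skip_empty]

def tooltip_text(pairs):
    """Render category:count pairs as one line of tooltip text."""
    return ", ".join(f"{cat}: {n}" for cat, n in pairs) or "no data"

# Made-up counts for a 2x2 grid and three categories
stack = np.array([[[5, 1, 0], [0, 0, 0]],
                  [[2, 2, 7], [0, 3, 1]]])
cats = ["d1", "d2", "d3"]

print(tooltip_text(category_counts(stack, cats, 1, 0)))
print(tooltip_text(category_counts(stack, cats, 0, 1)))
```

Dropping zero counts keeps the text short when only a few of many categories occur in a pixel, which matches the satellite-orbit use case described earlier in the thread.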

jbednar · April 24, 2021, 3:46am

I made a PR to make count the default reduction for ds.by, as suggested above: “Made ds.by use a default reduction of ds.count()”, Pull Request #1004 in holoviz/datashader on GitHub.

Theom · April 26, 2021, 9:02am

Alright, that’s good!

It’s great to know that Bokeh will be able to provide more sophisticated tooltips in the future.

As for my use case, counts for me are secondary although having counts per category would do the job as well. Having what you described as being the result of ds.by('cat', ds.min('cat_number')) (or ds.by('cat', ds.max('cat_number')) or ds.by('cat', ds.first('cat_number')) for that matter) displayed is exactly what I want:

“Thus for each pixel, you’ll have a list with one value per category, with each value being either the cat_number of that category, or NaN if nothing from that category was in that pixel.”

For now I can use two separate tooltips based on two overlaid hd.aggregate elements, one with ds.min and one with ds.max, so that I know which are the first and last orbits represented in the pixel/bin.
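What the min/max pair reports per bin can be sketched with a pandas groupby on points that already carry bin indices. The column names and orbit numbers are made up, and this is a stand-in for the two hd.aggregate calls rather than a replacement for them.

```python
import pandas as pd

# Hypothetical points already assigned to bins; `orbit` plays the role of cat_number
pts = pd.DataFrame({
    "x_bin": [0, 0, 0, 1, 1],
    "y_bin": [0, 0, 0, 0, 1],
    "orbit": [12, 15, 13, 7, 9],
})

# First and last orbit number per bin, as ds.min and ds.max would report
span = pts.groupby(["y_bin", "x_bin"])["orbit"].agg(["min", "max"])
print(span)
```

Bin (0, 0) reports orbits 12 and 15 here, while orbit 13 in the same bin stays invisible, so this workaround identifies the extremes but not every orbit in between.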

wietze · April 29, 2021, 9:51pm

Related question: how can I hover over integer values in xarray_da.hvplot.image(rasterize=True) and show labels that represent the integers?