Select Tool on Scatter Plot

cdtennant · September 30, 2022, 8:05pm

I realize this is a noob question, but I’m trying to find a simple example of generating a scatter plot with the selection tool capability (so that it can be used to filter the associated dataframe). I feel like I’ve seen multiple approaches to this, but they are always more complicated examples (e.g. I took the KDD tutorial on HoloViz, and the select tool was in the context of linked plots). Any help is greatly appreciated!

Marc · September 30, 2022, 8:33pm

Hi @cdtennant

Welcome to the community.

Could you provide a minimal reproducible example of what you would like to do? It helps the community understand your specific question and saves time when answering it. Thanks

cdtennant · September 30, 2022, 10:56pm

That’s partly the problem. I’ve only been exposed to the select tool in the context of interlinked plots. I’m trying to understand how I can use it for a standalone plot. Here is what I know works:

import holoviews as hv
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)
from bokeh.sampledata.autompg import autompg_clean as df

mpg    = df.hvplot.scatter('mpg',    'hp', width=300, height=350)
weight = df.hvplot.scatter('weight', 'hp', width=300, height=350)
ls = hv.link_selections.instance()
ls(mpg + weight)

But how to get the functionality of the select tool for a single (non-linked) scatter plot?

Jan · October 2, 2022, 7:40pm

I am not sure how to directly read out the selected points of a scatter plot. Would be nice to know how to do that, if at all possible. The way I did it here is to connect a selection stream to the plot and then read out the index parameter of that stream.

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')

df = pd.DataFrame(np.random.randn(50, 2))
scatter = hv.Scatter(df.values).opts(tools=['lasso_select'])
sel = hv.streams.Selection1D(source=scatter)
scatter

Then select values:

df.iloc[sel.index]

But you have to be careful, because sel.index will give an empty list if you don’t use the lasso selection tool at all. In order to avoid that, you would have to give sel initial values, like this:

sel = hv.streams.Selection1D(source=scatter,
                             index=list(range(len(df))))

Jan · October 2, 2022, 7:45pm

Ironically, using a link_selections object on a single plot isn’t much longer in terms of code:

df = pd.DataFrame(np.random.randn(50,2))

scatter = hv.Scatter(df.values)
ls = hv.link_selections.instance()
ls(scatter)

Then select values:

ls.filter(df)

There are two advantages with this:

ls doesn’t start out with an empty list
You don’t need to explicitly add the lasso_select and box_select tools.

cdtennant · October 3, 2022, 11:20am

This is exactly what I was looking for! Very much appreciated Jan.

Jan · October 4, 2022, 8:35pm

To be honest, I find the second solution a little hacky. So I technically would prefer the first solution.