Access/Extract the selected data points using 'selection expression' and '.filter()'

Shahrokhkh · September 7, 2023, 2:00am

In my quest to create a rasterized plot of around 10M GPS data points with ‘link selection’ and selection features (a date range slider plus the Box/Lasso selection tools), I followed the tutorials and examples presented at SciPy 2022 (hvPlot & HoloViz - James Bednar, Jean-Luc Stevens, Philipp Rudiger, Maxime Liquet | SciPy 2022), and I think I have it working (see “Hvplot, link selection, interactive() and Datashader: Is it possible?” in the ‘Hvplot’ section of this Discourse).

To go further with the analytical part of my project I need to access/extract the selected data points, but it seems this is not an easy task. I have considered the solution presented in “Using box_select to Return a Filtered Dataframe” (reply #2, by Marc), but I want to know how to do this using the ‘selection expression’ and the “.filter()” method.

I need to know how to (i) programmatically find the number of points selected, (ii) find the number of unique values (‘userIds’, for example) in the selected area, and (iii) calculate the sum of the values for some features of the selected points.
Any suggestions or pointers are welcome.

johann · September 18, 2023, 10:19am

I'm not sure where exactly you are getting stuck.

There is some useful documentation in Holoviews:
https://holoviews.org/user_guide/Large_Data.html

With a dataset that large (10M points) you will likely need Datashader or something similar to get any meaningful graphical output.

Anyway, I have only played with it once, to integrate it into my app and support filtering down a Tabulator based on selections on several plots. Below is a slightly streamlined version of an ad hoc custom graphical filter I use in my app. It requires holoviews (plus panel and param, as imported below).
For lasso-select to work, you’ll also need to install shapely and datashader.

import pandas as pd
import numpy as np

import param

import panel as pn
import holoviews as hv
from holoviews.operation.datashader import rasterize

pn.extension()

class FilterUI(param.Parameterized):

    # user input parameter
    df = param.DataFrame(doc='Complete Dataset')
    
    # output stuff
    df_selected = param.DataFrame(doc='Selected Dataset')
    
    ui = param.Parameter(default=None,doc='UI Pane')
    ui_output = param.Parameter(default=None, doc='UI Output Pane')
    
    link_selections_instance = param.Parameter(default=None, doc='Link Selection Instance')
    
    def view(self):
        
        # we always recreate the UI from scratch
        if self.ui is None:
            self.ui = pn.Column()
        else: 
            self.ui.clear()

        # reset what's selected
        self.df_selected = self.df
            
        # create our plot
        plot_opts = {'tools': ['hover', 'box_select','lasso_select'], 
                     'active_tools': ['box_select'], 
                     'width': 600, 'height': 600}
        
        plot = hv.Points(self.df, ['x', 'y']).opts(**plot_opts)
        
        if len(self.df) > 1000: 
            plot = rasterize(plot).opts(colorbar=True, **plot_opts)

        # create our output placeholder container. 
        self.ui_output = pn.Column('Nothing selected yet')

        # create a selection instance if not already done, and avoid having
        # the @param.depends callback trigger at this stage. There are other
        # ways to do this, like using the low-level API
        # (self.link_selections_instance.param.watch() ...)
        if self.link_selections_instance is None:
            with param.parameterized.discard_events(self):
                self.link_selections_instance = hv.link_selections.instance()
                
        # create a few widgets to control some of its settings on the link_selections_instance
        self.link_selections_instance = hv.link_selections.instance()
        
        ls_widgets = pn.Param(
            self.link_selections_instance, parameters=[
               # 'cross_filter_mode',   .... only useful if multiple plots
                'selection_mode'], name='Linking Options')
        
        # add the plot to a layout 
        layout = hv.Layout([plot])
      
        # Add widgets and the Plot-Layout to the UI Pane
        self.ui.extend([
            pn.Row(ls_widgets, self.link_selections_instance(layout)), 
            self.ui_output]) 
        
        return self.ui
    
    @param.depends('link_selections_instance.selection_expr', watch=True)
    def cb_link_selections(self, event=None):
        print('cb_link_selections()')
        
        # filter the DF with the selection criteria
        filter_expr = self.link_selections_instance.selection_expr
        self.df_selected = self.link_selections_instance.filter(self.df, filter_expr)

        # prepare and push some output stats/info  
        self.ui_output.clear()
        self.ui_output.extend(
            [pn.pane.DataFrame(self.df_selected, max_rows=10), 
             pn.pane.Str(f'Shape: {self.df_selected.shape}'), 
             pn.pane.Str(f'Selection Expr: {filter_expr}')
            ])

        #self.ui_html_output.object = self.df_selected.to_html(max_rows=10)

# now run a test

entries = 100000
df = pd.DataFrame({'x': np.random.normal(loc=0.5, scale=0.2, size=entries),
                   'y': np.random.normal(loc=0.2, scale=0.4, size=entries),
                   'users': np.random.choice(list('abcdefg'), entries)})

test = FilterUI(df=df)
test.view().show()
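
Once the filtered frame is available (df_selected above), the three numbers asked for can probably be computed with plain pandas. The following is only a minimal sketch: summarize_selection is a hypothetical helper, not part of holoviews or panel, and the box selection is emulated with a boolean mask so that the snippet runs without any plotting library. The column names ('users', 'x', 'y') are taken from the test frame above and would need to be replaced with your own.

```python
import numpy as np
import pandas as pd


def summarize_selection(df_selected, id_col, sum_cols):
    """Hypothetical helper: count, unique ids and column sums of a selection."""
    return {
        'n_points': len(df_selected),                   # (i) number of selected points
        'n_unique': df_selected[id_col].nunique(),      # (ii) unique values in id_col
        'sums': df_selected[sum_cols].sum().to_dict(),  # (iii) per-column sums
    }


# Stand-in data with the same columns as the test frame above
rng = np.random.default_rng(0)
entries = 1000
df = pd.DataFrame({'x': rng.normal(loc=0.5, scale=0.2, size=entries),
                   'y': rng.normal(loc=0.2, scale=0.4, size=entries),
                   'users': rng.choice(list('abcdefg'), entries)})

# Emulate a box selection with a boolean mask (assumption: in the app this
# frame would instead come from link_selections_instance.filter(...))
box = df['x'].between(0.4, 0.6) & df['y'].between(0.0, 0.4)
stats = summarize_selection(df[box], id_col='users', sum_cols=['x', 'y'])
print(stats)
```

Inside the class above, the same helper could be called on self.df_selected at the end of cb_link_selections() and its result pushed into ui_output.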

regards
Johann