Analyzing time series data for quick feedback

SVP1194 · October 7, 2022, 3:56pm

Hello Everyone!

I am working with some time series data (around 5k rows) from my simulations. I am trying to build a quick explorer tool with dropdowns for selecting the data, the x and y columns, the plot type, etc. I would also like to generate a grid plot where I can select the rows and columns of the grid and update or append plots accordingly. Are there good references for these requirements? Maybe there is a recipe of different examples that I can combine to build something from scratch. I came across this good example, which fits most of what I want to do: https://panel.holoviz.org/gallery/simple/hvplot_explorer. Is this a reliable tool for fast feedback, or does it need more improvement in terms of performance? Also, could anyone please share a link to the source code so that I can add more functionality?

Marc · October 7, 2022, 4:59pm

Hi @SVP1194

hvPlotExplorer is really easy to try out. See Explorer. So just start out with that. If it solves your problem, then fine.

If not, then either suggest improvements on the hvPlot GitHub or build something custom yourself.

If you go with custom then feel free to post your questions with minimal code examples here. Please also include data. That will make it easy for the community to try to help you.

SVP1194 · October 13, 2022, 1:57pm

Hi @Marc

I would definitely prefer to build something custom, as I am planning to add more functionality and aiming to motivate other teams in my company to get into this ecosystem. I am a bit stuck for now with the code below. My data frames are stored in a dictionary (trace_dict) where each key is the run id of a simulation and the corresponding value is that run's data frame.

I would like to have a dropdown for each of the X and Y axes, populated with the columns of the data frame, and be able to interact with the plot based on my selection.
I would like to have a dropdown for my data. I have created an object ‘data_selector’ for that.
It would also be nice if you could indicate how I can select between plot types like scatter, line, bar, etc. and update the plot accordingly. I have tried some things here by wrapping everything inside a class with param.

import holoviews as hv
import numpy as np


import param
from holoviews import opts, streams
from holoviews.operation.datashader import regrid
from holoviews.plotting.links import DataLink

import panel as pn

hv.extension('bokeh')

# Time series data frames are stored in a dictionary where each key is the
# corresponding run id of the simulation.
trace_dict = {}
for run_id in workspace.id:
    # Fetch the data from the server and store it as a pandas DataFrame.
    trace_data = ftx_f.get_data_run_trace(run_id, save_df=True)
    trace_dict[run_id] = trace_data  # {'id of the run': dataframe}

datasets = trace_dict

class Visualizer(param.Parameterized):
                

    # dropdown for data runs 
    dataset_selector = param.ObjectSelector(default=list(datasets.keys())[0],
                                   objects=list(datasets.keys()))
    #dropdown for plot type
    plot_type = param.ObjectSelector("bar", objects=["bar", "scatter"], label="Plot Type")
    
    #list of columns in dataframe
    columns = [x for i,x in enumerate(trace_data.columns) if x!='Unnamed: 0']
    
    widget_x = pn.widgets.Select(name='x', options= columns)
    widget_y = pn.widgets.Select(name='y', options= columns)
    
    
    
    @param.depends('dataset_selector')
    def _inputs(self):
        name = self.dataset_selector
        datum = datasets[name]
        
        self.columns = x_field
        self.columns = y_field
        self.widget_x = widget_X
        self.widget_y = widget_Y
           
    @param.depends('dataset_selector')        
    def plot_trace_signals(self):
        curve_ds_analyse = hv.operation.datashader.datashade(
        hv.Curve(
            data = self.datum,
            kdims= self.x_field,
            vdims = self.y_field
        ),
        aggregator='any',
        ).opts(
        frame_height=300,
        frame_width=625,
        show_grid=True,
        )
        return curve_ds_analyse
    
#     @param.depends(widget_X, widget_Y)
#     def get_plot(self):
        
#         return plot_trace_signals(self)
    

visualizer = Visualizer()
pn.Row(pn.Column(visualizer.param), visualizer._inputs,plot_trace_signals)

SVP1194 · October 17, 2022, 9:28am

An update here: I managed to add some features like dropdowns and a multiselect, but it seems I get no plot out of it. I unwrapped the code from the param class to test things, and the plot works well based on my selections, but I would like to keep it within the param class so that I learn more about it. I am stuck on two things for now:

Create the plot according to my interactions with the widgets.
Include data_selector to update the plot. When I choose a different dataset, the plot does not update.

Any inputs are welcome

import holoviews as hv
import numpy as np


import param
from holoviews import opts, streams
from holoviews.operation.datashader import regrid
from holoviews.plotting.links import DataLink

import panel as pn

hv.extension('bokeh')

# Time series data frames are stored in a dictionary where each key is the
# corresponding run id of the simulation.
trace_dict = {}
for run_id in workspace.id:
    # Fetch the data from the server and store it as a pandas DataFrame.
    trace_data = ftx_f.get_data_run_trace(run_id, save_df=True)
    trace_dict[run_id] = trace_data  # {'id of the run': dataframe}

datasets = trace_dict

class Visualizer(param.Parameterized):
    
    
    datasets = trace_dict

    dataset_selector = pn.widgets.Select(name="Select Data",  options= list(datasets.keys()))
    
    cols_X = signals_list_trace_data
    cols_Y = signals_list_trace_data


    widget_x = pn.widgets.MultiSelect(name='x', options= cols_X)
    widget_y = pn.widgets.Select(name='y', options= cols_Y)
    

    plot_type = pn.widgets.Select(name="Plot Type", value="line", options=["line", "bar"])
    
    def __init__(self, **params):
        
        super(Visualizer, self).__init__(**params)
        self.df = trace_data
        self.x_field = cols_X
        self.y_field = cols_Y
        self.plot_type = plot_type
        
        
        
    @pn.depends('plot_type') 
    def get_plot_analyze(self):

        if self.plot_type == "line":

            curve_ds_analyze = self.df.hvplot.line(x=self.x_field, y=self.y_field)

            return curve_ds_analyze

        elif self.plot_type == "bar":

            bar_ds_analyse = self.df.hvplot.bar(y=self.y_field) 

            return  bar_ds_analyse


    #@pn.depends('get_plot_analyze')
    def get_plot(self):

        return self.get_plot_analyze()
    
    def view(self):
        # This method should construct the entire view for the tab
        # that this class is responsible for. 
        return pn.Row(pn.Column(self.widget_x,self.widget_y, self.plot_type), self.get_plot)


visualizer = Visualizer().view()

visualizer

carl · October 17, 2022, 10:22am

Hi @SVP1194,

You might be able to take some inspiration from here. It lets you load multiple data sets and switch between them, and I’m sure from there you’ll be able to return your desired graphs with some further editing:

https://discourse.holoviz.org/t/a-nice-simple-dashboard-to-view-data-files/3315

Hope it helps, Carl.

SVP1194 · October 18, 2022, 11:29am

Thank you @carl. I tried to run your code as it is. I uploaded a CSV, but it does not seem to work for some reason: I do not see any changes in the ‘x’ and ‘y’ data. Please let me know if I am missing anything.

carl · October 18, 2022, 1:15pm

Try removing this line. It is probably failing there, as it is looking for something specific to my CSV at that point:

dataset.value['Date'] = pd.to_datetime(dataset.value['Date'], format='%d/%m/%Y %H:%M') #used instead of pandas read_csv parse_dates ***pay attention to file datetime format
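
Instead of deleting the line, the conversion could be guarded so that it only runs when the column exists. This is a minimal sketch, assuming the column name `Date` and the format string from the line above; `parse_date_column` is a hypothetical helper, not part of the linked dashboard.

```python
import pandas as pd


def parse_date_column(df, column="Date", fmt="%d/%m/%Y %H:%M"):
    """Return a copy with `column` parsed as datetimes, or `df` itself if the column is absent."""
    if column not in df.columns:
        return df
    out = df.copy()
    # errors="coerce" turns values that do not match `fmt` into NaT instead of raising.
    out[column] = pd.to_datetime(out[column], format=fmt, errors="coerce")
    return out


raw = pd.DataFrame({"Date": ["07/10/2022 15:56"], "value": [1.0]})
parsed = parse_date_column(raw)

no_date = pd.DataFrame({"time": [0.0], "value": [1.0]})
untouched = parse_date_column(no_date)  # no 'Date' column, so nothing is changed
```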

SVP1194 · October 18, 2022, 3:00pm

Thanks @carl, it seems to work. I am trying to apply the same to my use case. The ‘x’ and ‘y’ do not populate, though. I suspect the reason is that my data frames are stored in a dictionary. So I should be able to access and declare them with the right syntax in this line?

dataset = Dataset(value = self.my_dict[new_dataset_input.value], name= 'Dataset')