Overwhelmed by Holoviews / hvplot / panel workflow permutations & concepts

mg3146 · April 21, 2024, 12:24am

Hi,

I’m trying to use Holoviews / hvplot / panel for a one-off project I have to build for work. I’m finding myself overwhelmed by the number of permutations between hvplot, Holoviews and panel in what I thought would be a fairly straightforward use case. I was hoping I could post a workable example here in the hopes of someone getting me started in the right direction…

Here’s what I’m trying to do…

My (very, very small) user base primarily uses the IPython console in Spyder or VSCode when working with my Python package. We want to avoid Jupyter notebooks. Ideally, calling the plotting function would open some kind of new popup window [like matplotlib does], or otherwise a new Microsoft Edge tab [company-specific thing].
I’m looking to display time series data that can all be represented by pandas dataframes like the one I’ve created below. The x-axis is the same for all series.
I’d like the user to be able to select what time-series to view dynamically. Specifically, if you look at my multiindex columns:

* Level 1: User selects ONE of these
* Level 2: User selects MANY (1+) of these, for the selected Level 1
* Fields: User selects MANY (1+) of these.

Lastly, I was hoping to create a toggle to first-difference the dataframe [i.e., df.diff(1)]. I could also pre-populate this data in the dataframe [as another index layer] if that simplified things. I was looking at the Operators documentation for this, but I couldn’t make heads or tails of it…
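The selection rules and the first-difference toggle described above can be sketched in plain pandas, independent of any plotting library. This is only an illustration: `select_series` is a hypothetical helper name, and the small demo frame is made up so the result is easy to check by eye.

```python
import numpy as np
import pandas as pd


def select_series(df, level_1, level_2, fields, diff=False):
    """Pick ONE Level 1 value, MANY Level 2 values and MANY fields.

    Hypothetical helper: assumes the column levels are named
    'Level 1', 'Level 2' and 'Fields', as in the example frame.
    """
    cols = df.columns
    mask = (
        (cols.get_level_values("Level 1") == level_1)
        & cols.get_level_values("Level 2").isin(level_2)
        & cols.get_level_values("Fields").isin(fields)
    )
    selected = df.loc[:, mask]
    # The "toggle": first-difference on demand instead of storing it
    return selected.diff(1) if diff else selected


# Small made-up frame: value at (row i, column j) is 6 * i + j
dates = pd.date_range("2023-01-01", periods=5, freq="D")
cols = pd.MultiIndex.from_product(
    [["by_sensor"], ["sensor_1", "sensor_2", "sensor_3"], ["field_a", "field_b"]],
    names=["Level 1", "Level 2", "Fields"],
)
df = pd.DataFrame(np.arange(30, dtype=float).reshape(5, 6), index=dates, columns=cols)

picked = select_series(
    df, "by_sensor", ["sensor_1", "sensor_3"], ["field_a"], diff=True
)
print(picked)
```

Computing the difference on demand keeps the dataframe unchanged, so pre-populating an extra index layer is probably not needed for a toggle like this.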

I really appreciate any help! I think this is my 3rd or 4th time trying to integrate the HoloViz libraries into packages I’ve built for work, but every time I try I never seem to make much headway.

import pandas as pd
import numpy as np
import holoviews as hv
import panel as pn

class DataViewer:
    def __init__(self):

        # Parameters
        start_date = '2023-01-01'
        end_date = '2023-04-10'
        freq = 'D'  # Daily frequency

        # Create a date range
        dates = pd.date_range(start=start_date, end=end_date, freq=freq)
        num_dates = len(dates)

        # Define the levels for the MultiIndex
        level_1_and_2 = [
            ('total', 'total'),
            ('by_sensor_type', 'sensor_type1'),
            ('by_sensor_type', 'sensor_type2'),
            ('by_sensor_type', 'sensor_type3'),
            ('by_sensor', 'sensor_1'),
            ('by_sensor', 'sensor_2'),
            ('by_sensor', 'sensor_3')
        ]

        fields = ['field_a', 'field_b', 'field_c']

        # Create the MultiIndex
        cols = pd.MultiIndex.from_tuples([(lvl1, lvl2, field) for lvl1, lvl2 in level_1_and_2 for field in fields],
                                         names=['Level 1', 'Level 2', 'Fields'])               

        # Initialize a DataFrame
        df = pd.DataFrame(index=dates, columns=cols)

        # Generating random walks and random walks with drift
        np.random.seed(0)  # for reproducibility
        for column in df.columns:
            drift = 0.1 if 'sensor_type1' in column or 'sensor_1' in column else 0
            random_walk = np.random.normal(loc=drift, scale=1, size=num_dates)
            df[column] = random_walk.cumsum()
        self.df = df        
        
    def to_awesome_plot(self):
        # Desired return value: either a pop-out window that launches the chart,
        # or the chart object itself so it can be stacked on top of another.
        # See below, output the same.
        pass
    
    def __add__(self, other):
        return DataViewerContainer(self, other)
    
    
class DataViewerContainer:
    def __init__(self, *args):
        self.data_viewers = args
        
    def to_awesome_plot(self):
        
        plots = [x.to_awesome_plot() for x in self.data_viewers]
        
        plotme = pn.Column(*plots)  # pn.Column takes its objects as positional arguments

        # Ideally plotme pops out in some standalone window! Not Jupyter inline.
        # If there is a way to do this without a running server, that's preferred,
        # but if not, that's OK.

Marc · April 22, 2024, 3:25pm

Hi @mg3146

Regarding the pop up.

You can make Panel components such as a Column open in a browser window by calling the .show() method.

To avoid blocking the console, pass the argument threaded=True. When you no longer need the server, you can stop it with server.stop().

The Column is a really good container to show because you can work with it like a list: use .append to add any object that Panel can display, and .clear when you want to empty it.

Example

In [1]: import panel as pn

In [2]: pn.extension()

In [3]: container=pn.Column()

In [4]: server = container.show(threaded=True)

Launching server at http://localhost:56755
In [5]: container.append("Text")

In [6]: container.append("Matplotlib Plot")

In [7]: container.append("DataFrame")

In [8]: server.stop()

In [9]: exit()

This makes it easy to display objects when working in a Spyder interactive terminal. Here I show this using an IPython terminal.

Marc · April 22, 2024, 4:24pm

Regarding the DataViewer component here is an example that might help you. I call it DataStore.

import datetime as dt

import hvplot.pandas  # registers the .hvplot accessor on pandas objects
import numpy as np
import pandas as pd
import panel as pn
import param

pn.extension(design="bootstrap")

def _get_data(start_date, end_date, freq):
    dates = pd.date_range(start=start_date, end=end_date, freq=freq)
    num_dates = len(dates)

    level_1_and_2 = [
        ('total', 'total'),
        ('by_sensor_type', 'sensor_type1'),
        ('by_sensor_type', 'sensor_type2'),
        ('by_sensor_type', 'sensor_type3'),
        ('by_sensor', 'sensor_1'),
        ('by_sensor', 'sensor_2'),
        ('by_sensor', 'sensor_3')
    ]

    fields = ['field_a', 'field_b', 'field_c']

    cols = pd.MultiIndex.from_tuples([(lvl1, lvl2, field) for lvl1, lvl2 in level_1_and_2 for field in fields],
                                        names=['Level 1', 'Level 2', 'Fields'])               

    df = pd.DataFrame(index=dates, columns=cols)

    np.random.seed(0)  # for reproducibility
    for column in df.columns:
        drift = 0.1 if 'sensor_type1' in column or 'sensor_1' in column else 0
        random_walk = np.random.normal(loc=drift, scale=1, size=num_dates)
        df[column] = random_walk.cumsum()
    return df

def _multiindex_2_dict(p: pd.MultiIndex|dict) -> dict:
    """
    Converts a pandas MultiIndex to a nested dict.

    We need a nested dict for Panels NestedSelect.
    """
    internal_dict = {}
    end = False
    for x in p:
        # Since multi-indexes have a descending hierarchical structure, it is convenient to start from the last
        # element of each tuple. That is, we start by generating the lower level to the upper one. See the example
        if isinstance(p, pd.MultiIndex):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used. Only for 2 levels
            # pd.MultiIndex
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = [x[-1]]
            else:
                internal_dict[t].append(x[-1])
        elif isinstance(x, tuple):
            # This checks if the tuple x without the last element has len = 1. If so, the unique value of the
            # remaining tuple works as key in the new dict, otherwise the remaining tuple is used
            if len(x[:-1]) == 1:
                t = x[:-1][0]
                end = True
            else:
                t = x[:-1]
            if t not in internal_dict:
                internal_dict[t] = {x[-1]: p[x]}
            else:
                internal_dict[t][x[-1]] = p[x]
    
    # Uncomment this line to know how the dictionary is generated starting from the lowest level
    # print(internal_dict)
    if end:
        return internal_dict
    return _multiindex_2_dict(internal_dict)

def _to_date(date: str|dt.datetime|dt.date) -> dt.date:
    # param.CalendarDate expects a plain date (not a datetime), so normalize every input
    return pd.Timestamp(date).date()
    


class DataStore(pn.viewable.Viewer):
    start_date: dt.date = param.CalendarDate(allow_None=False)
    end_date: dt.date = param.CalendarDate(allow_None=False)
    freq: str = param.Selector(default="D", objects=["B", "D", "W"])
    diff = param.Boolean(False)

    data = param.DataFrame()

    series = param.Tuple(allow_None=False)


    plot = param.Parameter()

    def __init__(self, start_date: str|dt.datetime, end_date: str|dt.datetime, freq: str="D"):
        start_date = _to_date(start_date)
        end_date = _to_date(end_date)

        super().__init__(start_date=start_date, end_date=end_date, freq=freq)

        self._layout = self._get_layout()
        self._server=None

    @param.depends("start_date", "end_date", "freq", "diff", watch=True, on_init=True)
    def _update_data(self):
        data = pn.state.as_cached("datastore-data", _get_data, ttl=15, start_date=self.start_date, end_date=self.end_date, freq=self.freq)
        
        if self.diff:
            data = data.diff(1)
        
        self.data = data

    @param.depends("data", watch=True)
    def _update_series(self):
        series = self.series
        if series not in list(self.data.columns):
            series = list(self.data.columns)[0]
            self.param.series.length=3
            self.series = series

    @param.depends("data")
    def _options(self):
        return _multiindex_2_dict(self.data.columns)
    
    @param.depends("data")
    def _levels(self):
        return self.data.columns.names

    @param.depends("data", "series", watch=True)
    def _update_plot(self):
        series = self.data[self.series]
        series.name = ", ".join(self.series) # hvplot needs Series to have a string name
        self.plot = series.hvplot()
        

    def _get_layout(self):
        settings_input = pn.Row(self.param.start_date,self.param.end_date,self.param.freq)
        series_input = pn.widgets.NestedSelect(options=self._options, levels=self._levels, layout=pn.Row)
        
        @pn.depends(series_input, watch=True)
        def _update_series_on_nested_select(nested_select_value):
            self.series = tuple(nested_select_value.values())
        
        return pn.Column(
            "## Period",
            settings_input,
            "## Series",
            series_input,
            "## Other",
            self.param.diff,
            "## Plot",
            pn.pane.HoloViews(self.param.plot, width=950),
            margin=25
        )

    def __panel__(self):
        return self._layout

DataStore(
        start_date='2023-01-01',
        end_date = '2023-04-10',
        freq = 'D'

).servable()

Save the code above as script.py and serve it with:

panel serve script.py

Take a look. Feel free to shoot more questions.
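As a side note on the `_multiindex_2_dict` helper above: its docstring says NestedSelect needs a nested dict. A shorter, non-recursive way to build that shape is sketched below. The name `multiindex_to_nested_dict` is hypothetical, and the sketch assumes a MultiIndex with at least two levels; it is an alternative illustration, not a drop-in tested against the DataStore class.

```python
import pandas as pd


def multiindex_to_nested_dict(index: pd.MultiIndex) -> dict:
    """Turn (a, b, c) tuples into {a: {b: [c, ...]}}.

    Assumes at least two levels; the last level becomes the leaf list.
    """
    nested: dict = {}
    for *parents, leaf in index:
        node = nested
        # Walk (and create) one dict per parent level except the last
        for key in parents[:-1]:
            node = node.setdefault(key, {})
        # The last parent level maps to the list of leaf values
        node.setdefault(parents[-1], []).append(leaf)
    return nested


cols = pd.MultiIndex.from_tuples(
    [
        ("total", "total", "field_a"),
        ("total", "total", "field_b"),
        ("by_sensor", "sensor_1", "field_a"),
        ("by_sensor", "sensor_2", "field_a"),
    ],
    names=["Level 1", "Level 2", "Fields"],
)
options = multiindex_to_nested_dict(cols)
print(options)
```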

Marc · April 22, 2024, 4:26pm

Regarding feeling overwhelmed: I think it’s a combination of things. The HoloViz ecosystem is huge, and at the same time you are trying to build a complex, reusable component, which is something we teach intermediate users of Panel; cf. our basic and intermediate tutorials.

mg3146 · April 22, 2024, 6:05pm

Hi @Marc ,

This is all incredibly helpful, thank you for sharing and breaking it down like this. I can definitely run with this as a schema now.

A couple of smaller follow-up questions.

self.plot = series.hvplot() – Can I replace this with just about any Holoviews / hvplot plot object [Layout etc]? For example, if I wanted to adapt your DataStore class such that the user can select many Level_2 items, I would probably want to use Holoviews to Overlay the charts? And if so, do I need to use that DynamicMap class?
I saw on Discord that there is a soon-to-be-released update for working with pandas MultiIndex. Do you think that will apply much to this workflow?
I also saw on Discord, from a bit ago, you were asking whether there was a way to line up the x-axis for a Layout that contained a hv.Curve and hv.Bar (you had the bar chart sitting below the hv.Curve). At the time, the answer was no, but I was wondering if that ever changed? And if not, were you able to figure out a workaround?

Thanks again, seriously. This has been a massive help.

coderambling · April 22, 2024, 9:23pm

The Holoviews pandas multi-index support issue on GitHub (recently solved with new code), for reference.

@mg3146 maybe include that link in your follow-up question?

The code is available now and is being tested, so if you’re feeling brave you could test it.

mg3146 · April 22, 2024, 10:35pm

ha, I’m not sure I’m that brave. I did consider it, but I didn’t want to embarrass myself in asking how to pull an unofficial release from git. I also wasn’t really sure if it would mess up hvplot…

mg3146 · April 23, 2024, 12:19pm

One other follow-up question, as I try to go from 0-60 on this…

Let’s say that _get_data is an async generator function. Is there an easy way to implement that so that the DataStore & corresponding items update nicely and still run in a thread?

import asyncio

import numpy as np
import pandas as pd

async def _get_data(start_date, end_date, freq):
    dates = pd.date_range(start=start_date, end=end_date, freq=freq)
    num_dates = len(dates)

    level_1_and_2 = [
        ('total', 'total'),
        ('by_sensor_type', 'sensor_type1'),
        ('by_sensor_type', 'sensor_type2'),
        ('by_sensor_type', 'sensor_type3'),
        ('by_sensor', 'sensor_1'),
        ('by_sensor', 'sensor_2'),
        ('by_sensor', 'sensor_3')
    ]

    fields = ['field_a', 'field_b', 'field_c']

    cols = pd.MultiIndex.from_tuples([(lvl1, lvl2, field) for lvl1, lvl2 in level_1_and_2 for field in fields],
                                     names=['Level 1', 'Level 2', 'Fields'])               

    df = pd.DataFrame(index=dates, columns=cols)

    np.random.seed(0)  # for reproducibility
    for column in df.columns:
        drift = 0.1 if 'sensor_type1' in column or 'sensor_1' in column else 0
        random_walk = np.random.normal(loc=drift, scale=1, size=num_dates)
        df[column] = random_walk.cumsum()

    chunk_size = max(1, len(df) // 10)  # avoid a zero step when there are fewer than 10 rows
    for i in range(0, len(df), chunk_size):
        yield df.iloc[i:i + chunk_size]
        await asyncio.sleep(1.5)  # wait for 1.5 seconds before the next yield

# Example usage: call asyncio.run(main()) from a script, or `await main()` in IPython
async def main():
    async for chunk in _get_data('2023-01-01', '2023-01-10', 'D'):
        print(chunk)  # process each chunk as it arrives
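One possible pattern for consuming such a generator is to accumulate the chunks and hand each partial frame to a callback. The sketch below uses only asyncio and pandas. `chunked` is a hypothetical stand-in for `_get_data`, and `accumulate` is a hypothetical helper; how best to schedule this inside a running Panel server or thread is left open here.

```python
import asyncio

import pandas as pd


async def chunked(df, n_chunks=3, delay=0.0):
    """Hypothetical stand-in for an async data source yielding row chunks."""
    size = max(1, len(df) // n_chunks)
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]
        await asyncio.sleep(delay)  # simulate waiting for the next chunk


async def accumulate(source, on_update):
    """Concatenate chunks as they arrive and pass each partial frame on."""
    parts = []
    async for chunk in source:
        parts.append(chunk)
        on_update(pd.concat(parts))


frame = pd.DataFrame({"value": range(7)})
snapshots = []

# Here the callback just records each partial frame; in a DataStore-like
# class it could instead assign the frame to a `data` parameter.
asyncio.run(accumulate(chunked(frame), snapshots.append))

print([len(s) for s in snapshots])
```

Because every update replaces the whole frame, anything that depends on the data only has to react to a single changing value and needs no knowledge of the chunking.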

coderambling · April 24, 2024, 3:18pm

I think pip install with version parameter set to the right version should do the trick…

pip install package_name==version_number
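After pinning a version this way, it may help to confirm what is actually installed before testing. A small sketch using only the standard library follows; `installed_version` is a hypothetical helper name, and the package names in the loop are simply the ones discussed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


for name in ("holoviews", "hvplot", "panel"):
    print(name, installed_version(name))
```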

mg3146 · April 24, 2024, 4:14pm

nice, i’ll give that a shot. tx!