Streaming datashader seems extremely buggy

pepijndevos · April 1, 2022, 9:19am

I’m not sure if this should be one bug report or many, and how to best provide reproducers, so I’m posting here first to workshop the problem into something the maintainers can hopefully work with.

Bug 1: plot bounds don’t update when new data arrives. You can mess with the streams, but then it no longer rerenders when you zoom in.

Bug 2: NdOverlay can’t have zero traces, or it’ll give TypeError: issubclass() arg 1 must be a class

Bug 3: Sometimes calling .clear() on the Buffer after it has been plotted causes IndexError: iloc cannot enlarge its target object on the Holoviews side.

Bug 4: Plotting a single row can cause ValueError: cannot convert float NaN to integer when using the count_cat aggregator.

Bug 5: If you try to plot different columns from the dataframe it throws KeyError: "Dimension('foo') not found."

Bug 6: I’m probably forgetting stuff I’ve already worked around somehow…

All of these issues can be reproduced by fiddling around with the following code a bit. Use a different aggregator, plot a bit of data, plot a bit more data, clear the data, select different traces, etc.

import holoviews as hv
import datashader as ds
import pandas as pd
import numpy as np
from holoviews.streams import Buffer, Stream, param
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize

hv.extension('plotly')

active_traces = Stream.define('traces', cols=[])

def _timeplot(data, cols=[]):
    traces = {k: hv.Curve((data.index, data[k]), 'time', 'amplitude') for k in cols}
    #traces = {k: hv.Curve(data, 'index', k) for k in cols}
    # if not traces:
    #     traces = {"dummy": hv.Curve([])}
    return hv.NdOverlay(traces, kdims='k')

def timeplot(streams):
    curve_dmap = hv.DynamicMap(_timeplot, streams=streams)
    # return spread(datashade(curve_dmap, aggregator=ds.by('k', ds.any())))
    # return spread(datashade(curve_dmap, aggregator=ds.by('k', ds.count()), width=1000, height=1000))
    return spread(datashade(curve_dmap, aggregator=ds.count_cat('k'), width=1000, height=1000))

n=100
m=1000
def stream(streamdict):
    """
    Generator that simulates streaming data.
    Each step builds a new DataFrame chunk and either sends it to the
    matching Buffer in `streamdict` or creates a new Buffer for it.
    """
    for i in range(n):
        res = {"stuff": pd.DataFrame({'foo': np.random.rand(m), 'bar': np.random.rand(m)+0.8, 'baz': np.random.rand(m)-0.8}, index=np.arange(m)+m*i)}
        for k, v in res.items():
            #print(list(v.columns), list(streamdict[k].data.columns))
            if k in streamdict and list(v.columns) == list(streamdict[k].data.columns):
                streamdict[k].send(v)
            else:
                buf = Buffer(v, length=int(1e9), index=False)
                streamdict[k] = buf
        yield

cols = active_traces(cols=['foo', 'bar', 'baz'])

Things to play with:

d = {}
it = stream(d)
next(it)
timeplot([d['stuff'], cols])
cols.event(cols=[])
d['stuff'].clear()

pepijndevos · April 1, 2022, 5:26pm

Another bug, maybe more Panel than Holoviews: when using responsive=True on a Holoviews plot, it seems to want to take up the entire space of the container.

So let’s say you have pn.Row(foo, holoview) then holoview will take on the size of the whole row, and because of foo, become larger than the available space. Putting the holoview inside its own row or column seems to fix the problem.

jbednar · April 1, 2022, 8:30pm

It’s not clear to me which of these are bugs in your app, and which are bugs in the underlying libraries. It’s hard to tell when they are all bound together like this, but:

Bug 1: definitely something to reproduce on its own; unclear whether this is a bug or a difference in understanding.
Bug 2: Seems like a feature request; you seem to say that for your purposes it would be nice that an Overlay accept an empty set of plots. It’s a reasonable request, but not a bug.
Bug 3: Unless it’s narrowed down to a reproducible case it will be difficult to debug.
Bug 4: That sounds easy to make a minimal reproducer for; please do so and open a separate bug report.
Bug 5: Not sure what you mean by “different”.
Bug 6: Please do report separate, identified issues when you find them; things can be addressed but only if we know about them and they are in a form we can take action on.
Bug 7: (responsive=True) That’s one to file on Panel, again with a minimal reproducer.

The software is all provided for free, but a good way to contribute to it is to submit minimal, reproducible, actionable bug reports!

pepijndevos · April 4, 2022, 2:20pm

Bug 1: definitely something to reproduce on its own; unclear whether this is a bug or a difference in understanding.

If the behavior differs with and without Datashader, I assume it’s a bug.

Bug 3: Unless it’s narrowed down to a reproducible case it will be difficult to debug.

Yeah. It happens 100% of the time, half the time. The question is how to make a case for this. The code I’ve posted here literally reproduces it if you evaluate the right things in the right order, but it’s not very nice to paste a bunch of code snippets and tell the reader to paste them one by one into different notebook cells and then evaluate them in a specific order.

Bug 5: Not sure what you mean by “different”.

Say you have a dataframe with foo and bar columns. If you do hv.Curve(data, 'index', k) on several of those columns, Datashader raises an exception saying it can’t find dimension ‘foo’. It works fine without Datashader, or if all columns are called amplitude.
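The variant that works, where every trace's value column is called amplitude, can be sketched with pandas alone. The helper name per_trace_frames and the toy data are hypothetical; the sketch only shows the reshaping step, not the Datashader behaviour itself.

```python
import numpy as np
import pandas as pd

# Hypothetical wide frame: one column per trace, indexed by time.
data = pd.DataFrame(
    {"foo": np.arange(3.0), "bar": np.arange(3.0) + 10},
    index=pd.Index([0, 1, 2], name="time"),
)

def per_trace_frames(data, cols):
    """Return {trace name: frame with 'time' and 'amplitude' columns}."""
    return {
        k: pd.DataFrame({"time": data.index, "amplitude": data[k].to_numpy()})
        for k in cols
    }

frames = per_trace_frames(data, ["foo", "bar"])
# Each frame could then be used as hv.Curve(frames[k], 'time', 'amplitude'),
# so every curve shares the same value dimension name.
```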

Bug 6: Please do report separate, identified issues when you find them; things can be addressed but only if we know about them and they are in a form we can take action on.

Problem is, if the workaround is easy and making a reproducer hard…

pepijndevos · April 4, 2022, 3:24pm

Okay here are all the Holoviews issues for the above bugs with steps to reproduce them: Issues · holoviz/holoviews · GitHub

pepijndevos · April 4, 2022, 4:27pm

The Panel issue is much more devious than I thought.

I tried to make a simple reproducer, and it worked fine.
Then I stripped my app down completely, and that too worked just fine.
Then I started with my app on --autoreload and piece by piece started removing stuff, making sure at every step the problem still occurred.

And then I restarted the server and it worked fine again.

To be clear, in my app, plot sizing does not work correctly on a fresh start of the server. But the reduced example that had extremely wrong plot sizes suddenly started working after restarting the server.

I’m sorry, I am out of time right now to dive into an issue that requires stripping down my app bit by bit, restarting the server at every step, when the workaround is to just wrap the plot in an extra Row. Maybe another time… Or maybe if anyone has any clue what I should be looking for.

And uh… I guess
Bug 8: Panel autoreload does not fully reload the app somehow. No idea what is going on.

jbednar · April 4, 2022, 9:49pm

Thanks for those issues, though they could definitely be trimmed down more for us to be able to act on them. We’d have to do the same work as you would to try to strip things down bit by bit, and please remember that our work answering issues is on a volunteer basis, so any user who can do that work of pinpointing the problem is making a contribution for all users, letting us focus on actually fixing bugs.

For bug 8, Python itself is not all that great at hot reloading, because things that get defined get stuffed back into the same namespace. So it’s like out-of-order execution in Jupyter notebooks: often useful, but tricky to reason about and not to be considered reproducible.
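A small illustration of the namespace point, using only the standard library. It simulates a reload by executing new source into the same dict, much as importlib.reload retains a module's existing dictionary. It is not a description of how Panel's autoreload is implemented, and the names make_plot and plot_width are made up for the example.

```python
# First version of a hypothetical app script.
old_source = """
def make_plot():
    return "old plot"

plot_width = 400
"""

# Edited version: make_plot was deleted and plot_width changed.
new_source = """
plot_width = 800
"""

namespace = {}
exec(old_source, namespace)  # initial run
exec(new_source, namespace)  # simulated hot reload into the same namespace

reloaded_width = namespace["plot_width"]
# Names removed from the source are still present after the "reload".
stale_names = [name for name in ("make_plot",) if name in namespace]

# A fresh namespace stands in for a full server restart.
fresh = {}
exec(new_source, fresh)
fresh_has_make_plot = "make_plot" in fresh
```

After the simulated reload, the updated value is visible but the deleted function lingers, whereas the fresh namespace contains only what the current source defines. That difference is one way a reloaded app and a restarted app can diverge.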