How to upload some data to support a question?

I want to post a Holoviews/Datashader question, but the first thing folks will ask for is an MRE. I made the MRE, but I don’t see an easy way to upload the small set of data that goes with it. My data is ideally a nested dictionary containing time/data Series. I wrote that to JSON (size 277 kB), then realized I can’t upload that file type to this site.

Is it bad form to provide data for an MRE on this forum? My issue doesn’t occur when I generate generic data. I assume I could trick the uploader by choosing an acceptable file extension for my JSON file (like “.csv”), but it’s probably restricted for a reason.
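For anyone wanting to reproduce the data-preparation step, here is a minimal sketch of writing a nested dictionary of pandas Series to JSON and reading it back. It assumes a two-level layout (`{top_key: {curve_key: Series}}`) with datetime indexes and numeric values. The function names and the `index`/`values` layout are made up for illustration and are not necessarily what the original script used.

```python
import json

import pandas as pd


def nested_series_to_json(nested):
    """Serialize {top_key: {curve_key: pd.Series}} to a JSON string.

    Assumes every Series has a DatetimeIndex and numeric values.
    """
    payload = {
        top_key: {
            curve_key: {
                # ISO 8601 strings keep the timestamps human-readable.
                "index": [ts.isoformat() for ts in series.index],
                "values": [float(v) for v in series.to_numpy()],
            }
            for curve_key, series in curves.items()
        }
        for top_key, curves in nested.items()
    }
    return json.dumps(payload)


def nested_series_from_json(text):
    """Rebuild the nested dictionary of Series from the JSON string."""
    payload = json.loads(text)
    return {
        top_key: {
            curve_key: pd.Series(
                item["values"],
                # Parse each timestamp on its own so mixed precision is fine.
                index=pd.DatetimeIndex([pd.Timestamp(s) for s in item["index"]]),
                dtype=float,
            )
            for curve_key, item in curves.items()
        }
        for top_key, curves in payload.items()
    }
```

Writing the result of `nested_series_to_json` to a file with a `.txt` extension gives something that can be attached almost anywhere and loaded back with `nested_series_from_json`.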


Hi @CrashLandonB,

To me it reads like the issue is in how the file is processed, since the generic data works. Maybe that’s a good place to start: use the generic data to show how it should work. If a few lines of the JSON are enough to reveal the issue, just pasting some data into the post itself may be enough.

Thanks, Carl.


Thanks @carl. The file itself shouldn’t be the issue. The only reason I made the file was to support the MRE. My full code produces this nested dictionary from “real” data sources and feeds that object to the plotting method in my MRE.

I can create the generic data set in the MRE, but it will work fine, and the next question will be “how is the ‘real’ data different from the ‘generic’ data?”, so how can I upload that?

If it’s a more complex example, just post it to GitHub or Google Colab.

So I read the issue like this: the MRE doesn’t work with the ‘real’ datasets, but when you recreate the datasets it works. I can’t say with certainty, but the issue seems to lie in reading the data from the real sources, since you say the code can function if you create the data in a certain way. So to me it comes down to getting the real data to mimic the working generic data, or understanding the differences between the two. If the MRE works with a really small set of data, you could just paste some of your real data from the file directly into the forum here and say “put it into a file like this” or “this is what I do”, and we can see from there maybe?

Thanks, Carl.

Maybe the GitHub approach is the best option. I’m just a new-enough user that I was hoping someone here could tell me “you’re doing it wrong”.

But my question still remains… is there a particular reason we don’t want to load small datasets (JSON, pickle, etc.) to this forum? I see CSV is an option, but that doesn’t help me show a more-complex data structure that is giving me trouble…

@carl, agree that it really is a question of why doesn’t Holoviews/Datashader like one dataset vs another. In this case I’ve investigated the differences as best I can, but can’t find an obvious issue.

I’ll probably proceed with the GitHub submission, assuming I can upload the small JSON file with my MRE. Still would like to know if this forum prohibits/avoids that upload for a particular reason, or maybe it would be helpful to allow it.

Hi @CrashLandonB,

Sure, I’m not sure with regards to file type upload support. The forum engine maybe just doesn’t support it for whatever reason.

But if it doesn’t upload, why not just open the file, copy the contents (or a small portion), and paste them here? We can then throw that into a file at this end, which seems like a workaround. Either way, here or on GitHub, if I can help I will try.
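If the full file is too large to paste, one option is to trim it first. The following is a rough sketch, assuming the JSON is made of nested objects whose innermost level holds the data points; `trim_leaves` and `max_points` are hypothetical names. Bear in mind that a trimmed excerpt may no longer reproduce a problem that only appears with more data.

```python
import json


def trim_leaves(node, max_points=5):
    """Keep only the first `max_points` entries of each innermost mapping.

    Dictionaries that contain further dicts or lists are descended into;
    dictionaries holding only scalars, and any lists, are truncated as-is.
    """
    if isinstance(node, dict):
        if any(isinstance(v, (dict, list)) for v in node.values()):
            return {k: trim_leaves(v, max_points) for k, v in node.items()}
        return dict(list(node.items())[:max_points])
    if isinstance(node, list):
        return node[:max_points]
    return node


# Made-up sample data in an assumed {param: {test: {timestamp: value}}} layout.
sample = {
    "param_a": {
        "test_1": {
            "2024-01-01T00:00:00": 1.0,
            "2024-01-01T00:01:00": 2.0,
            "2024-01-01T00:02:00": 3.0,
        }
    }
}
excerpt = json.dumps(trim_leaves(sample, max_points=2), indent=2)
print(excerpt)
```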

Cheers, Carl.

FYI, I’ve submitted my issue on GitHub here:

… and it turns out you can’t upload a .json file there either. I just changed it to .txt and got on with it.

I meant you should post your code (with the json file) to a repo on GitHub. Or Colab.


Hi @CrashLandonB

Good questions. Some of the answers have been given above. Some of the approaches I’ve tried:

  • rename your .json file to .csv and upload it
  • upload your .json file to some external storage like a GitHub gist or repository
  • create a GitHub issue instead and attach the file there
  • request that .json uploads be supported on our community forum via a GitHub feature request

Hi @CrashLandonB,

Edit: I can’t be certain, but now that I’ve found the post, I think you may be experiencing an issue similar to one already reported as a bug, rather than what I’ve said below.

My thoughts here are that there is either too much data or something in the data that isn’t well liked. The reason is that I reduced the data set right down to one param / test, and zooming around is fine with a shortened real data snippet. To find where it goes wrong in that file, I would keep increasing by one param / test until it breaks, skip that particular param / test, and continue adding others to see if it still works. That way you should be able to deduce whether it is the quantity of data or an issue in the data set that causes the problem.
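The add-one-at-a-time approach could be automated along these lines. This is only a sketch: `render` stands in for whatever function builds and displays the plot from a dictionary (it is not a HoloViews API), and `budget_seconds` is an arbitrary threshold. Timing a single render is also only a rough proxy for the interactive slowdown seen when zooming.

```python
import time


def find_slow_keys(curves_dict, render, budget_seconds=1.0):
    """Add one top-level key at a time and time `render` on the result.

    A key whose addition pushes the render time over the budget is set
    aside as a suspect, and the search carries on with the remaining keys.
    Returns (kept_keys, suspect_keys).
    """
    kept, suspects = {}, []
    for key, curves in curves_dict.items():
        trial = {**kept, key: curves}
        start = time.perf_counter()
        render(trial)  # hypothetical: build and display the plot
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            suspects.append(key)
        else:
            kept = trial
    return list(kept), suspects
```

If every key ends up in `kept` but the plot is still slow with all of them, that points toward quantity rather than one bad param / test.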

Also, if I remember right, someone else recently posted an app that demoed significant slowdown with the number of graphs and curves on each graph. Again, if I recall, there was significant slowdown from about 4/5 graphs with multiple curves, so you could be experiencing something similar here. So far your app is working for me with 7 curves over three graphs… actually now I’m working up to 5 params, with a little bit of slowdown, but for me acceptable.

Hope of some help, thanks Carl.

@carl Thanks for input. Seems to me the issue/bug 5546 on GitHub could be relevant, although the behavior he’s showing in the videos is different.

Regarding the idea of “too many data points”, I agree that shouldn’t be the case. My entire code works great if you use the “generic” data option, which has much more data than the “real data” option.

I have also tried the “only plot a subset and repeat with more until it breaks” approach. The easy method for this is to slice the following:

for top_key, curves in list(curves_dict.items())]

becomes something like:

for top_key, curves in list(curves_dict.items())[:8]]

Interestingly, playing around with my MRE, it seems to be handling more of the real data set today than during my prior troubleshooting. Today, slicing to just the first 8 parameters was the lowest number to show performance issues. Slicing to 9 (all of the parameters) makes it grind to a halt. Last week, plotting only 5 or 6 made it grind to a halt.

Also note that with the datashade operation removed, plotting the entire real dataset works fine. That is, instead of:

        self.overlay = [datashade(hv.NdOverlay(curves), line_width=line_width, pixel_ratio=pixel_ratio,
                                   cnorm=cnorm, aggregator=ds.count()
                                   ).opts(width=plot_width, tools=['hover']).relabel(top_key)
                        for top_key, curves in list(curves_dict.items())[:9]]

you use:

        self.overlay = [(hv.NdOverlay(curves)).opts(width=plot_width, tools=['hover']).relabel(top_key)
                        for top_key, curves in list(curves_dict.items())[:9]]

… and that is fine for small datasets, but I’m looking for a solution that can plot millions of data points, so the datashader operation will be necessary. Fortunately, we can recreate the issue with the small “real” dataset.

One theory of mine is that hv/ds don’t like either:

  • The time-series indexes of these “real” signals are asynchronous with each other, while the “generic” ones are synchronous.
  • Or the fact that some of these data sets have a large time gap in the middle of them. Perhaps hv/ds have to work extra hard to deal with the space between?
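Both theories can be checked with plain pandas before involving hv/ds at all. A minimal sketch, assuming each entry in `curves` is a pandas Series with a DatetimeIndex; `describe_indexes` and `share_one_index` are hypothetical helper names.

```python
import pandas as pd


def describe_indexes(curves):
    """Summarize each Series' time index: point count, median step, largest gap."""
    rows = {}
    for name, series in curves.items():
        # Differences between consecutive timestamps (first entry is NaT).
        steps = series.index.to_series().diff().dropna()
        rows[name] = {
            "points": len(series),
            "median_step": steps.median(),
            "largest_gap": steps.max(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def share_one_index(curves):
    """Return True if every Series uses exactly the same timestamps."""
    first = next(iter(curves.values())).index
    return all(series.index.equals(first) for series in curves.values())
```

A `largest_gap` far above the `median_step` flags the "big hole in the middle" case, and `share_one_index` returning `False` flags signals sampled at different times. Running both on the “generic” and the “real” dictionaries would show whether either property actually differs between them.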

Looks like @Hoxbro found the fix! Workaround included below: