Datashade shifts colors when there is no overlap between points

Hoxbro · May 6, 2021, 6:50pm

I have the following code, which works fine when I have overlap between the points, but when there is no overlap, it changes the color of the points to the other end of the colormap. Is this a bug? Or is there an argument I’m missing to disable this change?

import numpy as np
import holoviews as hv
from holoviews.operation.datashader import datashade, spread
hv.extension("bokeh")

np.random.seed(1234)
x = np.random.randn(100000, 2)

plot = spread(datashade(hv.Points(x), cmap="viridis"), px=2)
plot.opts(xlim=(-5, 5), ylim=(-5, 5))

[Image: after]

dr_clank · May 6, 2021, 7:17pm

Using rasterize() instead of datashade() allows us to add a colorbar to the plot to better see what’s happening.

import numpy as np
import holoviews as hv
from holoviews.operation.datashader import spread, rasterize
hv.extension("bokeh")

np.random.seed(1234)
x = np.random.randn(100000, 2)

plot = spread(rasterize(hv.Points(x)), px=2)
plot.opts(
    hv.opts.Image(
        xlim=(-5, 5), 
        ylim=(-5, 5), 
        cmap="viridis", 
        colorbar=True
    )
)

The datashader operations recalculate the image based on the data in the current plot extent (unless you set dynamic=False). So I believe what you’re seeing is that every aggregated pixel value is 1 once the points no longer overlap. I can’t reproduce the behaviour in your example where the colour flips all the way to the maximum value of the cmap; mine just goes to 1 when there is no overlap.
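To make the re-aggregation point concrete, here is a rough NumPy-only sketch that mimics a per-pixel count aggregation with `np.histogram2d`, once over the full (-5, 5) extent and once over a tiny window around a single data point. The helper `aggregate_counts` and the 400 by 400 resolution are hypothetical choices for illustration, not what Datashader actually does internally.

```python
import numpy as np

np.random.seed(1234)
x = np.random.randn(100000, 2)


def aggregate_counts(points, x_range, y_range, width=400, height=400):
    """Count points per pixel over the visible extent.

    Hypothetical helper that imitates a count aggregation; Datashader's real
    aggregation pipeline is more involved.
    """
    counts, _, _ = np.histogram2d(
        points[:, 0],
        points[:, 1],
        bins=(width, height),
        range=(x_range, y_range),
    )
    return counts


# Zoomed out: many points land in the same pixel near the centre.
full = aggregate_counts(x, (-5, 5), (-5, 5))

# Zoomed far in around one data point: each non-empty pixel holds very few points.
x0, y0 = x[0]
h = 0.005
zoomed = aggregate_counts(x, (x0 - h, x0 + h), (y0 - h, y0 + h))

print("max count zoomed out:", full.max())
print("max count zoomed in: ", zoomed.max())
```

Zoomed out, the busiest pixels hold many points; zoomed in, the non-empty pixels typically hold a single point each, so an auto-ranged colormap has essentially one distinct value left to work with.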

I’m on holoviews=1.14.3, but perhaps you’ll see different behaviour with another version.

Hoxbro · May 6, 2021, 7:57pm

Thank you for the explanation; it makes it possible for me to make the plot I want.

However, I’m also on holoviews=1.14.3, so it is weird that you can’t reproduce my example. What version of datashader are you using? I’m on datashader=0.12.1.

dr_clank · May 6, 2021, 8:53pm

I did the plot above with datashader 0.12.0. I just upgraded to 0.12.1 and my results are the same.

Hoxbro · May 7, 2021, 6:41am

I was referring to the spread/datashade combination; spread/rasterize works perfectly. As I understood it, you were referring to the spread/datashade combination when you said you couldn’t reproduce my example.

jbednar · May 7, 2021, 12:54pm

What color were you hoping the datapoints would be when you zoomed in? From what I can see, the problem is that you are using an inappropriate colormap for this background color. A colormap should plot the highest value in the color most different from the background, so that the salience of a data point relative to the page background indicates the value being plotted. Here you’ve chosen a colormap with the opposite property, so the highest-value points have the color most similar to the background. That’s why you are getting such unintuitive results. You can get away with an inappropriate map when you are zoomed out, because the peak regions are surrounded by lower values that give them contrast, but that’s not a safe approach in general, as you will see when you zoom in or when the data itself has isolated high-value datapoints. See e.g. the holoviz/holoviews GitHub issue #3500, “Changing default HoloViews colormap”, for extensive discussion and demonstrations.

So if you want this colormap, you should use its reversed version, so that it starts at yellow for the smallest values and becomes darker (less similar to white) as the value increases. If you then want the smaller values to show up better, you can crop the colormap (viridis_r[50:]) at whatever point makes them show up clearly while preserving enough dynamic range to distinguish them from higher values. As far as I can see, Datashader is working as designed here; it’s just being used with an incorrect colormap.

dr_clank · May 7, 2021, 1:42pm

Yes, @Hoxbro, you are correct; I was referring to spread/rasterize, but I think the point is the same: when you zoom in, the relative density of the data changes, datashader recalculates, and the colors on the plot change to reflect the new aggregation. That’s what I was trying to demonstrate by adding the colorbar (which works with rasterize but not datashade).

Is your expectation that, when you zoom to the point where all colored pixels have the same value, datashader will pick the color from the bottom of the colormap instead of from the top or middle?

Hoxbro · May 7, 2021, 2:23pm

@jbednar I’m just starting to play around with datashader. My point was that the plot looks very different when there is a single overlapping pair of points versus none at all, regardless of the colormap. The colormap I chose was just the one I was using in my original code.

What I would expect to happen is this:
[Image: expected_zoom]

jbednar · May 7, 2021, 2:29pm

There’s an open Datashader issue about coming up with a better algorithm when only a small number of unique values is shown: holoviz/datashader GitHub issue #357, “Improve colors for small numbers of discrete values”. But the behavior when only one value is shown is correct: that value should use the maximum extreme of the colormap, which is yellow for your colormap, and that is why this colormap is incorrect for this page background color. We’d love a contribution of better behavior for cases with between 2 and 10 discrete colors, but there would be no change in the behavior when a single color is shown.

Are you asking how to disable dynamic color ranging altogether? If so, just set a fixed clim. But if you do have auto-ranging, there is no reason it would dynamically default to choosing the lowest value in the colormap as in your example; that’s just incorrect behavior for auto-ranging! I hope you’ll see why if you read the two issues I’ve linked here.

Hoxbro · May 9, 2021, 2:30pm

I can’t entirely agree that going to the lowest value is incorrect behavior when only one value is shown.

My starting assumption was that datashade would only do something when points overlap, and would therefore do nothing when there is no overlap.
With that assumption, it makes sense, at least to me, to always fix the no-overlap case to the minimum value of the colormap as a baseline when zooming around.

That being said, I completely understand why it was designed not to work this way! But I firmly believe it is a design decision and not a fact of life. What datashade is doing is fitting a spectrum of colors to a range spanning from 1 to 1, so either all colors are correct or none of them are.

Again, I can’t stress this enough: I have no problem with how datashade works. I just started from the wrong assumption.

jbednar · May 11, 2021, 9:24pm

Going to the lowest value is incorrect if the goal is to make good use of the dynamic range available to human observers, which is the goal auto-ranging algorithms are designed around. If you have a white background and use a colormap driven primarily by intensity, as in your example, one end of that colormap will be similar to the background and the other end will be most dissimilar. Humans naturally perceive value as contrast with the background color, so the highest plotted value should correspond to the color most dissimilar to the background. As long as you have oriented your colormap correctly so that it contrasts with your background, as you always should (never inverting it so that the lowest-valued points are the most visible!), then when there is only a single color to auto-range and plot, it should be the highest end of the colormap (most dissimilar from the background), not the lowest (most similar to the background). With only one value, the plot then makes maximal use of the available dynamic range to convey plot values, which is the explicit goal of auto-ranging.

If you instead want to convey absolute position on a fixed scale, you can certainly do that by supplying such a fixed scale, in which case there is no auto-ranging. But if you are using auto-ranging, I’d argue that using the least discernible contrast in any such case would be an error, not just an arbitrary choice.

Here my main argument is that the unintuitive behavior you are seeing is not due to Datashader’s auto-ranging but to the fact that the auto-ranging reveals the incorrect choice of colormap. Think about auto-ranging in other contexts: if your camera were auto-ranging to show a full-contrast image and found only two distinct light levels in it, wouldn’t you expect one to go to the minimum value available and the other to the maximum? That’s all that’s happening here.
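The argument above can be illustrated with a toy linear auto-ranging function. `autorange` is a hypothetical helper, not Datashader’s implementation (Datashader’s shading defaults to histogram equalization rather than a linear mapping); it only shows why a zero-width span forces a choice of colormap end, and it maps that case to the top, as described in this thread.

```python
import numpy as np


def autorange(values):
    """Linearly map values onto [0, 1], where 0 = bottom and 1 = top of the colormap.

    Hypothetical sketch. When every value is identical the span is zero and
    the division is undefined, so an end must be chosen: here the top, i.e.
    the color with the most contrast against the background.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.ones_like(values)
    return (values - lo) / (hi - lo)


# Two distinct levels, like the camera example: one goes to each extreme.
print(autorange([3, 7, 3, 7]))  # -> [0. 1. 0. 1.]

# No overlap: every non-empty pixel has count 1, so all get the top color.
print(autorange([1, 1, 1]))  # -> [1. 1. 1.]
```

Fixing the degenerate case to 0 instead would also be a consistent rule; the point made above is that it would spend the single remaining value on the color least distinguishable from the background.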