Is there an example to host datashader rasters as a tile server

ahuang11 · September 15, 2021, 9:28pm

Or create export all the tiles?

jbednar · September 17, 2021, 6:39pm

Yes; see https://github.com/holoviz/datashader/blob/master/examples/tiling.ipynb . I don’t think this example is linked or tested anywhere, but if you start with that and verify that it works (cleaning it up if necessary), it would be great to promote it more widely as a public example for others to follow.

ahuang11 · September 17, 2021, 6:47pm

Yes, I also found that.

Some issues I encountered with tiles.py:

At zoom > 20, this takes a very long time.

        # TODO: vectorize?
        tiles = []
        for ty in range(tymin, tymax + 1):
            for tx in range(txmin, txmax + 1):
                if self.is_valid_tile(tx, ty, level):
                    t = (tx, ty, level, self.get_tile_meters(tx, ty, level))
                    tiles.append(t)

This doesn’t work with lazily loaded datasets:

            b = db.from_sequence(stats)
            span = dask.compute(b.min(), b.max())

Span should be exposed as a keyword under render_tiles and if it’s passed, don’t need to call calculate_zoom_level_stats
For huge datasets > 500 GBs and faraway zoom, _get_super_tile_min_max takes forever/crashes because it tries to load in the whole dataset to rasterize; to get around it since faraway zoom can’t see details anyways, I subset every 16 * 256 / 16 * 512 / etc values. However, by skipping values, there may be a chance that a local maxima may be discarded, so maybe this should be an optional keyword. The flipside is that once all the zoomed tiles are generated, users can easily just scroll and see the maxima.

def _get_super_tile_min_max(tile_info, load_data_func, rasterize_func):
    ...
    if tile_info["level"] < 9:
            ixs = np.unique(
                np.linspace(0, len(df["x"]) - 1, tile_size * 16).astype(int))
            iys = np.unique(
                np.linspace(0, len(df["y"]) - 1, tile_size * 16).astype(int))
            df = df.isel(x=ixs, y=iys)

Lastly, the docs uses bokeh to show the tiles, but can be achieved more directly with

import geoviews as gv
gv.extesnion()
gv.WMTS("localhost:8080/{Z}/{Y}/{X}")

And it should mention cvs.points can be replaced with cvs.raster for xr.Datasets.

It would also be nice to show how to project a dataset if not in EPSG:3857 (google mercator) since it only supports 3857. (Example - Reproject — rioxarray 0.7.0 documentation)

jbednar · September 17, 2021, 7:04pm

Yes, that’s a lot of data to be generating. Would be nice to have good feedback with progress bars, etc. Plus we’d love a lazy tile server that only renders what a user actually accesses, then caches that.
Yes, if you already know the span, there needs to be a way to pass that in and avoid such a painful computation. But what’s the specific problem with dask.compute(b.min(), b.max()) for a lazy dataset? Won’t it page in the dataset chunk by chunk and compute the min and max? I believe that was the intention (though this is not my code).
Yes.
Yes, decimation is tricky but often important; should be configurable.
Yes, the author of that code is a Bokeh rather than HoloViz user mainly, so he used Bokeh, but I agree it’s simpler with GeoViews.
Mentioning that cvs.points can be replaced with whatever glyph is appropriate is a good idea.
Mentioning projections is also a great idea.

I’d love to see a PR with those changes! I’ve never found the time to mess with tile generation myself, but I think it’s an important feature and one that should be better supported and better documented!

ahuang11 · September 17, 2021, 7:57pm

Maybe it’s a nested dask computation; basically it crashes because somewhere it goes to numba.