Fast Remote Image Viewer

I’m trying to view data as images over a low-speed network connection. The data are 2D numpy arrays of up to 1 million float elements. On start, I’m loading the image plots into a dictionary (and cache) and feeding the dictionary to a HoloMap. This appears to be the fastest option, but it is still a bit too laggy when scrolling through images over the slow connection.

The data come in batches and there is a time constraint. The up-front loading cost is tolerable, but it would be preferable if the images could begin displaying immediately while continuing to load.

Any suggestions for improving this design? Are there any client-side caching options? Serving static images or using compression? I have no experience with HTML or JavaScript.

Working synthetic example below. (I’m timing the data load but don’t know how to time the image render interval within HoloMap.)

import time
from multiprocessing import Pool

import holoviews as hv
import numpy as np
import panel as pn
from holoviews import opts
from holoviews.operation.datashader import regrid

hv.extension('bokeh')
pn.extension()

#  set global plot defaults
opts.defaults(
    opts.Image(
        height=400,
        width=800,
        cmap='gray',
        invert_yaxis=True,
        framewise=True,
        tools=['hover'],
        active_tools=['box_zoom'],
    ))


def read_data(m=2500, n=300):
    ''' simulated data i/o function '''
    data = np.random.rand(m, n)
    time.sleep(0.1)  # add some realistic delay
    return m, n, data


def create_image(path):
    ''' create image from data '''
    x0, x1, nparray = read_data()
    return path, hv.Image(nparray,
                          kdims=['x0', 'x1'],
                          bounds=[1, 1, x0, x1],
                          extents=(0, 1, x0 + 1, x1))


# synthetic data file names
files = [f'file_{i:03}.dat' for i in range(101)]

# preload & cache dictionary of filename:image pairs
start_counter = time.perf_counter()
if 'image_dict' in pn.state.cache:
    image_dict = pn.state.cache['image_dict']
else:
    with Pool(initializer=np.random.seed) as pool:
        result = pool.map(create_image, files)
    pn.state.cache['image_dict'] = image_dict = {r[0]: r[1] for r in result}
end_counter = time.perf_counter()
print(
    f"create_image: {len(image_dict)} images, {end_counter - start_counter:.4f} seconds"
)

# add image dictionary to HoloMap with interpolation
plot = regrid(hv.HoloMap(image_dict, kdims='file'),
              upsample=True,
              interpolation='bilinear')

# simple layout
template = pn.template.BootstrapTemplate(title='Image Viewer')
template.main.append(plot)
template.servable()

Thanks.

First of all, thanks a lot for that minimum reproducible example, it’s extremely useful!

This appears to be the fastest option, but still a bit too laggy when scrolling through images over the slow connection.

Can you confirm that it’s the scrolling through the images that is slow, i.e. the time between selecting a file with the widget and the image being displayed?

Selecting an image is pretty snappy when I serve the app, however it is served locally, so there’s no latency from data being transferred over the network. Checking the websocket connection, I see that selecting an image transfers about 1.75 MB; that’s not much, but it could explain the latency you observe. Are you sure you need to upsample the images? With this setting disabled, about 0.75 MB is transferred over the websocket per image, less than half the original.

Confirmed. The scrolling between images is indeed very quick when served locally, or even over a high-speed connection. Over the low-speed (~2 MB/s) connection, though, it’s not bad but a bit too slow for my use case. It makes sense that the upsampling increases the amount of data; I have tested without it, but the difference was negligible. How are you measuring the data transfer?

I appreciate your help. I realize this is more than sufficient for most applications.

Web browsers have a pretty powerful dev toolbox that usually includes a Network tab where you can see the different kinds of requests made by the page. In Firefox’s dev tools, for example, you can inspect a websocket connection and the messages sent and received, including their time, content, and size.

Depending on how slow the connection you target is, you could compute an acceptable amount of data to transfer per image, and then see whether there are ways to get to that size (more downsampling, casting the numpy arrays to integer types, reducing the plot size, etc.). I don’t think there’s a way to pre-transfer the data client-side to reduce the latency on file selection, but I may be wrong.
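
To make that concrete, here is a back-of-the-envelope sketch using the 2500 x 300 arrays from your example (pure arithmetic; the ~2 MB/s figure is just the link speed you mentioned):

elements = 2500 * 300                    # 750,000 values per image
float64_mib = elements * 8 / 1024**2     # ~5.72 MiB raw as float64 (8 bytes/value)
uint8_mib = elements * 1 / 1024**2       # ~0.72 MiB if cast to uint8
print(f"{float64_mib:.2f} MiB, ~{float64_mib / 2:.1f} s on a ~2 MB/s link")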

Also, I’d suggest using threading instead of multiprocessing to load the files, as threads are usually better suited to I/O-bound work like this. And in Panel, the way to get something displayed while data is still loading when a visitor reaches an app is pn.state.onload(), sketched below.
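
Here is a minimal sketch of that pattern, reusing the files list and create_image function from your example (the spinner is just one way to fill the gap while loading):

placeholder = pn.Column(pn.indicators.LoadingSpinner(value=True, width=50))

def load():
    # runs once the page has rendered, so visitors see the app immediately
    image_dict = dict(create_image(f) for f in files)
    placeholder.objects = [hv.HoloMap(image_dict, kdims='file')]

pn.state.onload(load)

template = pn.template.BootstrapTemplate(title='Image Viewer')
template.main.append(placeholder)
template.servable()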

Thanks. I’ll do some further testing with this in mind.

@maximlt I’ve explored your suggestions and found the smallest transfer size comes from hv.operation.datashader.datashade. Adjusting the dtype of the input arrays appeared to make no difference to the transfer size. Is there a DynamicMap or HoloMap dtype option?

Method                                        Xfer Size
datashade(HoloMap(image_dict))                382 KB
rasterize(HoloMap(image_dict))                764 KB
regrid(HoloMap(image_dict))                   764 KB
regrid(HoloMap(image_dict), upsample=True)    1.78 MB
shade(HoloMap(image_dict))                    2.86 MB
spread(HoloMap(image_dict))                   5.72 MB
dynspread(HoloMap(image_dict))                5.72 MB
HoloMap(image_dict)                           5.72 MB

I can reduce the size (and quality) further by downsampling with scipy.ndimage.zoom via the slider.

Here is my latest attempt. Still a bit hacky, but I’m using threads.


import time
from concurrent.futures import ThreadPoolExecutor

import holoviews as hv
import numpy as np
import panel as pn
from holoviews import opts
from holoviews.operation.datashader import datashade
from scipy.ndimage import zoom

hv.extension('bokeh')
pn.extension()

pn.config.throttled = True

# quality ~= image size (100% ≈ 382 KB)
image_quality = pn.widgets.DiscreteSlider(
    name='Image Quality %',
    options=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    value=100,
    direction='rtl')


def make_data(m=2500, n=300):
    ''' synthetic data i/o function '''
    data = np.random.rand(m, n).astype(np.float32)
    data = zoom(data, image_quality.value * .01)
    return m, n, data


def create_image(path):
    ''' create image from data '''
    x0, x1, data = make_data()
    return path, hv.Image(data,
                          kdims=['x0', 'x1'],
                          bounds=[1, 1, x0, x1],
                          extents=(0, 1, x0 + 1, x1))


# preload & cache dictionary of filename:image pairs
@pn.depends(image_quality.param.value)
def load_data(value, file_cnt=10):
    print('load_data')
    files = [f'file_{i:03}.dat' for i in range(file_cnt)]
    start_counter = time.perf_counter()
    with ThreadPoolExecutor(initializer=np.random.seed) as pool:
        result = pool.map(create_image, files)
    pn.state.cache['image_dict'] = {r[0]: r[1] for r in result}
    end_counter = time.perf_counter()
    print(
        f"preload image_dict: {len(pn.state.cache['image_dict'])} images, {end_counter - start_counter:.4f} seconds"
    )

    return datashade(
        hv.HoloMap(pn.state.cache['image_dict'], kdims='file'),
        cmap='gray',
        cnorm='linear',
        dynamic=True,
    ).opts(
        height=400,
        width=800,
        invert_yaxis=True,
        framewise=True,
        tools=['hover'],
        active_tools=['box_zoom'],
    )

# simple layout
template = pn.template.BootstrapTemplate(title='Image Viewer')
template.main.append(image_quality)
template.main.append(load_data)
template.servable()

Please let me know if you have any further comments on improving or cleaning up the code. I’m definitely going to rework the caching design.
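
One option I’m considering for that is pn.state.as_cached, which computes the value on first access and serves it from pn.state.cache afterwards; a rough sketch, with load_images as a hypothetical wrapper around the loading code above (same imports as the example):

def load_images(file_cnt=10):
    files = [f'file_{i:03}.dat' for i in range(file_cnt)]
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(create_image, files))

# computed once, then reused from the cache on subsequent sessions
image_dict = pn.state.as_cached('image_dict', load_images)
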
Thanks.

Thanks for reporting these values, I’m sure this is going to be helpful to lots of people! :+1:

Just to try to give you more insights on the operations you’ve applied:

  • spread and dynspread don’t change the size of the data; they just blur the image, so the data shape and type stay the same and the transfer size matches the original.
  • shade does server-side colormapping and returns an RGB element. The data of an RGB element is stored as a 3D uint8 array (1 byte per element, 4 channels), while your image data is float64 (8 bytes per element, 1 channel), which explains why the shaded image is exactly half the size of the original.
  • regrid by default downsamples (you end up with an array whose elements map to pixels on your screen), which is why it’s smaller in your table. As expected, the size gets bigger when you upsample.
  • rasterize is effectively the same as regrid in this case: since the input is an Image element, no aggregation other than regridding is needed.
  • datashade simply wraps rasterize + shade, which in this case means regrid + shade, as sketched below. You got the best result since both regrid and shade reduce the image size.
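
In other words, conceptually (a sketch, not the exact implementation of the operation; hmap stands for the HoloMap from your example):

from holoviews.operation.datashader import rasterize, shade

# roughly what datashade(hmap) does for Image input: regrid to screen
# resolution, then colormap server-side into a uint8 RGB element
shaded = shade(rasterize(hmap), cmap='gray')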

And some comments:

  • The shade operation defaults to cnorm='eq_hist' (a histogram-equalization technique coming from datashader); you are more likely interested in linear normalization, which you can get by passing cnorm='linear' to the operation.
  • There’s a bug in HoloViews that prevents the regrid operation from honoring the width/height parameters with the Bokeh backend (see this issue). Once it is fixed you’ll be able to downsample your images even further and so reduce the transfer size, if you need more reduction.
  • I’ve tried again casting the array to types such as int16 and I do see a reduction in transfer size. If that makes sense for your data, you may want to try it again.

Excellent summary! Very helpful.

And yes, it would help to cast the array to a smaller type such as int16, but I don’t know how. I’ve tried recasting the input array to hv.Image with no luck, and I ran into this datashader limitation regarding float16.

Thanks for pointing to this limitation in datashader, I wasn’t aware of it.

In your example I’d do something like data = (np.random.rand(m, n) * 100).astype(np.int8), I would not transform the data with scipy.ndimage.zoom (I don’t know whether it preserves the dtype, maybe!), and I would start by not applying any operation at all, just to see whether that has an effect.

Also note that casting and using shade or datashade will have no effect on the payload size, since the image is ultimately turned into an RGB.

I’ve tried the techniques below with no improvement. Still 382 KB transferred. I’ll try peeling off the operations again.

# attempt 1: cast input to uint8
def create_image(path):
    ''' create image from data '''
    x0, x1, data = make_data()
    data = (255 * (data - np.min(data)) / np.ptp(data)).astype('uint8')
    return path, hv.Image(data,
                   kdims=['x0', 'x1'],
                   bounds=[1, 1, x0, x1],
                   extents=(0, 1, x0 + 1, x1))

# attempt 2: cast hv.Image data attribute to uint8
def create_image(path):
    ''' create image from data '''
    x0, x1, data = make_data()
    img = hv.Image(data,
                   kdims=['x0', 'x1'],
                   bounds=[1, 1, x0, x1],
                   extents=(0, 1, x0 + 1, x1))
    img.data = (255 * (img.data - np.min(img.data)) / np.ptp(img.data)).astype('uint8')
    return path, img

Thanks for your efforts.

Good news! Casting to uint8 makes no difference when using hv.operation.datashader.datashade; however, the transfer size does scale with the input dtype when using hv.operation.datashader.regrid.

Later, I’ll post an updated comparison table for reference.
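
For reference, the combination that scales looks like this (a sketch assuming the uint8 create_image variant from attempt 1 above):

from holoviews.operation.datashader import regrid

# per the test above, regrid's payload scales with the input dtype,
# while shade/datashade always send uint8 RGB regardless of the input
plot = regrid(hv.HoloMap(image_dict, kdims='file'))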

Yes, that’s what I meant by:

Also note that casting and using shade or datashade will have no effect on the payload size, since the image is ultimately turned into an RGB.

Sorry if I wasn’t clear enough!
