Hello everyone,
I want to stream data using datashader along with bokeh. I know that holoviews would already do the job for me; however, interactivity with holoviews does not work with a particular installation on a particular server (while it does on my computer). So I quickly wrote a bokeh plot that does more or less the same thing as holoviews. While this plot works fine, I have noticed something strange. When calling `canvas.line`, the first call always takes about 1 second no matter how many points I plot, which is very slow. This means that if I want to plot multiple lines at the same time, the first draw will take `n*1s` to execute. If I take bokeh out of the picture, the problem persists, so I know it is only related to datashader.
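One way to confirm that this is a fixed start-up cost, rather than something that grows with the data, is to time the first call separately from a few repeated ones. The sketch below uses only the standard library. `first_vs_steady`, `_prepare` and `simulated_call` are hypothetical names, and the 0.05 s sleep inside an `lru_cache` merely simulates a one-time, per-key setup step. Numba compilation is a plausible candidate for what `canvas.line` does on first use, but that is an assumption here.

```python
import functools
import statistics
import time
from typing import Callable, Tuple


def first_vs_steady(fn: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Return (duration of the first call, median duration of `repeats` later calls)."""
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        later.append(time.perf_counter() - start)
    return first, statistics.median(later)


@functools.lru_cache(maxsize=None)
def _prepare(key: str) -> str:
    # Hypothetical one-time setup per key; stands in for e.g. JIT compilation.
    time.sleep(0.05)
    return key


def simulated_call(key: str) -> str:
    # Only the first call for a given key pays the setup cost; later calls hit the cache.
    return _prepare(key)


if __name__ == "__main__":
    for key in ["y0", "y1"]:
        first, steady = first_vs_steady(lambda: simulated_call(key))
        print(f"{key}: first call {first:.4f}s, steady state {steady:.6f}s")
```

To apply it to datashader, the real call can be wrapped in a lambda, for example `first_vs_steady(lambda: cvs.line(df, "x", "y0"))`. `time.perf_counter` is used instead of `time.time` because it is monotonic and has a higher resolution, which matters for the millisecond-scale steady-state calls.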
I prepared a little example in which I plot `n` random walks that are updated at a regular interval. In this example, I time each individual call to `canvas.line`.
```python
import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf
import time

x_name = "x"


def shade(data, y):
    """Aggregate each y column as a line on one canvas and stack the shaded images."""
    cvs = ds.Canvas(plot_width=800, plot_height=400)
    aggs = []
    if not isinstance(data, pd.DataFrame):
        data = pd.DataFrame(data).astype(float)
    for i, yn in enumerate(y):
        # Time each canvas.line call individually
        t = time.time()
        aggs.append(cvs.line(data, x_name, yn))
        t = time.time() - t
        print(f"line: {i}, csv.line: {t:.07}s")
    imgs = [tf.shade(aggs[i]) for i in range(len(y))]
    return tf.stack(*imgs)


def init_data(num_lines):
    """Start the x axis and every line at 0."""
    data = {x_name: np.array([0])}
    ys = []
    for i in range(num_lines):
        data[f"y{i}"] = np.array([0])
        ys.append(f"y{i}")
    return data, ys


def update_data(data, num_data, num_lines, refresh_rate):
    """Append num_data new samples to x and to every random walk."""
    last_time = data[x_name][-1]
    new_time = np.linspace(last_time + 1e-3, last_time + refresh_rate / 1000, num_data)
    x = list(np.append(data[x_name], new_time))
    data[x_name] = x
    # Create random walk
    for i in range(num_lines):
        key = f"y{i}"
        last_y = data[key][-1]
        rand = np.random.normal(0, np.sqrt(10 / num_data), size=num_data)
        new_y = np.cumsum(rand) + last_y
        y = list(np.append(data[key], new_y))
        data[key] = y
    return data, last_time


num_data = 1000
num_lines = 10
refresh_rate = 500  # ms
total_time = 30  # s

data, y = init_data(num_lines)
for i in range(int(total_time / (refresh_rate / 1000))):
    data, last_time = update_data(data, num_data, num_lines, refresh_rate)
    t = time.time()
    imgs = shade(data, y)
    t = time.time() - t
    print(f"Time: {last_time}s, Shade function timing: {t:.07}s\n")
    time.sleep(refresh_rate / 1000)
```
Using Python 3.7.7 and datashader 0.11.0, I obtain the following results:
```
line: 0, csv.line: 1.019807s
line: 1, csv.line: 0.6681943s
line: 2, csv.line: 0.6904411s
line: 3, csv.line: 0.6443765s
line: 4, csv.line: 0.6933248s
line: 5, csv.line: 0.7077668s
line: 6, csv.line: 0.732996s
line: 7, csv.line: 0.6384065s
line: 8, csv.line: 0.6951737s
line: 9, csv.line: 0.712997s
Time: 0s, Shade function timing: 7.337082s

line: 0, csv.line: 0.0017488s
line: 1, csv.line: 0.001763821s
line: 2, csv.line: 0.001781225s
line: 3, csv.line: 0.001730204s
line: 4, csv.line: 0.001755476s
line: 5, csv.line: 0.001938581s
line: 6, csv.line: 0.002449274s
line: 7, csv.line: 0.001697063s
line: 8, csv.line: 0.001592875s
line: 9, csv.line: 0.001425743s
Time: 0.5s, Shade function timing: 0.143779s

line: 0, csv.line: 0.005766869s
line: 1, csv.line: 0.008542538s
line: 2, csv.line: 0.00506115s
line: 3, csv.line: 0.01425767s
line: 4, csv.line: 0.004119158s
line: 5, csv.line: 0.00257349s
line: 6, csv.line: 0.002157927s
line: 7, csv.line: 0.002114534s
line: 8, csv.line: 0.00369072s
line: 9, csv.line: 0.00231266s
Time: 1.0s, Shade function timing: 0.2164218s

...
```
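As a rough check on the log, the ten per-line timings from the first refresh can be summed and compared with the reported `shade` total. The values are copied from the log; treating their mean as a fixed cost per additional line when extrapolating to 32 lines is an assumption, not a measurement.

```python
# Per-line canvas.line timings (seconds) copied from the first refresh in the log.
first_refresh = [
    1.019807, 0.6681943, 0.6904411, 0.6443765, 0.6933248,
    0.7077668, 0.732996, 0.6384065, 0.6951737, 0.712997,
]
shade_total = 7.337082  # "Shade function timing" reported for "Time: 0s"

line_total = sum(first_refresh)
mean_per_line = line_total / len(first_refresh)
# Time in shade() outside the canvas.line calls
# (DataFrame conversion, tf.shade, tf.stack, printing).
other = shade_total - line_total

# Linear extrapolation (an assumption): each extra line pays the mean first-call cost.
estimate_32 = 32 * mean_per_line

print(f"sum of canvas.line calls: {line_total:.3f}s")
print(f"mean per line:            {mean_per_line:.3f}s")
print(f"rest of shade():          {other:.3f}s")
print(f"estimate for 32 lines:    {estimate_32:.1f}s")
```

On these numbers, the `canvas.line` calls account for about 7.2 s of the 7.34 s first refresh, i.e. roughly 0.7 s per line, which would put 32 lines at around 23 s if the cost stays per line.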
You can see here that the first calls to `canvas.line` always take a lot of time, while the subsequent calls are fine. If I want to plot 32 lines, I would have to wait about 32 seconds, which makes datashader unusable. Can you explain what I did wrong, or whether there is a better way to use datashader?
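A possible workaround to test, sketched under an explicit assumption: since every line is slow only on its first refresh, datashader may be caching its compiled aggregation code per column name, in which case giving every line the same column names could let the lines after the first reuse that code. The helper `per_line_frames` below is hypothetical; it only builds one two-column DataFrame per line with the fixed names `"x"` and `"y"`.

```python
from typing import Dict, List, Sequence

import numpy as np
import pandas as pd


def per_line_frames(
    data: Dict[str, Sequence[float]], x_name: str, ys: List[str]
) -> List[pd.DataFrame]:
    """Hypothetical helper: one DataFrame per line, always with columns "x" and "y"."""
    x = np.asarray(data[x_name], dtype=float)
    # Every frame uses the same column names, so each cvs.line call would see an
    # identical (x="x", y="y") specification regardless of which line it draws.
    # Assumption: this lets datashader reuse code compiled for the first line.
    return [pd.DataFrame({"x": x, "y": np.asarray(data[yn], dtype=float)}) for yn in ys]


if __name__ == "__main__":
    data = {"x": [0.0, 0.001, 0.002], "y0": [0.0, 0.1, -0.2], "y1": [0.0, -0.3, 0.4]}
    for frame in per_line_frames(data, "x", ["y0", "y1"]):
        print(list(frame.columns), frame["y"].tolist())
```

Inside `shade`, the loop would then call `cvs.line(frame, "x", "y")` on each frame instead of `cvs.line(data, x_name, yn)`; timing the first refresh again would show whether the assumption holds.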
Thank you for your answers.