How can I get Datashader's force-directed layout to use more cores?

Hi, I am using `datashader.layout.forceatlas2_layout` on a network with 50,000 nodes and 200,000 edges, and it takes about 12 minutes on my M1 MacBook. I noticed that it is using 100% CPU, but that only accounts for a single core. Is there a way to configure Datashader or one of its underlying dependencies to use more cores and speed things up a little?

Here is some sample code that generates a random network and then runs the forceatlas2 layout:

```python
import time

import networkx as nx
import pandas as pd

from datashader.layout import forceatlas2_layout

# Make a random graph with 50000 nodes and 200000 edges and convert it to DataFrames
G = nx.gnm_random_graph(50000, 200000)
nodes = pd.DataFrame({'id': list(G.nodes)}).set_index('id')
# One row per edge with 'source' and 'target' columns, index named 'id'
edges = nx.to_pandas_edgelist(G).rename_axis('id')

# Run the forceatlas2 layout and time it
start_time = time.perf_counter()
forcedirected = forceatlas2_layout(nodes, edges)
end_time = time.perf_counter()
print(f"Time taken: {end_time - start_time:.1f} seconds")
```

Thanks!

I'm seeing a similar performance bottleneck, and tracked it down to the cooling step in `datashader.layout`.
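For anyone wondering why only one core is busy, here is a simplified, illustrative sketch of a per-node cooling loop in the Fruchterman-Reingold style. It is an assumption about the general shape of such a loop, not datashader's actual code: the function name `cooling_step`, the CSR inputs `indptr`/`indices` (the same layout as `scipy.sparse.csr_matrix.indptr` and `.indices`), and the force formulas are all hypothetical. The outer loop runs in the Python interpreter one row at a time, and NumPy's elementwise operations inside it are single-threaded, so the whole step stays on one core while doing O(N²) work.

```python
import numpy as np


def cooling_step(indptr, indices, points, temperature, k):
    """One simplified force-directed update (illustrative sketch, not datashader's code).

    indptr, indices: CSR adjacency of an undirected graph (both edge directions stored).
    points: (N, 2) float array of positions, updated in place and returned.
    """
    n = points.shape[0]
    displacement = np.zeros_like(points)
    for i in range(n):  # interpreter-level loop: one row at a time, on one core
        delta = points[i] - points  # (N, 2) vectors from every node towards node i
        distance = np.maximum(np.sqrt((delta ** 2).sum(axis=1)), 0.01)
        # repulsion from every node (magnitude k^2 / d): O(N) work for every row
        disp = (delta * (k * k / distance ** 2)[:, None]).sum(axis=0)
        # attraction along this node's edges only (magnitude d^2 / k)
        nbrs = indices[indptr[i]:indptr[i + 1]]
        disp -= (delta[nbrs] * (distance[nbrs] / k)[:, None]).sum(axis=0)
        displacement[i] = disp
    # move each node by at most `temperature` along its displacement
    length = np.maximum(np.sqrt((displacement ** 2).sum(axis=1)), 0.01)
    points += displacement * (temperature / length)[:, None]
    return points
```

With N = 50,000 the repulsion line alone evaluates 50,000 × 50,000 = 2.5 billion node pairs per iteration, so if datashader's loop has this shape, run time grows roughly with the square of the node count regardless of how many cores are available.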

Looks like we should JIT the cooling method. @dmiracle, do you mind opening a feature request for adding Numba to the cooling method, and linking to this discussion?
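A hedged sketch of what JIT-compiling such a loop could look like: the O(N²) pair loop is compiled with Numba and its rows are split across threads with `numba.prange`. Each row `i` only writes `displacement[i]`, so rows are independent and can run in parallel. This is not datashader's code; the names `edges_to_csr`, `compute_displacement` and `cooling_step_parallel`, and the Fruchterman-Reingold-style force formulas, are hypothetical. Datashader already depends on Numba, and Numba uses all CPU cores by default (adjustable with `numba.set_num_threads` or the `NUMBA_NUM_THREADS` environment variable); the `try`/`except` fallback only exists so the sketch still runs, serially, without Numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback: run the same code serially if Numba is missing
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda f: f


def edges_to_csr(n, sources, targets):
    """CSR (indptr, indices) for an undirected edge list; both directions stored."""
    src = np.concatenate([sources, targets])
    dst = np.concatenate([targets, sources])
    order = np.argsort(src, kind="stable")
    indptr = np.concatenate(([0], np.cumsum(np.bincount(src, minlength=n))))
    return indptr, dst[order]


@njit(parallel=True)
def compute_displacement(indptr, indices, points, k):
    n = points.shape[0]
    displacement = np.zeros_like(points)
    for i in prange(n):  # rows are distributed across threads
        xi, yi = points[i, 0], points[i, 1]
        sx = 0.0
        sy = 0.0
        for j in range(n):  # repulsion from every node: magnitude k^2 / d
            dx = xi - points[j, 0]
            dy = yi - points[j, 1]
            d = max(np.sqrt(dx * dx + dy * dy), 0.01)
            f = k * k / (d * d)
            sx += dx * f
            sy += dy * f
        for p in range(indptr[i], indptr[i + 1]):  # attraction along edges: d^2 / k
            j = indices[p]
            dx = xi - points[j, 0]
            dy = yi - points[j, 1]
            d = max(np.sqrt(dx * dx + dy * dy), 0.01)
            sx -= dx * d / k
            sy -= dy * d / k
        displacement[i, 0] = sx
        displacement[i, 1] = sy
    return displacement


def cooling_step_parallel(indptr, indices, points, temperature, k):
    """Update `points` in place: the O(N^2) part runs in parallel, the O(N) move in NumPy."""
    displacement = compute_displacement(indptr, indices, points, k)
    length = np.maximum(np.sqrt((displacement ** 2).sum(axis=1)), 0.01)
    points += displacement * (temperature / length)[:, None]
    return points
```

This spreads the pair loop over the available cores, so the time per iteration should drop roughly with core count, but the total work is still N² per iteration and the first call also pays Numba's compilation time. Calling it does not change what `forceatlas2_layout` itself does; that would need the change requested in this thread.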

sure thing!

Thanks, Dylan! Just for others who need to reference the code on GitHub: