`node_color` parameter does not work with Sankey

LinuxIsCool · September 24, 2024, 9:17pm

Consider the following example:

```python
import holoviews as hv
import pandas as pd

# Initialize Holoviews with the Bokeh backend
hv.extension('bokeh')

# Define nodes with unique string identifiers
nodes = pd.DataFrame({'index': ['A', 'B', 'C', 'D']})

# Define links with source and target as node names (strings)
links = pd.DataFrame([
    {'source': 'A', 'target': 'C', 'value': 5},
    {'source': 'B', 'target': 'C', 'value': 3},
    {'source': 'C', 'target': 'D', 'value': 8}
])

sankey = hv.Sankey((links, nodes)).opts(
    hv.opts.Sankey(
        node_color='black',
        node_line_color='red',
        edge_color='purple',
    )
)

# Display the Sankey diagram
sankey
```

I would expect the nodes to be black, but the `node_color` setting is not applied to them!

LinuxIsCool · September 24, 2024, 9:41pm

I see there is an open issue for this on GitHub: "Sankey node_color and node_fill_color options not applied" (holoviz/holoviews issue #3835).

LinuxIsCool · October 4, 2024, 4:33pm

Thanks @philippjfr for linking back here from the github issue.

I hope this issue can be resolved in the future. I think Sankey diagrams are so cool, but they really need the ability to set node colors!

In the meantime, I have found a workaround. Since `node_line_color` works as expected, setting `node_line_width` to a large number like 20 makes the node outlines so thick that they cover the nodes, which effectively colors each node with its line color.

I'm using the following to display a Sankey diagram whose flows accumulate over time, stepped through with a Panel `Player` widget:

```python
import hvplot.pandas
import holoviews as hv
import pandas as pd
import numpy as np
import panel as pn
from itertools import cycle
from bokeh.palettes import Category20

hv.extension('bokeh')
pn.extension()

# Parameters
num_time_steps = 200

# Full node list (not used below; the nodes are derived from the flows)
nodes = [
    'A', 'B', 'C', 'D', 'E',
]

flows = [
    {'time': 1, 'source': 'A', 'target': 'B', 'value': 1e6},
    {'time': 1, 'source': 'A', 'target': 'C', 'value': 9e6},
    {'time': 2, 'source': 'B', 'target': 'D', 'value': 1e6},
    # {'time': 2, 'source': 'C', 'target': 'E', 'value': 1e6},
]

df = pd.DataFrame(flows)

# Add cumulative value per (source, target) pair
df['cumulative_value'] = df.groupby(['source', 'target'])['value'].cumsum()

# Extract unique nodes from source and target (sorted)
nodes_sorted = pd.unique(df[['source', 'target']].values.ravel('K'))
nodes_sorted = np.sort(nodes_sorted)  # Sort alphabetically

# Map each sorted node to a color, cycling through the 20-color palette
# so that any number of nodes (fewer than 3 or more than 20) gets a color
color_mapping = dict(zip(nodes_sorted, cycle(Category20[20])))

# Function to create Sankey diagram for a given time step
def create_sankey(t):
    subset = df[df['time'] <= t]
    sankey = hv.Sankey(subset, kdims=['source', 'target'], vdims='cumulative_value').opts(
        title=f'Cumulative Flows at Time {t}',
        label_position='right',
        width=1000,
        height=600,
        # Workaround: color nodes through a thick outline, since node_color is ignored
        node_line_color=hv.dim('index').categorize(color_mapping, default='grey'),
        node_line_width=20,
        edge_color=hv.dim('source').categorize(color_mapping, default='grey'),
    )
    return sankey

# Create a player widget using Panel
time_steps = df['time'].unique()
player = pn.widgets.Player(
    name='Time',
    start=int(time_steps.min()),
    end=int(time_steps.max()),
    step=1,
    value=int(time_steps.min()),
    width=800,
    interval=1000,
)

# Bind the function to the player widget
sankey_pane = pn.bind(create_sankey, t=player)

# Create a Panel layout
layout = pn.Column(sankey_pane, player)

# Display the layout
layout
```
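In this data each (source, target) pair appears only once, so the per-pair `cumsum` equals the raw `value`, and filtering with `df['time'] <= t` is enough. If the same pair recurs at a later time step, the filtered subset contains one row per occurrence rather than one running total per pair. A minimal pandas sketch that aggregates instead, assuming the same `time`/`source`/`target`/`value` columns (the helper name `cumulative_flows` and the repeated A→B row are hypothetical):

```python
import pandas as pd


def cumulative_flows(df: pd.DataFrame, t: int) -> pd.DataFrame:
    """Total flow per (source, target) pair over all time steps <= t."""
    subset = df[df['time'] <= t]
    return (
        subset.groupby(['source', 'target'], as_index=False)['value']
        .sum()
        .rename(columns={'value': 'cumulative_value'})
    )


flows = pd.DataFrame([
    {'time': 1, 'source': 'A', 'target': 'B', 'value': 1e6},
    {'time': 1, 'source': 'A', 'target': 'C', 'value': 9e6},
    {'time': 2, 'source': 'A', 'target': 'B', 'value': 2e6},  # hypothetical repeat of A -> B
    {'time': 2, 'source': 'B', 'target': 'D', 'value': 1e6},
])

print(cumulative_flows(flows, 1))
print(cumulative_flows(flows, 2))
```

Inside `create_sankey`, passing `cumulative_flows(df, t)` to `hv.Sankey(..., kdims=['source', 'target'], vdims='cumulative_value')` would then give exactly one row, and so one edge, per pair.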