Thank you very much for your kind offer!

The following is a more complete example of how my data looks:

```
import hvplot.pandas
import numpy as np
import pandas as pd
import xarray as xr
# data
ta = xr.DataArray(
np.random.random((4, 2, 3)),
dims=("time", "lat", "lon"),
coords={"time": np.arange(0, 4), "lat": np.arange(0, 2), "lon": np.arange(0, 3)},
)
tsurf = xr.DataArray(np.random.random((4, 2, 3)), dims=("time", "lat", "lon"))
ptype = xr.DataArray(
np.array([3, 2, np.nan, 1, np.nan, np.nan]).reshape((2, 3)), dims=("lat", "lon")
)
vtype = xr.DataArray(
np.array([np.nan, np.nan, 5, np.nan, 2, 7]).reshape((2, 3)), dims=("lat", "lon")
)
ds = xr.Dataset({"ta": ta, "tsurf": tsurf, "ptype": ptype, "vtype": vtype})
```

resulting in a Dataset

```
<xarray.Dataset>
Dimensions: (time: 4, lat: 2, lon: 3)
Coordinates:
* time (time) int64 0 1 2 3
* lat (lat) int64 0 1
* lon (lon) int64 0 1 2
Data variables:
ta (time, lat, lon) float64 0.7546 0.5793 0.1435 ... 0.5606 0.545
tsurf (time, lat, lon) float64 0.164 0.9778 0.8544 ... 0.4033 0.3633
ptype (lat, lon) float64 3.0 2.0 nan 1.0 nan nan
vtype (lat, lon) float64 nan nan 5.0 nan 2.0 7.0
```

The idea is that I have âmapsâ (`lat`

and `lon`

) of air temperatures `ta`

above certain surfaces with temperatures `tsurf`

for different `time`

s. For each surface (`lat`

and `lon`

), I know its pavement type (`ptype`

) and vegetation type (`vtype`

). Both, `ptype`

and `vtype`

, do not overlap, i.e. if one is not `nan`

, the other must be `nan`

. In my real data, all dimensions are (potentially much) larger and I also have more type-like arrays, i.e. more arrays that are independent of `time`

.

My aim is now to create a scatterplot of `ta`

vs `tsurf`

, by type and each `time`

. I do have a solution for that but it is based on a pandas dataframe:

```
df = ds.to_dataframe()
# generalization of types
df["type"] = df["ptype"].notna() * 1 + df["vtype"].notna() * 2
df["type"] = (
df["type"].astype("category").cat.rename_categories(["pavement", "vegetation"])
)
df.hvplot.scatter(x="tsurf", y="ta", by="type", groupby="time")
```

resulting in

The advantage of pandas is that I can use a categorical `dtype`

for `type`

which is not available in xarray (please correct me if I am wrong here). The disadvantage is, however, that in order to fill the table, all non-`time`

dependent quantities are replicated resulting in a unnecessarily large dataset:

```
ta tsurf ptype vtype type
time lat lon
0 0 0 0.754638 0.163989 3.0 NaN pavement
1 0.579287 0.977793 2.0 NaN pavement
2 0.143468 0.854425 NaN 5.0 vegetation
1 0 0.001652 0.898264 1.0 NaN pavement
1 0.746001 0.787206 NaN 2.0 vegetation
2 0.305514 0.316464 NaN 7.0 vegetation
1 0 0 0.327031 0.647004 3.0 NaN pavement
1 0.458336 0.968298 2.0 NaN pavement
2 0.892447 0.711597 NaN 5.0 vegetation
1 0 0.839581 0.674225 1.0 NaN pavement
1 0.914335 0.250300 NaN 2.0 vegetation
2 0.592095 0.444696 NaN 7.0 vegetation
2 0 0 0.993893 0.067323 3.0 NaN pavement
1 0.551221 0.554406 2.0 NaN pavement
2 0.645342 0.683610 NaN 5.0 vegetation
1 0 0.973098 0.933462 1.0 NaN pavement
1 0.704639 0.053339 NaN 2.0 vegetation
2 0.976332 0.161400 NaN 7.0 vegetation
3 0 0 0.530574 0.550358 3.0 NaN pavement
1 0.139996 0.707700 2.0 NaN pavement
2 0.964032 0.175724 NaN 5.0 vegetation
1 0 0.225253 0.981884 1.0 NaN pavement
1 0.560618 0.403304 NaN 2.0 vegetation
2 0.544982 0.363305 NaN 7.0 vegetation
```

Thus, my idea was to stick with the xarray object. Any idea how to do that? Or any other advice?