GSoC 2026: Interest in Lumen + Xarray Integration – Harshith

hey everyone, i’m Harshith. i’m currently pursuing my undergrad in computer science, specializing in data science, and i’m looking to apply for GSoC 2026 with HoloViz, specifically the Lumen + Xarray integration project.

so a bit about my background:
i’ve been contributing to a few Python ecosystem projects, mainly Dask and NumPy. in Dask i’ve submitted PRs around distributed computation, partition metadata, serialization and docs, and in NumPy i’ve worked on dtype handling, CPU feature detection and backend compatibility. i also recently opened a PR in PyMC where i implemented a new measurable class and graph rewrite logic, which gave me some insight into backend graph-based computation.

since i’ve already worked a lot with Dask’s chunked and lazy computation model, i’m really interested in this project. from what i understand, Lumen mostly works with tabular data via DuckDB right now, so the idea of integrating xarray’s labeled N-dimensional data model into Lumen’s data layer while keeping Dask compatibility is really appealing to me.

basically, i want to explore how an XarraySource could work with Dask-backed arrays, respect xarray’s labeled dimensions and coordinates, and handle filtering and aggregation without breaking those labeled semantics.
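
to make that a bit more concrete, here’s a rough sketch in plain xarray of the filter-then-aggregate behaviour i have in mind. it doesn’t touch Lumen’s actual Source API at all: the `query` helper, the toy `temperature` dataset and its coordinate values are all made up for illustration, and the Dask chunking step is simply skipped if dask isn’t installed.

```python
import importlib.util

import numpy as np
import xarray as xr

# toy dataset: variable name, dims and coordinate values are made up
ds = xr.Dataset(
    {
        "temperature": (
            ("time", "lat", "lon"),
            np.arange(24, dtype=float).reshape(2, 3, 4),
        )
    },
    coords={
        "time": [0, 1],
        "lat": [10.0, 20.0, 30.0],
        "lon": [100.0, 110.0, 120.0, 130.0],
    },
)

# back the arrays with Dask when it is available, so everything stays lazy
if importlib.util.find_spec("dask") is not None:
    ds = ds.chunk({"time": 1})


def query(dataset, filters=None, reduce_dims=()):
    """Hypothetical helper (not a Lumen API): filter by coordinate labels,
    then average over the named dimensions."""
    subset = dataset.sel(filters or {})  # label-based, not positional
    if reduce_dims:
        subset = subset.mean(dim=list(reduce_dims))
    return subset


# keep lat in [10, 20], then average over lat and lon; "time" survives
result = query(
    ds,
    filters={"lat": slice(10.0, 20.0)},
    reduce_dims=["lat", "lon"],
).compute()

print(result["temperature"].dims)
print(result["temperature"].values)
```

the part i care about here is that the filter is expressed in coordinate labels and the result still carries the `time` dimension and its coordinate, instead of being flattened into rows. how that should map onto Lumen’s Source interface is exactly what i’m hoping to figure out.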

would really appreciate any guidance on where to start exploring Lumen’s Source abstractions. thanks!