I ended up using DuckDB. Some benchmarks with my 24 GB Parquet file (about 71 GB once loaded into memory, so it doesn't all fit in RAM at once; tested on my laptop):
- Dask: 5:00 min initially, 1:30 min with optimizations
- Polars: 40 seconds out of the box, 11 seconds with some fine-tuning
- DuckDB: 0.5-2 seconds out of the box (using the same optimized table structure as with Polars)
I liked the Polars API perhaps a bit more than writing SQL for DuckDB, but decided to go with DuckDB as it was so fast out of the box.
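For reference, a minimal sketch of the kind of query I'm talking about; the file name, column names, and aggregation are placeholders rather than my actual workload. The point is that DuckDB scans the Parquet file directly and streams it, so the data never has to fit in memory:

```python
import duckdb

# In-memory connection; DuckDB reads the Parquet file out-of-core,
# so a 24 GB file works fine even with much less RAM.
con = duckdb.connect()

# Hypothetical aggregation query over placeholder columns.
result = con.execute("""
    SELECT some_key,
           count(*)        AS n,
           avg(some_value) AS mean_value
    FROM read_parquet('data.parquet')
    GROUP BY some_key
    ORDER BY n DESC
""").fetchdf()

print(result.head())
```

The rough Polars equivalent would go through `pl.scan_parquet(...)` with a lazy query plan, which is what got it down to the ~11 second range for me after some tuning.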