I have a use case to build a dashboard that will ultimately create directed graphs based on a parquet file created with pySpark (using a single node).
By a large file, I mean that the parquet file can be up to 30 GB and could contain around 120 million data points.
I can work with the file quite fast using Jupyter Lab, so in terms of RAM it should be fine.
I am curious whether I am on the wrong track trying to use Panel, or if it is doable.
Tips & tricks are appreciated, thank you!
On second thought, I don’t think it is possible with a pySpark dataframe.
As far as I know, Panel supports Pandas dataframes, and converting a pySpark dataframe to Pandas is very expensive.
I think it should be doable, but it would depend on:
1. How many users you have
2. What “servers” you have access to
3. The kind of dashboard
4. How you implement it
My hypothesis would be that it could be beneficial to use Dask.
I’m working on a “How to test” guide which includes performance and load testing. This might be relevant: panel/test.md at de3b7c0ad97c14904d43bc842c3caceea45badd9 · holoviz/panel (github.com).
Panel just runs on top of Python, so in principle you can develop on top of any Python framework, data format, or server you can interact with, including Spark if needed.
To quickly answer your questions:
- The use case will serve only one user locally, on their laptop
- The laptop is assumed to be powerful enough (32-64 GB RAM)
- The current idea is to create a directed graph that can be filtered by date and/or certain attributes from the data
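For a single local user with that much RAM, the filter-then-build step could be as simple as pandas plus networkx; a minimal sketch (the column names `src`, `dst`, `date` are invented for illustration):

```python
import pandas as pd
import networkx as nx

# Hypothetical edge list; in practice this would be read from the parquet file.
edges = pd.DataFrame({
    "src": ["a", "b", "c", "a"],
    "dst": ["b", "c", "a", "c"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]
    ),
})

# Apply the dashboard's date filter first, then build the directed graph
# from the (much smaller) filtered slice.
window = edges[edges["date"] >= "2023-01-03"]
g = nx.from_pandas_edgelist(
    window, source="src", target="dst", create_using=nx.DiGraph
)
print(g.number_of_edges())  # 2
```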