How to refer to data uploaded with FileInput for further processing?

ygmeow · December 24, 2022, 8:17pm

Hi, I am very new to Panel (and using widgets in general). I have the code below to read an excel file and convert it to a dataframe. I understand that the processed dataframe is stored in active_load_excel() if the file upload is success. So if I want to process this dataframe further in a separate cell in juypter, it works. However, if I want to do the followup process (e.g., to print out this dataframe) in the same jupyter cell as my file_input widget, it will always return as None because it runs simultaneously as the file_input and the file_input is always empty when I launch the program.

In my example, is there a way to tell the print statement to ‘wait’ until the file input is completed? And in a more general sense, how to refer to the data uploaded with FileInput for future processing? I would really appreciate it if anyone could provide the syntax for this.

file_input = pn.widgets.FileInput(accept=".xlsx")

def load_excel(data):
if data is not None:
df = pd.read_excel(io.BytesIO(data)
return df

active_load_excel = pn.bind(load_excel, file_input.param.value)
pn.Column(file_input, active_load_excel)

#the following print state will return None if run in the same cell as file_input
#but will return the uploaded excel dataframe if run in the next cell

print(active_load_excel())

riziles · December 27, 2022, 6:25pm

There are a lot of ways to do it (I’d recommend using param to create a parameterized object), but one easy solution is to just make df a global variable:

file_input = pn.widgets.FileInput(accept=".xlsx")

def load_excel(data):
    if data is not None:
        global df
        df = pd.read_excel(io.BytesIO(data))
        return df

active_load_excel = pn.bind(load_excel, file_input.param.value)

print(active_load_excel())

pn.Column(file_input, active_load_excel)

Here is a (sloppy) example with param:

import panel as pn
import param
import pandas as pd
import io
pn.extension()

class Data(param.Parameterized):
    df = param.DataFrame()
    
    def loader(self):
        file_input = pn.widgets.FileInput(accept=".xlsx")
        
        @pn.depends(file_input, watch = True)
        def load_excel(finput=file_input, self=self):
            data = file_input.value
            if data is not None:
                self.df = pd.read_excel(io.BytesIO(data))
                print (self.df)
                
        return file_input

data = Data()

pn.Column(data.loader, pn.widgets.DataFrame.from_param(data.param['df'], height = 500))

ygmeow · December 29, 2022, 12:32am

Thank you so much for providing the sample code and introducing me to param. Both of the code worked! I did notice that when I use param (your second code), the browse button for FileInput always displays “no file selected” within a few seconds after the file update (the actual file name will appear for a very brief time). I was wondering if this has anything to do with the ‘watch = True’, and if there’s a way to keep the file name.

riziles · December 29, 2022, 2:26am

Sure! That’s as simple as adding () after the reference to data.loader, so the last line would look like this:

pn.Column(data.loader(), pn.widgets.DataFrame.from_param(data.param['df'], height = 500))

I’m no expert, but my understanding is that “calling” the method (i.e. adding the ()) makes it more of a static view, whereas leaving it uncalled makes it more dynamic. Don’t quote me on this (literally).

carl · December 31, 2022, 11:02am

Hi @ygmeow,

Don’t quote me but I think you may have stumbled into an old issue perhaps bug but I haven’t been able to find anything on it other than recall reading somewhere. I’ve put an example below that demonstrates the behaviour, as you can see from the video if you run your example you get no output, then change to serve it works in tab, come back and change cell to original example it all starts to work. I don’t normally encounter this because I utilise different cells when working with jupyter lab so don’t see it normally. To sum up looks like an issue if just using one cell in jupyter, fine if serving app.

import io
import panel as pn
import pandas as pd
import openpyxl
import param

pn.extension()

xlsx_file_input = pn.widgets.FileInput(accept=".xlsx")

def load_excel_file(file_bytes):
    if file_bytes is not None:
        return pd.read_excel(io.BytesIO(file_bytes))
    
get_dataframe = pn.bind(load_excel_file, xlsx_file_input)
dash = pn.Column(xlsx_file_input, get_dataframe)

dash

Also note you don’t need to use a global with bind, it returns the value, so you should be able to get the dataframe by typing

df = get_dataframe()
df.head()
#process dataframe further

ygmeow · December 31, 2022, 6:22pm

Hi Carl, thanks for pointing out this behavior (nice video example BTW). I might have noticed that too but it was not always reproducible on my end. I thought it was some glitch with my jupyter notebook

I figured out a workaround with suggestion from @riziles and added a button that triggers all the functions that do the downstream processing.

Happy New Year!

randomBloke · August 22, 2023, 3:01pm

But this only works if the line df = get_dataframe() is in a separate Jupyter notebook cell. How would I do this in a single cell?

carl · August 22, 2023, 3:50pm

Hi @randomBloke,

If you use @riziles way above you can get it all to work in one cell

import panel as pn
import param
import pandas as pd
import io
pn.extension()

class Data(param.Parameterized):
    df = param.DataFrame()
    
    def loader(self):
        file_input = pn.widgets.FileInput(accept=".xlsx")
        
        @pn.depends(file_input, watch = True)
        def load_excel(finput=file_input, self=self):
            data = file_input.value
            if data is not None:
                self.df = pd.read_excel(io.BytesIO(data))
                print (self.df)
                
        return file_input

data = Data()

pn.Column(data.loader, pn.widgets.DataFrame.from_param(data.param['df'], height = 500))

randomBloke · August 23, 2023, 7:13am

This code gives an error:

DeserializationError: can't resolve reference 'cf7038bf-1820-4450-92f0-f9be659cd26f'

The second issue is that data is not of type dataframe, but has type __main__.Data

randomBloke · August 23, 2023, 7:37am

This code is very confusing. The argument finput is never used and the function load_excel is never called. (How) does this work?

carl · August 23, 2023, 8:19am

carl:

       
        @pn.depends(file_input, watch = True)
        def load_excel(finput=file_input, self=self):
            data = file_input.value
            if data is not None:
                self.df = pd.read_excel(io.BytesIO(data))
                print (self.df)
                
        return file_input

data = Data()

pn.Column(data.loader, pn.widgets.DataFrame.from_param(data.param['df'], height = 500))

I just noticed that after you mentioned, I guess you can change to this

import panel as pn
import param
import pandas as pd
import io
pn.extension()

class Data(param.Parameterized):
    df = param.DataFrame()
    
    def loader(self):
        file_input = pn.widgets.FileInput(accept=".xlsx")
        
        @pn.depends(file_input, watch = True)
        def load_excel(data=file_input.value,self=self):
            if data is not None:
                self.df = pd.read_excel(io.BytesIO(data))
                ##Process self.df here further##
                
        return file_input

data = Data()

pn.Column(data.loader, pn.widgets.DataFrame.from_param(data.param['df'], height = 500))

I’m not sure on the previous errors you mention as I don’t see it or where it is being logged. But I’m also missing something, why all in one cell?

randomBloke · August 23, 2023, 11:57am

This code makes more sense.

The data type is still __main__.Data instead of a dataframe and the error is still present. Full error:

{
	"name": "DeserializationError",
	"message": "can't resolve reference 'de04d889-b849-472f-8e02-d448aa665019'",
	"stack": "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mDeserializationError\u001b[0m                      Traceback (most recent call last)\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\jupyter_bokeh\\widgets.py:135\u001b[0m, in \u001b[0;36mBokehModel._sync_model\u001b[1;34m(self, _model, content, _buffers)\u001b[0m\n\u001b[0;32m    133\u001b[0m \u001b[39massert\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_document \u001b[39mis\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39mNone\u001b[39;00m\n\u001b[0;32m    134\u001b[0m deserializer \u001b[39m=\u001b[39m Deserializer(\u001b[39mlist\u001b[39m(\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_document\u001b[39m.\u001b[39mmodels), setter\u001b[39m=\u001b[39msetter)\n\u001b[1;32m--> 135\u001b[0m event \u001b[39m=\u001b[39m deserializer\u001b[39m.\u001b[39;49mdeserialize(Serialized(content\u001b[39m=\u001b[39;49mcontent, buffers\u001b[39m=\u001b[39;49m[]))\n\u001b[0;32m    137\u001b[0m kind \u001b[39m=\u001b[39m event[\u001b[39m\"\u001b[39m\u001b[39mkind\u001b[39m\u001b[39m\"\u001b[39m]\n\u001b[0;32m    138\u001b[0m \u001b[39mif\u001b[39;00m kind \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mModelChanged\u001b[39m\u001b[39m\"\u001b[39m:\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:501\u001b[0m, in \u001b[0;36mDeserializer.deserialize\u001b[1;34m(self, obj)\u001b[0m\n\u001b[0;32m    499\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39mdeserialize\u001b[39m(\u001b[39mself\u001b[39m, obj: Any \u001b[39m|\u001b[39m Serialized[Any]) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m Any:\n\u001b[0;32m    500\u001b[0m     \u001b[39mif\u001b[39;00m \u001b[39misinstance\u001b[39m(obj, Serialized):\n\u001b[1;32m--> 501\u001b[0m         \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49mdecode(obj\u001b[39m.\u001b[39;49mcontent, obj\u001b[39m.\u001b[39;49mbuffers)\n\u001b[0;32m    502\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[0;32m    503\u001b[0m         \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mdecode(obj)\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:516\u001b[0m, in \u001b[0;36mDeserializer.decode\u001b[1;34m(self, obj, buffers)\u001b[0m\n\u001b[0;32m    513\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decoding \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m    515\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m--> 516\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_decode(obj)\n\u001b[0;32m    517\u001b[0m \u001b[39mfinally\u001b[39;00m:\n\u001b[0;32m    518\u001b[0m     \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_buffers\u001b[39m.\u001b[39mclear()\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:557\u001b[0m, in \u001b[0;36mDeserializer._decode\u001b[1;34m(self, obj)\u001b[0m\n\u001b[0;32m    555\u001b[0m         \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode_ref(cast(Ref, obj))\n\u001b[0;32m    556\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 557\u001b[0m         \u001b[39mreturn\u001b[39;00m {key: \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode(val) \u001b[39mfor\u001b[39;00m key, val \u001b[39min\u001b[39;00m obj\u001b[39m.\u001b[39mitems()}\n\u001b[0;32m    558\u001b[0m \u001b[39melif\u001b[39;00m \u001b[39misinstance\u001b[39m(obj, \u001b[39mlist\u001b[39m):\n\u001b[0;32m    559\u001b[0m     \u001b[39mreturn\u001b[39;00m [\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode(entry) \u001b[39mfor\u001b[39;00m entry \u001b[39min\u001b[39;00m obj]\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:557\u001b[0m, in \u001b[0;36m<dictcomp>\u001b[1;34m(.0)\u001b[0m\n\u001b[0;32m    555\u001b[0m         \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode_ref(cast(Ref, obj))\n\u001b[0;32m    556\u001b[0m     \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 557\u001b[0m         \u001b[39mreturn\u001b[39;00m {key: \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_decode(val) \u001b[39mfor\u001b[39;00m key, val \u001b[39min\u001b[39;00m obj\u001b[39m.\u001b[39mitems()}\n\u001b[0;32m    558\u001b[0m \u001b[39melif\u001b[39;00m \u001b[39misinstance\u001b[39m(obj, \u001b[39mlist\u001b[39m):\n\u001b[0;32m    559\u001b[0m     \u001b[39mreturn\u001b[39;00m [\u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode(entry) \u001b[39mfor\u001b[39;00m entry \u001b[39min\u001b[39;00m obj]\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:555\u001b[0m, in \u001b[0;36mDeserializer._decode\u001b[1;34m(self, obj)\u001b[0m\n\u001b[0;32m    553\u001b[0m         \u001b[39mself\u001b[39m\u001b[39m.\u001b[39merror(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39munable to decode an object of type \u001b[39m\u001b[39m'\u001b[39m\u001b[39m{\u001b[39;00m\u001b[39mtype\u001b[39m\u001b[39m}\u001b[39;00m\u001b[39m'\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m    554\u001b[0m \u001b[39melif\u001b[39;00m \u001b[39m\"\u001b[39m\u001b[39mid\u001b[39m\u001b[39m\"\u001b[39m \u001b[39min\u001b[39;00m obj:\n\u001b[1;32m--> 555\u001b[0m     \u001b[39mreturn\u001b[39;00m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_decode_ref(cast(Ref, obj))\n\u001b[0;32m    556\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m    557\u001b[0m     \u001b[39mreturn\u001b[39;00m {key: \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_decode(val) \u001b[39mfor\u001b[39;00m key, val \u001b[39min\u001b[39;00m obj\u001b[39m.\u001b[39mitems()}\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:569\u001b[0m, in \u001b[0;36mDeserializer._decode_ref\u001b[1;34m(self, obj)\u001b[0m\n\u001b[0;32m    567\u001b[0m     \u001b[39mreturn\u001b[39;00m instance\n\u001b[0;32m    568\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[1;32m--> 569\u001b[0m     \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49merror(\u001b[39mf\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mcan\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39mt resolve reference \u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39m{\u001b[39;49;00m\u001b[39mid\u001b[39;49m\u001b[39m}\u001b[39;49;00m\u001b[39m'\u001b[39;49m\u001b[39m\"\u001b[39;49m)\n\nFile \u001b[1;32m~\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python310\\site-packages\\bokeh\\core\\serialization.py:717\u001b[0m, in \u001b[0;36mDeserializer.error\u001b[1;34m(self, message)\u001b[0m\n\u001b[0;32m    716\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39merror\u001b[39m(\u001b[39mself\u001b[39m, message: \u001b[39mstr\u001b[39m) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m NoReturn:\n\u001b[1;32m--> 717\u001b[0m     \u001b[39mraise\u001b[39;00m DeserializationError(message)\n\n\u001b[1;31mDeserializationError\u001b[0m: can't resolve reference 'de04d889-b849-472f-8e02-d448aa665019'"
}

I want it in one cell, because I need the data to be available as a dataframe in the app I am building

randomBloke · August 23, 2023, 12:22pm

I decided to do it as follows:

import panel as pn    
import pandas as pd
from io import BytesIO
pn.extension()

file_input = pn.widgets.FileInput(accept=".csv")
confirmButton = pn.widgets.Button(name="Confirm")

from io import BytesIO
def get_data(file):
    if file is not None:
        df = pd.read_csv(BytesIO(file))
        return df
  
data = pn.bind(get_data, file_input)

df = pd.DataFrame()

def confirmSelection(file):
    global df
    df = data()

confirmButton.on_click(confirmSelection)

pn.Column(file_input, confirmButton)

This is probably not the most elegant solution, but it produces no errors and the code is easily understandable

carl · August 23, 2023, 12:26pm

I believe your doing

type(data)

That is the class and as I understand it your dataframe is in the class as a parameter so your type should be on

type(data.param['df'])

giving you the param.DataFrame

I guess this is a Jupyter thing, it’s reached something it wants to return as an answer so the cell executes to that point is how I see it.

carl · August 23, 2023, 12:38pm

seems to be some bits missing from this code, doesn’t run as standalone that I can make work.

randomBloke · August 23, 2023, 12:45pm

I updated the comment you replied to. The code should now work. Do note, however, that I used CSV files and not Excel.