Problem with accessing pdf file(s) input in Panel

I’m very new to Panel and am trying to understand the basics of how file input works. The app I am building takes in PDF documents and reads them with PyPDF2's PdfReader for specific tasks. In another UI I created, I simply had to find the temporary directory my input files were stored in; a separate function then looped through all the documents in that directory and read them with PdfReader. From what I’ve read about Panel, you can’t access the directory (if I am understanding correctly), but the file_input object itself should contain the files for you to access.
When I experiment with a single file input object, I cannot even seem to convert it from a bytes object into something usable. Here is some rudimentary code I have used:
```python
import io
import panel as pn
import base64
from PyPDF2 import PdfReader

pn.extension()
file_input = pn.widgets.FileInput()

buffer=base64.b64decode(file_input.value)
f=io.BytesIO(buffer)
reader = PdfReader(f)
page = reader.pages[0]
```

The error I get is this: 
PdfReadError: EOF marker not found

I have looked at the bytes output, so I know the file is definitely being read in properly, but I cannot access anything in the file, nor can I seem to decode it. If I am reading the documentation correctly, it should have been encoded as a base64 string, so I am not sure what else to do.
How can a file_input object be properly read into PdfReader? And is there a way to get the directory where files are temporarily saved in Panel?

Try removing the b64decode call: f = io.BytesIO(file_input.value)
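
The reason this helps: FileInput.value already holds the raw bytes of the upload (and is None until a file has been chosen), so running it through b64decode mangles the data and PdfReader can no longer find the end-of-file marker. Here is a minimal sketch of the wrapping step. Note that pdf_stream_from_upload is a hypothetical helper name, and the header check is only a rough sanity test, not something Panel or PyPDF2 requires:

```python
import io


def pdf_stream_from_upload(value):
    """Hypothetical helper: wrap the raw bytes from FileInput.value in a stream."""
    if value is None:
        # Nothing has been uploaded yet.
        return None
    # Rough sanity check (an assumption of this sketch): PDF files carry a
    # "%PDF-" header near the start of the data.
    if b"%PDF-" not in value[:1024]:
        raise ValueError("uploaded data does not look like a PDF")
    # No base64 decoding here: the bytes are used exactly as received.
    return io.BytesIO(value)
```

Once a file has been uploaded, the result can be passed straight to the reader, for example PdfReader(pdf_stream_from_upload(file_input.value)).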

I made an example using the FileDropper.


import io
import panel as pn
from PyPDF2 import PdfReader

pn.extension('filedropper')

def transform_pdfs(value):
    # FileDropper.value maps each filename to its raw bytes; it is empty until a file is dropped.
    if not value:
        return {}
    result = {}
    for filename, content in value.items():
        # The bytes are already decoded, so wrap them directly in a stream.
        reader = PdfReader(io.BytesIO(content))
        page = reader.pages[0]
        # With multiple=False there is only one entry, so one summary is enough.
        result = {"file": filename, "PyPDF2.PdfReader": str(page)}
    return result

file_input = pn.widgets.FileDropper(accepted_filetypes=["application/pdf"], multiple=False)

pn.Column(
    "## Upload PDF",
    file_input,
    pn.pane.JSON(object=pn.bind(transform_pdfs, file_input), theme="light"),
).servable()


Thanks. If I were to modify it to take in multiple files, however, which variable would I loop through to process each file? Would value.items() in this case be a list rather than a single file object?

value is a dictionary that maps each filename to that file's contents. With value.items() you can loop through its keys and values.
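
For the multiple-file case, a rough sketch of that loop could look like the following. It assumes the widget is created with multiple=True so that value holds one entry per dropped file. The names summarize_uploads and count_pages are made up for illustration, and counting pages is just a stand-in for whatever processing each PDF needs:

```python
import io


def count_pages(content):
    # Imported lazily so the looping logic can be tried without PyPDF2 installed.
    from PyPDF2 import PdfReader
    return len(PdfReader(io.BytesIO(content)).pages)


def summarize_uploads(value, summarize=count_pages):
    """Hypothetical helper: apply summarize to every uploaded file.

    value is assumed to be FileDropper.value, i.e. {filename: bytes}.
    """
    if not value:
        # Empty dict (or None) means nothing has been dropped yet.
        return {}
    # One result per file, keyed by the original filename.
    return {filename: summarize(content) for filename, content in value.items()}
```

This can be bound the same way as before, e.g. pn.bind(summarize_uploads, file_input) with file_input = pn.widgets.FileDropper(accepted_filetypes=["application/pdf"], multiple=True).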