```python
import panel as pn
import speech_recognition as sr
from marvin.ai.audio import transcribe
from marvin.audio import record_phrase
from openai import AsyncOpenAI

pn.extension()

aclient = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


def recognize_speech(instance, event):
    # Show a loading spinner on the input widget while recording and transcribing.
    with instance.active_widget.param.update(loading=True):
        try:
            audio = record_phrase()  # records from the microphone of the machine running this code
            transcription = transcribe(audio)
            instance.active_widget.value = transcription
        except sr.RequestError as e:
            instance.stream(f"Could not request results; {e}", user="System")
        except sr.UnknownValueError:
            instance.stream("Could not understand the audio", user="System")


async def callback(contents: str, user: str, instance: pn.chat.ChatInterface):
    # Send the whole chat history and stream the reply back chunk by chunk.
    messages = instance.serialize()
    response = await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,
    )
    message = ""
    async for chunk in response:
        part = chunk.choices[0].delta.content
        if part is not None:
            message += part
            yield message


chat = pn.chat.ChatInterface(
    callback=callback,
    # Adds a microphone button next to the input that triggers recognize_speech.
    button_properties={"speak": {"callback": recognize_speech, "icon": "microphone"}},
    show_rerun=False,
    show_undo=False,
    show_clear=False,
)
chat.servable()
```
You can also make it speak back!
```python
import panel as pn
from marvin.ai.audio import transcribe, speak_async
from marvin.audio import record_phrase
from openai import AsyncOpenAI

pn.extension()

aclient = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


def recognize_speech(instance, event):
    # Show a loading overlay on both the input and the whole interface while recording.
    with instance.active_widget.param.update(loading=True), instance.param.update(loading=True):
        audio = record_phrase()
        transcription = transcribe(audio)
        instance.active_widget.value = transcription


async def callback(contents: str, user: str, instance: pn.chat.ChatInterface):
    messages = instance.serialize()
    response = await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,
    )
    message = ""
    async for chunk in response:
        part = chunk.choices[0].delta.content
        if part is not None:
            message += part
            yield message
    # Once streaming has finished, synthesize the full reply with OpenAI TTS
    # and play it on the machine running the code.
    await (await speak_async(message, voice="shimmer")).play_async()


chat = pn.chat.ChatInterface(
    callback=callback,
    button_properties={"speak": {"callback": recognize_speech, "icon": "microphone"}},
    show_rerun=False,
    show_undo=False,
    show_clear=False,
)
chat.servable()
```
Thank you @ahuang11 so much for your example!
I have 2 little issues with your proposed approach:
- I understand this records audio on the machine running the code (I did some minor research on marvin and its dependencies, speech_recognition and pyaudio). I aim to deploy an app on a server that people will access through the browser, so `marvin.audio.record_phrase` will not work: it captures from the sound card of the computer where it is running. That's why I was aiming to use Panel's SpeechToText, since it does the JS magic to capture the audio in the browser (and sends it to the browser's default STT implementation). I also tried to find a way to "record audio" via the browser in Panel, but had no luck.
- This uses OpenAI in the background, which requires an API key, which means paying for it. I'm not against that (in the end I'll most probably be using a paid LLM), but having the "free" option the browser provides for both STT and TTS is a great way to develop and test (I know it basically sends the audio to Google, and there's a limited amount of STT before getting blocked or charged).
Overall, I would like to thank you again SO MUCH for spending your time and energy helping me (this conversation started on GitHub). If you, or any other kind soul, know how to make a widget that just captures audio via the browser in Panel (so I can then redirect that audio to any STT service of my choosing), I'd love to learn about it. I'm not well versed in JS, and I'm just getting into Panel (which seems to use Bokeh in the background?), so I may be hitting my head against a wall for a while if I try to implement it myself.
Have a great day/night!
Thanks for the feedback. For SpeechToText, can you submit an issue on GitHub to make it work with ChatInterface?
For point #2, on speech to text: I think you can use speech_recognition without marvin and call `recognize_whisper`, which runs on your own machine (I recommend just asking ChatGPT/Claude how to migrate to vanilla speech_recognition). For text to speech, you can also use ElevenLabs under its free tier.
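If you do get a recorded clip from the browser (for example as an uploaded file), the local route could look roughly like this minimal sketch. It assumes the clip arrives as uncompressed PCM WAV bytes, because `sr.AudioFile` only reads WAV, AIFF and FLAC (a browser's MediaRecorder typically produces WebM/Opus, which would need converting first), and that the `openai-whisper` package is installed so `recognize_whisper` can run. The helper names `check_pcm_wav` and `transcribe_locally` are hypothetical.

```python
import io
import wave


def check_pcm_wav(wav_bytes: bytes) -> tuple[int, int, float]:
    """Validate a WAV payload and return (channels, sample_rate, duration_seconds)."""
    # wave raises wave.Error for payloads that are not RIFF/WAV (e.g. WebM/Opus).
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def transcribe_locally(wav_bytes: bytes, model: str = "base") -> str:
    """Run Whisper on this machine via speech_recognition (no API key needed)."""
    # Imported here so check_pcm_wav works even where the STT stack is absent.
    import speech_recognition as sr

    check_pcm_wav(wav_bytes)  # fail early with a clear wave.Error
    recognizer = sr.Recognizer()
    with sr.AudioFile(io.BytesIO(wav_bytes)) as source:
        audio = recognizer.record(source)  # read the whole clip
    # The Whisper model is downloaded on first use and then runs locally.
    return recognizer.recognize_whisper(audio, model=model)
```

The returned string could then be written to `chat.active_widget.value`, just like `transcription` in the examples above.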