Using the SpeechToText widget

I am trying to use the SpeechToText widget in one of my Panel apps. When I try the example in the “Advanced Example” section at the bottom of this page, the widget works as I expect. But when I add it to my app, or even copy the example into my VSCode Python notebook and run it with either .servable() (like the example) or .show() (so it opens in a browser, Chrome in my case), I can turn the microphone button on and off, but it never returns any results, which leads me to believe it is not hearing me. Has anyone experienced this before?

Here is my example code:

and this is the code copied from the example into my notebook (I removed the grammar part since according to the documentation it does not work with Chrome):

[image: screenshot of the notebook code]

I encountered the same issue, so I used a toggle button to activate the speech_recognition library instead.

Ah, that makes sense. That workaround worked for me!

I created an unpolished widget for this:

from __future__ import annotations
from typing import (
    ClassVar,
    Type,
)

import param
import panel as pn
from panel.widgets import CompositeWidget, Toggle
from panel.layout import Column, ListPanel
from speech_recognition import Microphone, Recognizer, UnknownValueError

pn.extension()


class SpeechRecognizer(CompositeWidget):
    value = param.Boolean(
        default=False,
        doc="""
        Whether the microphone is being used to listen for speech.""",
    )

    transcript = param.String(
        default="",
        doc="""
        The transcript of the speech that was recognized.""",
    )

    phrase_time_limit = param.Number(
        default=3,
        doc="""
        The maximum number of seconds a phrase may last before it is
        cut off and sent for transcription.""",
    )

    _composite_type: ClassVar[Type[ListPanel]] = Column

    def __init__(self, **params):
        super().__init__(**params)
        self._microphone = Microphone()
        self._recognizer = Recognizer()
        self._stop_listening = None
        self._toggle = Toggle(icon="microphone-off", name="Begin listening")
        self._toggle.link(self, value="value", bidirectional=True)
        self.param.watch(self._start_listening, "value")
        self._composite[:] = [self._toggle]

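    def _transcribe_speech(self, recognizer, audio) -> None:
        # Called by speech_recognition from a background thread for each phrase.
        try:
            text = recognizer.recognize_google(audio)
            self.transcript = f"{self.transcript} {text}".strip()
        except UnknownValueError:
            # The audio could not be understood; skip this phrase.
            pass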

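    def _start_listening(self, event) -> None:
        # Start or stop background listening whenever `value` changes.
        if event.new:
            self._toggle.icon = "microphone"
            self._stop_listening = self._recognizer.listen_in_background(
                self._microphone,
                self._transcribe_speech,
                phrase_time_limit=self.phrase_time_limit,
            )
        else:
            self._toggle.icon = "microphone-off"
            if self._stop_listening is not None:
                self._stop_listening(wait_for_stop=False)
                self._stop_listening = None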

    def _server_destroy(self, session_context) -> None:
        if self._stop_listening is not None:
            self._stop_listening(wait_for_stop=False)
        return super()._server_destroy(session_context)


speech_recognizer = SpeechRecognizer()
pn.Column(speech_recognizer, speech_recognizer.param.transcript).servable()
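
One thing to keep in mind with the widget above: speech_recognition invokes the listen_in_background callback from a non-main thread, so the transcript is updated off the main thread. Below is a minimal sketch of one way to collect phrases safely, using only the standard library. The TranscriptBuffer class and its method names are hypothetical (they are not part of Panel or speech_recognition), and the worker threads only simulate the recognizer's callbacks.

```python
import threading


class TranscriptBuffer:
    """Hypothetical helper: collects recognized phrases from any thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._phrases = []

    def append(self, text: str) -> None:
        # Skip empty results so the transcript has no stray spaces.
        text = text.strip()
        if not text:
            return
        with self._lock:
            self._phrases.append(text)

    def text(self) -> str:
        # Join the phrases collected so far into one transcript string.
        with self._lock:
            return " ".join(self._phrases)


buffer = TranscriptBuffer()

# Simulate background callbacks. In the widget, _transcribe_speech
# could call buffer.append(text) instead (an assumption, not tested
# against a real microphone here).
workers = [
    threading.Thread(target=buffer.append, args=(phrase,))
    for phrase in ["hello", "  ", "world"]
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(buffer.text())
```

The order of the phrases depends on which thread finishes first, so this only guarantees that no phrase is lost, not that they arrive in spoken order.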

This is very impressive!
