Python Speech to Text conversion using SpeechRecognition


In this article, we will walk through converting speech to text in Python using the SpeechRecognition library.

Speech recognition is the process of recognizing spoken words and representing them as text. It is useful in many areas, such as self-driving cars and home surveillance systems.

Prerequisites for Python speech to text conversion

Before diving into Python speech to text conversion, we need to install the necessary libraries.

Step 1: Install SpeechRecognition library

pip install speechrecognition

The SpeechRecognition library performs the speech to text conversion. It supports several speech recognition engines and APIs, both online and offline.

Step 2: Install PyAudio module

pip install pyaudio

PyAudio provides Python bindings for PortAudio, a cross-platform audio input/output library. It lets us record and play audio on the platforms PortAudio supports, such as Windows, macOS, and Linux.
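Before moving on, it can be useful to confirm that both packages are importable from the Python environment the script will run in. The short sketch below relies only on the standard library's importlib.util.find_spec(); the helper name check_installed is our own, not part of either package.

```python
import importlib.util


def check_installed(module_name: str) -> bool:
    """Return True if module_name can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


# pip package "speechrecognition" -> import name "speech_recognition"
# pip package "pyaudio"           -> import name "pyaudio"
for name in ("speech_recognition", "pyaudio"):
    status = "installed" if check_installed(name) else "missing"
    print(f"{name}: {status}")
```

Any module reported as missing can be installed with the pip commands shown above.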

Understanding Python speech to text conversion using SpeechRecognition module

Step 1: Import the necessary library/module

To use the SpeechRecognition library, we first import it into our program so that all the classes and functions it defines are available. Although the pip package is named speechrecognition, the module is imported as speech_recognition. We give it the short alias SRG, which the rest of this article uses.

import speech_recognition as SRG

Step 2: Initialize the Speech Recognizer

recognizer = SRG.Recognizer()

To accept audio input and recognize the speech in it, we first need to create a Recognizer instance. This object provides the record() and recognize_google() methods used in the following steps.

Step 3: Set the source of input audio/voice

The SpeechRecognition module accepts input from two kinds of sources:

  • A pre-recorded audio file
  • Voice input through the default microphone

with SRG.Microphone() as source:

In the statement above, the audio is captured directly from the default microphone: the Microphone() object acts as the source from which the recognizer reads the audio.

Note: The PyAudio module must be installed for Microphone() to capture audio from the default microphone.
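On a machine with several input devices, the default microphone may not be the one you want. SpeechRecognition's Microphone class accepts a device_index argument, and Microphone.list_microphone_names() returns the device names in device-index order (with None where a name can't be read). The sketch below, assuming PyAudio is installed, picks the first device whose name contains a given keyword and otherwise falls back to the default microphone. The helpers find_device_index and open_microphone are hypothetical names.

```python
from typing import Optional, Sequence


def find_device_index(names: Sequence[Optional[str]], keyword: str) -> Optional[int]:
    """Return the index of the first device name containing keyword (case-insensitive)."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Some entries may be None when the device name can't be retrieved
        if name and keyword in name.lower():
            return index
    return None


def open_microphone(keyword: str):
    """Return a Microphone for the first matching device, or the default one."""
    # Imported here so find_device_index() works without the library installed
    import speech_recognition as SRG

    names = SRG.Microphone.list_microphone_names()  # requires PyAudio
    index = find_device_index(names, keyword)
    # device_index=None means "use the system default microphone"
    return SRG.Microphone(device_index=index)
```

For example, `with open_microphone("usb") as source:` would record from the first device with "usb" in its name; the keyword is only an illustration.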

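To convert a pre-recorded audio file to text instead, use the following statement, replacing the file name with the path to your own file:

with SRG.AudioFile("audio_file.wav") as source:

AudioFile reads WAV (PCM/LPCM), AIFF/AIFF-C, and native FLAC files.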
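To try the file-based path without a microphone, we need a WAV file that AudioFile can read. The sketch below writes a short 16-bit mono sine-wave WAV file using only the standard library's wave module, then loads it with AudioFile and record(). The helper names and the tone.wav file name are hypothetical, and because a pure tone contains no speech, this only checks that a file loads correctly; it is not a transcription test.

```python
import math
import struct
import wave


def write_tone(path: str, seconds: float = 1.0, rate: int = 16000, freq: float = 440.0) -> int:
    """Write a 16-bit mono sine-wave WAV file and return the number of frames written."""
    n_frames = int(seconds * rate)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    return n_frames


def load_audio(path: str):
    """Read the whole file into an AudioData object via SpeechRecognition."""
    import speech_recognition as SRG  # imported lazily; install it as in Step 1

    recognizer = SRG.Recognizer()
    with SRG.AudioFile(path) as source:
        return recognizer.record(source)  # no duration: read to the end of the file
```

Calling `write_tone("tone.wav")` followed by `load_audio("tone.wav")` should give an AudioData object whose sample_rate matches the rate the file was written with.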

Step 4: Define the time limit for recording the audio from the microphone

The record() method takes the audio source and the length of time for which audio should be captured from it.

record(source, duration)
  • source: The audio source to read from, such as a Microphone or an AudioFile.
  • duration: The number of seconds of audio to record. With a microphone, this is how long the microphone stays active and accepts the user's voice. For an audio file, omitting it reads the whole file.
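Because duration is measured in seconds, the amount of raw audio that record() returns grows linearly with it: the number of frames is roughly duration × sample rate, and each mono frame takes sample_width bytes. record() also accepts an offset argument (seconds to skip before recording starts), which is handy for pulling one slice out of a longer file. In the sketch below, expected_bytes and record_slice are hypothetical helpers, and the 16 kHz, 16-bit figures are assumed example values; a Microphone() created without arguments uses its device's default sample rate.

```python
def expected_bytes(duration: float, sample_rate: int, sample_width: int) -> int:
    """Approximate size in bytes of mono raw audio captured for `duration` seconds."""
    frames = int(duration * sample_rate)  # frames = seconds x frames per second
    return frames * sample_width          # each mono frame holds one sample


def record_slice(path: str, offset: float, duration: float):
    """Return AudioData for `duration` seconds of `path`, starting `offset` seconds in."""
    import speech_recognition as SRG  # imported lazily; install it as in Step 1

    recognizer = SRG.Recognizer()
    with SRG.AudioFile(path) as source:
        return recognizer.record(source, duration=duration, offset=offset)


# A 7-second recording at an assumed 16 kHz, 16-bit (2-byte) format:
print(expected_bytes(7, 16000, 2))  # 224000
```

Since the library reads audio in fixed-size chunks, the length of `audio.frame_data` for a real recording will be close to this estimate rather than exactly equal to it.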

Step 5: Convert the speech to text using a speech recognition engine or API

The record() method returns the captured audio as an AudioData object. This audio is then passed to a speech recognition engine, such as Google's speech recognition service, which transcribes it. The system must be connected to the Internet to use the Google recognition engine.

The recognize_google() method takes the recorded audio as a parameter and returns the recognized text. By default it recognizes US English ("en-US"); to recognize another language, such as Spanish or Japanese, pass the corresponding language tag through the language parameter, for example language="es-ES".
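One way to let users pick a language by name is to keep a small table of language tags and pass the chosen tag to recognize_google(). In the sketch below, the LANGUAGE_TAGS mapping and the transcribe_in helper are hypothetical, and only a few of the languages Google's service supports are listed. The recognizer is passed in as an argument, which also makes it easy to try the helper with a stand-in object instead of a live connection.

```python
# A few example language tags; extend as needed
LANGUAGE_TAGS = {
    "english": "en-US",
    "spanish": "es-ES",
    "japanese": "ja-JP",
}


def transcribe_in(recognizer, audio, language: str = "english") -> str:
    """Transcribe `audio` with Google's engine in the named language."""
    try:
        tag = LANGUAGE_TAGS[language.lower()]
    except KeyError:
        raise ValueError(f"no language tag configured for {language!r}") from None
    # recognize_google() sends the audio over the Internet and returns the text
    return recognizer.recognize_google(audio, language=tag)
```

With a real Recognizer, `transcribe_in(recognizer, audio_input, "Spanish")` is equivalent to calling `recognizer.recognize_google(audio_input, language="es-ES")`.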

Implementation of Python Speech to text conversion using SpeechRecognition library

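import speech_recognition as SRG
import time

store = SRG.Recognizer()
with SRG.Microphone() as s:
    # Record 7 seconds of audio from the default microphone
    audio_input = store.record(s, duration=7)
    print("Recording time:", time.strftime("%I:%M:%S"))
    try:
        # Send the audio to Google's recognizer (requires an Internet connection)
        text_output = store.recognize_google(audio_input)
        print("Text converted from audio:\n")
        print(text_output)
        print("Execution time:", time.strftime("%I:%M:%S"))
    except (SRG.UnknownValueError, SRG.RequestError):
        # UnknownValueError: the speech could not be understood
        # RequestError: the Google service could not be reached
        print("Couldn't process the audio input.")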


Recording time: 01:13:27
Text converted from audio:

Python on Journaldev!
Execution time: 01:13:34


In this article, we have walked through converting speech to text in Python using the SpeechRecognition library.


