This article walks through converting speech to text in Python using the SpeechRecognition library.
Speech recognition is the process of recognizing spoken audio and representing it as text. It is useful in many areas, such as self-driving cars and home surveillance.
Prerequisites for Python speech to text conversion
Before converting speech to text in Python, we need to install the necessary libraries.
Step 1: Install SpeechRecognition library
pip install speechrecognition
The SpeechRecognition library performs the speech-to-text conversion. It supports several online and offline speech recognition engines and APIs.
Step 2: Install PyAudio module
pip install pyaudio
The PyAudio library provides Python bindings for PortAudio, a cross-platform audio input/output library. It lets a program record and play audio in the same way on every supported platform.
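Once both installs finish, a quick check along these lines can confirm that Python can see the packages. This is only a minimal sketch: missing_modules is a hypothetical helper name, and it relies solely on the standard library's importlib.util.find_spec. Note that the import names (speech_recognition, pyaudio) differ from the names passed to pip.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that Python cannot locate."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names, not pip package names.
required = ["speech_recognition", "pyaudio"]
missing = missing_modules(required)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All speech-to-text prerequisites are installed.")
```

If a name is reported as missing, re-running the matching pip command in the same Python environment is usually the first thing to try.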
Understanding Python speech to text conversion using SpeechRecognition module
Step 1: Import the necessary library/module
To convert speech to text with the SpeechRecognition module, we first import it into our program so that its classes and functions are available. The module's import name is speech_recognition; the examples in this article import it under the alias SRG:
import speech_recognition as SRG
Step 2: Initialize the Speech Recognizer
variable = SRG.Recognizer()
To accept audio input and recognize it, we first create a Recognizer instance, which provides the recording and recognition methods used in the following steps.
Step 3: Set the source of input audio/voice
The SpeechRecognition module accepts two kinds of input:
- Pre-recorded audio file
- Voice input through default Microphone
with SRG.Microphone() as source:
In the above statement, the audio is captured directly from the system's default microphone through the Microphone() object.
Note: The PyAudio module must be installed in order to accept audio input from the microphone.
To convert a pre-recorded audio file to text instead, use the following statement:
with SRG.AudioFile("name of the audio file") as source:
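As a rough sketch of the file-based route, the helper below reads a whole file into an audio object that can later be passed to a recognizer. The function names and the extension check are assumptions made for illustration. The check reflects the formats AudioFile is documented to read (WAV, AIFF/AIFF-C and FLAC), and judging a file by its extension is only a heuristic.

```python
from pathlib import Path

# Extensions for the formats AudioFile is documented to read.
# Other formats such as MP3 would need converting first.
SUPPORTED_SUFFIXES = {".wav", ".aiff", ".aif", ".aifc", ".flac"}


def is_supported_audio(path):
    """Return True if the file extension looks like one AudioFile can open."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


def load_audio(path):
    """Read an entire audio file and return the recorded audio data."""
    # Imported here so the extension check above has no dependencies.
    import speech_recognition as SRG

    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio format: {path}")
    recognizer = SRG.Recognizer()
    with SRG.AudioFile(path) as source:
        # With no duration given, record() reads to the end of the file.
        return recognizer.record(source)
```

Calling load_audio("speech.wav") with a real recording (the file name here is a placeholder) returns audio that can be handed to a recognition function.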
Step 4: Define the time limit for recording the audio from the microphone
The record(source, duration) method of the recognizer reads audio from the given source for a set length of time and returns the recorded audio.
source: The source of the input, such as an audio file or the microphone.
duration: The time period (in seconds) for which audio is recorded from the source.
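The two parameters above can be sketched as follows. The helper names record_clip and record_from_microphone are hypothetical. The sketch also passes offset, an optional record() argument that skips the given number of seconds at the start of the source; it is not used elsewhere in this article.

```python
def record_clip(recognizer, source, seconds, skip=None):
    """Capture `seconds` of audio from `source`.

    `skip` (optional) is the number of seconds to ignore at the start.
    """
    return recognizer.record(source, duration=seconds, offset=skip)


def record_from_microphone(seconds=5):
    """Record from the default microphone (requires PyAudio)."""
    # Imported here so record_clip can be used with any recognizer object.
    import speech_recognition as SRG

    recognizer = SRG.Recognizer()
    with SRG.Microphone() as source:
        print(f"Speak for {seconds} seconds...")
        return record_clip(recognizer, source, seconds)
```

Keeping the record() call in its own small function means the same code works for a microphone or an audio file, since both are simply passed in as the source.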
Step 5: Convert the speech to text using a speech recognition engine or an API
The audio captured by record() is then sent to a speech recognition engine, such as the Google speech recognition engine, which the library calls over the network. The system must therefore stay connected to the Internet in order to use the Google recognition engine.
The recognize_google() function takes the recorded audio as a parameter and returns it as text. To recognize speech in a language other than the default English, such as Spanish or Japanese, pass the language parameter to the function.
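A minimal sketch of the language parameter, with the library's two common failure cases handled separately, might look like this. The helper names transcribe and transcribe_or_none are hypothetical, and the language codes shown ("en-US", "es-ES") are examples of the tags the Google engine accepts.

```python
def transcribe(recognizer, audio, language="en-US"):
    """Send recorded audio to the Google engine and return the text."""
    return recognizer.recognize_google(audio, language=language)


def transcribe_or_none(audio, language="en-US"):
    """Return the recognized text, or None if recognition fails."""
    # Imported here so transcribe() can be used with any recognizer object.
    import speech_recognition as SRG

    recognizer = SRG.Recognizer()
    try:
        return transcribe(recognizer, audio, language)
    except SRG.UnknownValueError:
        # The engine could not make out any words in the audio.
        print("Speech was unintelligible.")
    except SRG.RequestError as error:
        # The service could not be reached, e.g. no Internet connection.
        print("Could not reach the recognition service:", error)
    return None
```

For Spanish input, for instance, the call would be transcribe_or_none(audio_input, language="es-ES"), where audio_input is the result of an earlier record() call.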
Implementation of Python Speech to text conversion using SpeechRecognition library
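import speech_recognition as SRG
import time

store = SRG.Recognizer()
with SRG.Microphone() as s:
    print("Speak...")
    # Record 7 seconds of audio from the default microphone
    audio_input = store.record(s, duration=7)
    print("Recording time:", time.strftime("%I:%M:%S"))

    try:
        # Send the audio to the Google engine (requires Internet access)
        text_output = store.recognize_google(audio_input)
        print("Text converted from audio:\n")
        print(text_output)
        print("Finished!!")
        print("Execution time:", time.strftime("%I:%M:%S"))
    except (SRG.UnknownValueError, SRG.RequestError):
        # Unintelligible speech or a failed request to the service
        print("Couldn't process the audio input.")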
Output:
Speak...
Recording time: 01:13:27
Text converted from audio:

Python on Journaldev!
Finished!!
Execution time: 01:13:34
In this article, we have seen how to convert speech to text in Python using the SpeechRecognition library.