Speech Recognition Python | How To Translate Speech To Text


Speech is the most common means of communication around the world, and most people rely on it to communicate with each other. If we want a system to respond to speech instead of written input, the task becomes fairly difficult and requires a lot of data to be processed. A speech recognition system overcomes this barrier by translating speech to text. In this blog, we will go through the speech recognition module in Python: how speech recognition works, the challenges it faces, the packages available, how to install SpeechRecognition and Pyaudio, how to take input from microphones, and a use case.

How Speech Recognition Works?

A speech recognition system translates spoken utterances to text. There are various real-life examples of speech recognition systems, for example Siri, which takes speech as input and translates it into text.

The advantage of using a speech recognition system is that it overcomes the barrier of literacy. Because it works on spoken utterances, a speech recognition model can serve literate and illiterate audiences alike.

We can also make an inventory of the endangered languages around the world using a speech recognition system. While this sounds intriguing and simple enough, building a speech recognition system involves a lot of challenges.

Challenges Faced By A Speech Recognition System

A speech recognition system is difficult to build because speech has so many sources of variability.

Style of speaking

Every person has their own style of speaking, including their accent. English alone, one of the most widely spoken languages in the world, is spoken with many accents, such as American English and British English. Differences in pronunciation also make it difficult for a speech recognition system to translate speech.

Environment

The environment adds background noise to the input. An isolated room and an auditorium have very different levels of background noise, and even an echo adds noise to the system.

Speaker characteristics

An old person’s voice may not be the same as that of an infant. The characteristics of a person’s speech depend on many factors, including its harshness and clarity.

Language constraints

Some spoken utterances may not have a viable meaning when it comes to translation.

Once these challenges are overcome, a speech recognition system can translate speech to text fairly reliably. Now that we know how speech recognition works, let's take a look at the different packages that are available for speech recognition in Python.

Packages Available For Speech Recognition In Python

apiai
SpeechRecognition
google-cloud-speech
assemblyai
pocketsphinx
watson-developer-cloud
wit

We will go through the details of the SpeechRecognition package in this blog. First, let's take a trip down memory lane to understand how speech recognition systems have evolved over the years.

The very first prototype of speech recognition was in fact a toy named Radio Rex, which came out in the 1920s. It had a dog sitting in a dog house that would pop out as soon as someone uttered the word Rex.

The dog was held in place by a spring attached to an electromagnet that was sensitive to acoustic energy at around 500 Hz, so any sound with enough energy near that frequency could release it. Being purely a frequency detector, it can only loosely be called a speech recognition model.

In 1962, IBM came up with the Shoebox, which was able to recognize isolated words and perform a few arithmetic operations.

Then came HARPY from CMU, which was able to recognize connected speech from a vocabulary of about 1,000 words. Around the 1980s, people started using statistical models, and one of the most widely used machine learning paradigms was the hidden Markov model.

Since the introduction of deep neural networks, most speech recognition models have been built on neural networks. They greatly expand what is possible: the vocabulary can go up to 10k words and more.

How To Install SpeechRecognition In Python?

To install the SpeechRecognition package in Python, run the following command in the terminal and it will be installed on your system.

pip install SpeechRecognition

Alternatively, if you are using PyCharm, you can add the package from the project interpreter.

The package has a Recognizer class, which is where the magic happens: it is the class used to recognize speech. It has the following seven methods, each of which recognizes speech from an audio source using a different API.

recognize_bing()
recognize_google()
recognize_google_cloud()
recognize_houndify()
recognize_ibm()
recognize_wit()
recognize_sphinx()

Of these, recognize_sphinx is the only one that can run the speech recognition system offline. It requires the installation of Pocketsphinx. To get started, import the package and create an instance of the Recognizer class.


# the package is installed as SpeechRecognition but imported as speech_recognition
import speech_recognition as sr

# instance of the Recognizer class
r = sr.Recognizer()
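
Since each recognize method can fail, either because the audio is unintelligible or because an online API cannot be reached, one option is to try several of them in order and keep the first result. The following is only a minimal sketch of that idea. The helper transcribe_with_fallback and its parameters are hypothetical names that are not part of the SpeechRecognition package, and it assumes the recognizer object exposes the methods listed above.

```python
def transcribe_with_fallback(recognizer, audio, method_names, recoverable=(Exception,)):
    """Return (method_name, text) from the first recognizer method that succeeds.

    recognizer   -- an object exposing recognize_* methods, e.g. sr.Recognizer()
    audio        -- the audio data passed to each method
    method_names -- names of the methods to try, in order
    recoverable  -- exception types that mean "try the next method"
    """
    for name in method_names:
        recognize = getattr(recognizer, name)
        try:
            return name, recognize(audio)
        except recoverable:
            continue  # this engine failed, move on to the next one
    return None, None  # every method failed


# Intended use with the real package (not executed here):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   name, text = transcribe_with_fallback(
#       r, audio, ["recognize_google", "recognize_sphinx"],
#       recoverable=(sr.UnknownValueError, sr.RequestError))
```

Listing recognize_sphinx last means the offline engine is only used when the online one fails. Passing the two exception classes of the package as recoverable avoids hiding unrelated bugs.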

Taking Input From Microphones

To use a microphone, we have to install the pyaudio module as well. We use the Microphone class to get the input speech from the microphone instead of another input source such as an audio file.

For most projects, the default microphone is enough. If you do not wish to use the default microphone, you can get the list of microphone names using the list_microphone_names method of the Microphone class.
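
As a rough sketch of how a non-default microphone could be selected, the helper below searches the list of names for a keyword and passes the matching position as device_index. The function names find_device_index and open_microphone are hypothetical. The sketch assumes the position of a name in the list returned by list_microphone_names is the device index expected by the Microphone class.

```python
def find_device_index(names, keyword):
    """Return the index of the first microphone whose name contains keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name is not None and keyword in name.lower():
            return index
    return None


def open_microphone(keyword):
    """Return an sr.Microphone for the matching device, or the default one."""
    # imported here so that find_device_index can be used without the package
    import speech_recognition as sr

    names = sr.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)
    # device_index=None selects the default microphone
    return sr.Microphone(device_index=index)
```

For example, open_microphone('usb') would pick the first device with usb in its name and fall back to the default microphone when nothing matches.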

To capture the input from the microphone, we use the listen method of the Recognizer instance.


import speech_recognition as sr

r = sr.Recognizer()

# open the default microphone and record until the speaker pauses
with sr.Microphone() as source:
    audio = r.listen(source)
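
Background noise was listed earlier as one of the challenges, so it can help to calibrate the recognizer before listening. The sketch below wraps two Recognizer calls, adjust_for_ambient_noise and listen, in a hypothetical helper called capture_phrase. The default values for the calibration time and the two time limits are assumptions chosen for illustration, not recommendations from the package.

```python
def capture_phrase(recognizer, source, calibrate_seconds=1.0, timeout=5, phrase_time_limit=10):
    """Calibrate for background noise, then record a single phrase from source."""
    # sample the background noise so the energy threshold fits the room
    recognizer.adjust_for_ambient_noise(source, duration=calibrate_seconds)
    # timeout: seconds to wait for speech to start
    # phrase_time_limit: maximum seconds to record once speech has started
    return recognizer.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)


# Intended use with the real package (not executed here):
#   import speech_recognition as sr
#   r = sr.Recognizer()
#   with sr.Microphone() as source:
#       audio = capture_phrase(r, source)
```

With a timeout set, listen raises sr.WaitTimeoutError when nobody starts speaking in time, so the caller should be ready to catch it.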

How To Install Pyaudio In Python?

To install Pyaudio in Python, run the following command in the terminal or, if you are using PyCharm, add the package from the project interpreter in the settings.

pip install PyAudio
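
Before running the use case, a quick check like the one below can confirm that both packages can be imported. It is a small sketch that uses only the standard library. The helper name missing_modules is hypothetical, while speech_recognition and pyaudio are the import names of the two packages.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# speech_recognition and pyaudio are the import names of the two packages
missing = missing_modules(["speech_recognition", "pyaudio"])
if missing:
    print("not installed:", ", ".join(missing))
else:
    print("both packages are available")
```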

Use Case

We will make a program using the speech_recognition module in Python to recognize speech and do the following:

convert the speech to text
open a URL using webbrowser module
pass a query using speech recognition to make a search in the url

Following is the program for the above problem statement:

import speech_recognition as sr
import webbrowser as wb

r1 = sr.Recognizer()  # recognizes the video search query
r2 = sr.Recognizer()  # recognizes the edureka search query
r3 = sr.Recognizer()  # recognizes the initial command

# listen for the command that selects where to search
with sr.Microphone() as source:
    print('[search edureka: search youtube]')
    print('speak now')
    audio = r3.listen(source)

# recognize the command once and reuse the text in both checks
try:
    command = r3.recognize_google(audio).lower()
except (sr.UnknownValueError, sr.RequestError):
    command = ''

if 'edureka' in command:
    url = 'https://www.edureka.co/'
    with sr.Microphone() as source:
        print('search your query')
        audio = r2.listen(source)

    try:
        get = r2.recognize_google(audio)
        print(get)
        # open the url with the spoken query appended
        wb.get().open_new(url + get)
    except sr.UnknownValueError:
        print('could not understand')
    except sr.RequestError as e:
        print('failed to get results: {0}'.format(e))

# accept 'youtube' as well, since the prompt says 'search youtube'
if 'youtube' in command or 'video' in command:
    url = 'https://www.youtube.com/results?search_query='
    with sr.Microphone() as source:
        print('search for a video')
        audio = r1.listen(source)

    try:
        get = r1.recognize_google(audio)
        print(get)
        wb.get().open_new(url + get)
    except sr.UnknownValueError:
        print('could not understand')
    except sr.RequestError as e:
        print('failed to get results: {0}'.format(e))

When you run the program, it prints the prompt and waits for you to speak. If you say edureka, it asks for the query that you want to search on the edureka URL stored in the url variable. If you then say python, the browser opens the page formed by appending python to that URL. Saying video instead works the same way with the YouTube search URL.
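
A spoken query often contains spaces or special characters, which are safer to escape before they are appended to a URL. The sketch below shows one way to do that with quote_plus from the standard library. The keywords and base URLs mirror the program above, while SEARCH_URLS, pick_base_url and build_search_url are hypothetical names introduced for illustration.

```python
from urllib.parse import quote_plus

# keywords and base URLs taken from the program above
SEARCH_URLS = {
    'edureka': 'https://www.edureka.co/',
    'video': 'https://www.youtube.com/results?search_query=',
}


def pick_base_url(command):
    """Return the base URL whose keyword appears in the spoken command, or None."""
    command = command.lower()
    for keyword, base_url in SEARCH_URLS.items():
        if keyword in command:
            return base_url
    return None


def build_search_url(base_url, query):
    """Append the spoken query to base_url, escaping spaces and special characters."""
    return base_url + quote_plus(query)
```

In the program, the result of build_search_url(url, get) could be passed to wb.get().open_new in place of url + get.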

In this blog, we have discussed how we can use speech recognition in Python to translate speech to text using the SpeechRecognition package. Artificial intelligence has become central to tasks like speech recognition and object detection, and deep neural networks open up enormous possibilities for speech recognition systems, where we can train and test on large amounts of speech data to build a system. You can enroll in the Python online course certification for deep neural networks to master your skills and kickstart your learning.

Have any queries? Mention them in the comments and we will get back to you.
