Audio Processing in Python – Introduction to Python librosa


In this article, we’ll talk about audio processing in Python. Let’s step away for a moment from the natural language processing and text analysis side of Python and ML. Today, I’m going to discuss a Python audio processing library called librosa.

What is librosa?

Librosa is a Python package for music and audio analysis. It provides the building blocks needed to create music information retrieval systems.

Audio Processing in Python

Now that you know the library that we’re going to use for our audio processing task, let’s move ahead to working with the library and processing an mp3 audio file.

1. Installing Librosa for Audio Processing in Python

We can easily install librosa with the pip command:

pip install librosa

Let’s load a short mp3 file (you can use any mp3 file for this demonstration):

import librosa

y, sr = librosa.load('/content/Kids Cheering - Gaming Sound Effect (HD) (128  kbps).mp3')

2. Processing audio as time series

In the line above, the load function reads the mp3 file as a time series, y. Here, sr stands for the sample rate.

If you want a refresher on time series, go here: Time Series Data and Machine Learning.

  • The time series is represented by a NumPy array.
  • The sample rate is the number of samples per second of audio.
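
To make the relationship between the array and the sample rate concrete, here is a minimal sketch. It builds a synthetic 440 Hz tone with NumPy as a stand-in for a loaded file; the values of sr and duration are assumptions chosen for illustration.

```python
import numpy as np

sr = 22050        # sample rate in Hz (samples per second); assumed value
duration = 2.0    # length of the signal in seconds; assumed value

# Build a 440 Hz sine tone as a stand-in for a loaded audio signal.
t = np.arange(int(sr * duration)) / sr
y = (0.5 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# The length in seconds is the number of samples divided by the sample rate.
n_samples = len(y)
seconds = n_samples / sr

print(y.shape, y.dtype)
print(seconds)
```

The same division works on the y and sr returned by librosa.load.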

By default, librosa.load mixes the audio down to mono and resamples it to 22050 Hz at load time. You can override this behavior by passing additional arguments to librosa.load.

3. Retrieve the features of an audio file

There are some important features of an audio sample that we’ll quickly discuss:

Some musical patterns have a very simple underlying rhythm, while others have a more subtle or implied one.

  • Tempo: the speed at which the patterns repeat. Tempo is measured in beats per minute (BPM). So if a piece of music is at 120 BPM, there are 120 beats (pulses) every minute.
  • Beat: a unit of time. It is basically the pulse that you would clap along to in a song. In 4/4 time, for instance, you get four beats in a bar.
  • Bar: a logical grouping of beats. Bars usually have 3 or 4 beats, although other counts are possible.
  • Step: a term I typically see in composition programs. It is common to have a sequence of notes, such as 8 sixteenth notes, that are all the same length. The distance between one note and the next is the step. In that case, you would want to step in sixteenth notes. Usually, you set the step to eighth notes, triplets, or quarter notes.
  • Rhythm: the pattern of musical sounds. Take all the notes in a phrase, and that is the rhythm.
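
The arithmetic behind tempo, beats, bars, and steps can be sketched in a few lines. The helper names below are made up for this illustration, and the sketch assumes 4/4 time with the quarter note as the beat, so a sixteenth-note step is a quarter of a beat.

```python
def seconds_per_beat(bpm: float) -> float:
    """Length of one beat in seconds at the given tempo."""
    return 60.0 / bpm


def seconds_per_bar(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds (4 beats per bar assumed by default)."""
    return beats_per_bar * seconds_per_beat(bpm)


def seconds_per_step(bpm: float, steps_per_beat: int = 4) -> float:
    """Length of one step in seconds (4 steps per beat = sixteenth notes)."""
    return seconds_per_beat(bpm) / steps_per_beat


bpm = 120
print(seconds_per_beat(bpm))   # 0.5
print(seconds_per_bar(bpm))    # 2.0
print(seconds_per_step(bpm))   # 0.125
```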

We can get the tempo and beats from the audio:

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
Tempo And Beat

4. Mel Frequency Cepstral Coefficients (MFCC)

Mel Frequency Cepstral Coefficients (MFCCs) are one of the most important features in audio processing. They are a topic of their own, so refer to the Wikipedia page on the subject for the details.

The MFCC is a matrix of values that captures the timbral character of a sound, such as the way wooden and metal guitars sound a little different. Because it is built on the mel scale, which approximates how humans perceive pitch, it captures qualities that other measures miss.

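import seaborn as sns

hop_length = 512  # assumed value; 512 samples is librosa's default hop length
mfcc = librosa.feature.mfcc(y=y, sr=sr, hop_length=hop_length, n_mfcc=13)
mfcc_delta = librosa.feature.delta(mfcc)  # assumed completion: change in the MFCCs over time
sns.heatmap(mfcc_delta)  # draw the matrix as a heatmap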

Here we create a heatmap from the MFCC data, which gives us the output below:

Mfcc Heatmap With Seaborn

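We can also compute a chromagram, which maps the energy of the signal onto the 12 pitch classes. Here it is computed from the harmonic component of the audio:

y_harmonic, y_percussive = librosa.effects.hpss(y)  # separate harmonic and percussive parts
chromagram = librosa.feature.chroma_cqt(y=y_harmonic, sr=sr)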
Chromagram Librosa

I hope this gave you an idea of how to extract features from audio data for use in different deep learning algorithms.

Ending Note

Continue to follow our machine learning in Python tutorials. We have a lot more coming in the near future. If you are a beginner in Python and accidentally landed here (you won’t be the first!), take a look at the Python tutorial for beginners.
