MFCC for Audio Extraction

October 27, 2020

Audio Features

MFCC - Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a cepstral representation of the audio clip in which the frequency bands are spaced on the mel scale.

Mel Spectrogram - A mel spectrogram is a spectrogram whose frequency axis has been converted to the mel scale.

Audio Data Domain

There are two different domains:

1. Time domain

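The two main steps in the time domain are sampling and quantization.
Sampling means measuring the instantaneous value of a continuous-time signal at discrete, evenly spaced instants. An audio wave is a continuous signal, so we first have to choose a sampling frequency (Fs), which is the number of data points we store per second of audio. Quantization then rounds each sampled amplitude to one of a finite set of levels so that it can be stored with a fixed number of bits.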
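The two steps can be sketched in a few lines of NumPy. This is only an illustration: the sampling frequency of 8000 Hz, the 440 Hz test tone and the 16-bit depth are assumed values, not ones taken from this post.

```python
import numpy as np

fs = 8000        # sampling frequency in Hz (assumed value)
freq = 440.0     # frequency of the test tone in Hz (assumed value)
num_samples = 80 # 10 ms of audio at 8000 samples per second

# Sampling: evaluate the continuous-time sine wave at the instants n / fs.
t = np.arange(num_samples) / fs
samples = np.sin(2 * np.pi * freq * t)

# Quantization: round each amplitude in [-1, 1] to a 16-bit integer level.
quantized = np.round(samples * 32767).astype(np.int16)

# Largest error introduced by rounding to the nearest level.
max_error = np.max(np.abs(samples - quantized / 32767))
print(len(samples), quantized.dtype, max_error)
```

A higher Fs stores more points per second, and a larger bit depth makes the rounding error smaller.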

2. Frequency domain

The frequency domain is the analytic space in which mathematical functions or signals are represented in terms of frequency rather than time. For example, where a time-domain graph shows how a signal changes over time, a frequency-domain graph shows how much of the signal lies within each frequency band.

A loose analogy is the transformation from Cartesian coordinates to polar coordinates: the same information is described in a different coordinate system.

To go from the time domain to the frequency domain we need a forward transform (for example, the Fourier transform), and to go from the frequency domain back to the time domain we need the corresponding inverse transform.
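As a rough sketch of a forward and inverse transformation, the snippet below uses NumPy's FFT routines as one possible choice. The 1000 Hz sampling frequency and the 50 Hz test tone are assumed values chosen for illustration.

```python
import numpy as np

fs = 1000                              # sampling frequency in Hz (assumed value)
t = np.arange(fs) / fs                 # one second of sample times
signal = np.sin(2 * np.pi * 50 * t)    # 50 Hz test tone (assumed value)

# Forward transform: time domain -> frequency domain.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
peak_hz = freqs[np.argmax(np.abs(spectrum))]   # frequency with the most energy

# Inverse transform: frequency domain -> time domain.
recovered = np.fft.irfft(spectrum, n=len(signal))

print(peak_hz, np.allclose(recovered, signal))
```

The strongest frequency bin sits at the frequency of the test tone, and the inverse transform reproduces the original samples up to floating-point error.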

Frequency-time relation

f(t) = a sin(wt), where t is time, a is the amplitude and w is the angular frequency (w = 2*pi*f for a frequency of f Hz).

We can also have multiple frequencies in the same wave. So,

f(t) = a1 sin(wt) + a2 sin(2wt) + ...
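To make the sum of sinusoids concrete, here is a small sketch that builds a wave from two terms and reads the amplitudes a1 and a2 back from its spectrum. The fundamental of 10 Hz and the amplitudes 1.0 and 0.5 are assumed values.

```python
import numpy as np

fs = 1000                    # sampling frequency in Hz (assumed value)
t = np.arange(fs) / fs       # exactly one second, so FFT bin k corresponds to k Hz
f0 = 10                      # fundamental frequency in Hz, i.e. w = 2*pi*f0 (assumed)
a1, a2 = 1.0, 0.5            # amplitudes of the two terms (assumed)

# f(t) = a1 sin(wt) + a2 sin(2wt)
wave = a1 * np.sin(2 * np.pi * f0 * t) + a2 * np.sin(2 * np.pi * 2 * f0 * t)

# Scale the magnitude spectrum so that a sine of amplitude a gives a peak of height a.
amplitudes = 2 * np.abs(np.fft.rfft(wave)) / len(wave)

print(amplitudes[f0], amplitudes[2 * f0])
```

The two peaks appear at w and 2w, with heights close to a1 and a2; every other bin is close to zero.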

In MFCC, we apply a bank of triangular filters to the frequency spectrum, and each filter summarizes a band of neighboring frequencies as a single value. If we have M filters, each position of the analysis window along the signal gives M values, and if we have T such positions, we get an array of shape T x M.
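The T x M shape can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for the example: the helper names (hz_to_mel, mel_to_hz, mel_filterbank), the commonly used conversion mel = 2595 * log10(1 + f / 700), the frame length of 512 samples, the hop of 256 samples, M = 26 filters, and random noise standing in for real audio. The sketch stops at the filter outputs; a full MFCC computation would additionally take the logarithm of these values and apply a discrete cosine transform.

```python
import numpy as np

def hz_to_mel(hz):
    # Common Hz -> mel conversion (assumed variant of the mel formula).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Return an array of shape (num_filters, n_fft // 2 + 1) of triangular filters."""
    # num_filters + 2 edge points, equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((num_filters, len(bin_freqs)))
    for m in range(num_filters):
        left, center, right = hz_points[m:m + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right, zero elsewhere.
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

fs, n_fft, hop, M = 16000, 512, 256, 26      # assumed settings
audio = np.random.default_rng(0).standard_normal(fs)  # one second of noise as stand-in audio

# Cut the signal into T overlapping frames (the T window positions).
T = 1 + (len(audio) - n_fft) // hop
frames = np.stack([audio[i * hop:i * hop + n_fft] for i in range(T)])

# Power spectrum of each windowed frame, then M filter outputs per frame.
power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
bank = mel_filterbank(M, n_fft, fs)
mel_energies = power @ bank.T

print(mel_energies.shape)   # (T, M)
```

Each row of mel_energies corresponds to one window position and each column to one triangular filter.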
