MFCC for Audio Extraction
Audio Features
- MFCC: Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a cepstral representation of the audio clip in which the frequency bands are spaced on the mel scale.
- Mel Spectrogram: A mel spectrogram is a spectrogram whose frequency axis has been converted to the mel scale, a perceptual scale of pitch.
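To make the mel scale concrete, here is a minimal sketch of a Hz-to-mel conversion. The post does not say which mel formula it has in mind, so the commonly used 2595 * log10(1 + f / 700) convention is an assumption here, and the function names are illustrative.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel under the same assumed convention."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz become progressively smaller steps in mels.
for f in (500, 1000, 2000, 4000, 8000):
    print(f"{f} Hz -> {hz_to_mel(f):.1f} mel")
```

Under this convention the mapping is roughly linear at low frequencies and logarithmic at high ones, which is why a mel spectrogram gives more resolution to the lower part of the spectrum.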
Audio Data Domain
Audio data can be described in two different domains:
1. Time domain
The two main steps in the time domain are sampling and quantization. An audio wave is a continuous signal, and sampling means measuring its instantaneous values at discrete points in time. To do this we first choose a sampling frequency (Fs), the number of data points stored per second of audio. Quantization then rounds each sampled value to one of a finite set of amplitude levels so that it can be stored with a fixed number of bits.
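Sampling and quantization can be sketched with NumPy as follows. The sampling frequency, tone frequency, duration and bit depth are all assumed values chosen for illustration, not values from this post.

```python
import numpy as np

fs = 8000          # assumed sampling frequency in Hz (samples per second)
duration = 0.01    # assumed clip length in seconds
f0 = 440.0         # assumed tone frequency in Hz

# Sampling: evaluate the continuous wave only at the instants n / fs.
num_samples = int(round(fs * duration))
t = np.arange(num_samples) / fs
x = np.sin(2 * np.pi * f0 * t)

# Quantization: round each sample to one of a finite set of integer levels.
bits = 8                                   # assumed bit depth
levels = 2 ** (bits - 1) - 1               # largest signed code, 127 for 8 bits
x_q = np.round(x * levels).astype(np.int16)

# Map the integer codes back to [-1, 1] to measure the rounding error.
x_rec = x_q / levels
max_err = np.max(np.abs(x - x_rec))
print(num_samples, "samples, max quantization error:", max_err)
```

Raising `fs` stores more points per second, while raising `bits` shrinks the rounding error of each stored point.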
2. Frequency domain
The frequency domain is the analytic space in which mathematical functions or signals are described in terms of frequency rather than time. For example, where a time-domain graph shows how a signal changes over time, a frequency-domain graph shows how much of the signal lies within each frequency band.
A useful analogy is the conversion of Cartesian coordinates to polar coordinates: the same information is expressed in a different representation.
To go from the time domain to the frequency domain we apply a forward transform, and to go from the frequency domain back to the time domain we apply the corresponding inverse transform.
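The forward and inverse directions can be illustrated with NumPy's FFT routines. This sketch assumes that the transform intended here is the discrete Fourier transform; the 50 Hz test tone and the sampling rate are illustrative choices.

```python
import numpy as np

fs = 1000                        # assumed sampling frequency in Hz
t = np.arange(fs) / fs           # one second of sample times
x = np.sin(2 * np.pi * 50 * t)   # assumed 50 Hz test tone

# Forward transform: time domain -> frequency domain.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak_hz = freqs[np.argmax(np.abs(X))]

# Inverse transform: frequency domain -> time domain.
x_back = np.fft.irfft(X, n=len(x))

print("strongest frequency:", peak_hz, "Hz")
print("round trip recovers the signal:", np.allclose(x, x_back))
```

The round trip shows that no information is lost: the two domains are different views of the same signal.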
Frequency-time relation
f(t) = a sin(wt), where t is time, a is the amplitude and w is the angular frequency (w = 2*pi*f for a frequency f in Hz).
A single wave can also contain multiple frequencies. For example, a sum of harmonics is
f(t) = a1 sin(wt) + a2 sin(2wt) + ...
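As a small check of this relation, the sketch below builds a wave from two harmonics and reads the amplitudes a1 and a2 back from its spectrum. The fundamental frequency, amplitudes and sampling rate are assumed values.

```python
import numpy as np

fs = 1000                 # assumed sampling frequency in Hz
t = np.arange(fs) / fs    # one second, so FFT bins are 1 Hz apart
f0 = 5.0                  # assumed fundamental in Hz, i.e. w = 2*pi*f0
a1, a2 = 1.0, 0.5         # assumed amplitudes of the two harmonics

w = 2 * np.pi * f0
x = a1 * np.sin(w * t) + a2 * np.sin(2 * w * t)

# Scale the FFT magnitude so that each bin reports a sinusoid's amplitude.
amplitude = np.abs(np.fft.rfft(x)) * 2 / len(x)

# With 1 Hz bins, the bin index equals the frequency in Hz.
amp_at_f0 = amplitude[int(f0)]
amp_at_2f0 = amplitude[int(2 * f0)]
print("amplitude at f0:", amp_at_f0)
print("amplitude at 2*f0:", amp_at_2f0)
```

Each term of the sum shows up as its own peak in the frequency domain, which is what later steps such as filter banks operate on.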
In MFCC, we apply triangular filters to the frequency spectrum so that the many frequency bins are summarized by a few band values. If we have M filters, each filter sits at its own position on the frequency axis and produces one value, so every time frame yields M values. With T frames, the result is an array of shape T x M.
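The T x M shape can be seen in a minimal sketch of a triangular filter bank. Several things here are simplifying assumptions: the filters are spaced linearly rather than on the mel scale, the spectra are random stand-in data, and the log and DCT steps of a full MFCC pipeline are left out. The function name and sizes are hypothetical.

```python
import numpy as np


def triangular_filterbank(num_filters: int, num_bins: int) -> np.ndarray:
    """Return a (num_filters, num_bins) bank of overlapping triangular filters.

    Filters are linearly spaced for simplicity (an assumption of this sketch).
    """
    # num_filters + 2 edge points: each filter uses a left, center and right edge.
    edges = np.linspace(0, num_bins - 1, num_filters + 2)
    bins = np.arange(num_bins)
    bank = np.zeros((num_filters, num_bins))
    for m in range(num_filters):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        # The triangle is the lower of the two slopes, floored at zero.
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


T, num_bins, M = 20, 129, 10              # assumed frame, bin and filter counts
rng = np.random.default_rng(0)
power_spec = rng.random((T, num_bins))    # stand-in for per-frame power spectra

bank = triangular_filterbank(M, num_bins)
features = power_spec @ bank.T            # one value per filter per frame
print(features.shape)
```

Each row of `features` holds the M filter outputs for one frame, so stacking T frames gives the T x M array described above.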