MFCC Calculation

Description

Calculate Mel-frequency cepstral coefficients.

Usage

melfcc(samples, sr = samples@samp.rate, wintime = 0.025, 
    hoptime = 0.01, numcep = 12, lifterexp = 0.6, htklifter = FALSE,
    sumpower = TRUE, preemph = 0.97, dither = FALSE,
    minfreq = 0, maxfreq = sr/2, nbands = 40, bwidth = 1, 
    dcttype = c("t2", "t1", "t3", "t4"), 
    fbtype = c("mel", "htkmel", "fcmel", "bark"), usecmp = FALSE, 
    modelorder = NULL, spec_out = FALSE, frames_in_rows = TRUE)

Arguments

samples

Object of Wave-class or WaveMC-class. Only the first channel will be used.

sr

Sampling rate of the signal.

wintime

Window length in sec.

hoptime

Step between successive windows in sec.
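As a rough illustration of how wintime and hoptime translate into sample counts, the sketch below converts both to samples for a 16 kHz signal and counts the complete windows that fit into one second of audio. The rounding and the frame-count rule are assumptions made for illustration; the function's internal framing may differ, for example through padding.

```r
sr      <- 16000   # sampling rate in Hz (illustrative value)
wintime <- 0.025   # default window length in seconds
hoptime <- 0.010   # default step between windows in seconds

winpts  <- round(wintime * sr)   # window length in samples
steppts <- round(hoptime * sr)   # step in samples

n <- sr                          # one second of audio
# Number of complete windows that fit without padding (assumed framing rule)
nframes <- 1 + floor((n - winpts) / steppts)
```

With the defaults, consecutive windows overlap by winpts - steppts samples, so each sample contributes to more than one frame.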

numcep

Number of cepstra to return.

lifterexp

Exponent for liftering; 0 = none.

htklifter

Use HTK sin lifter.
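The two liftering variants can be sketched as a vector of per-coefficient weights. The helper lifter_weights below is hypothetical and not part of the package; it assumes the convention of the Matlab rastamat code listed under References, where the zeroth coefficient is left unweighted, the exponent lifter weights coefficient i by i^lifterexp, and the HTK variant uses 1 + L/2 * sin(pi * i / L).

```r
# Hypothetical helper (not part of the package): liftering weights for
# ncep cepstral coefficients, with the zeroth coefficient left unchanged.
lifter_weights <- function(ncep, lift = 0.6, htk = FALSE) {
  i <- seq_len(ncep - 1)
  if (htk) {
    # HTK-style sine lifter; here lift plays the role of CEPLIFTER
    c(1, 1 + lift / 2 * sin(i * pi / lift))
  } else {
    # Exponent lifter; lift = 0 gives all ones, i.e. no liftering
    c(1, i^lift)
  }
}

w_exp  <- lifter_weights(12, 0.6)
w_none <- lifter_weights(12, 0)
w_htk  <- lifter_weights(13, 22, htk = TRUE)
```

Multiplying each frame's cepstra element-wise by such a weight vector applies the lifter.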

sumpower

If sumpower = TRUE, the frequency scale transformation is applied to the power spectrum; if sumpower = FALSE, it is applied to the magnitude spectrum (the square root of the power spectrum) and the result is squared afterwards.

preemph

Apply pre-emphasis filter [1 -preemph] (0 = none).
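The filter [1 -preemph] corresponds to the difference equation y[n] = x[n] - preemph * x[n-1]. A minimal base R sketch follows; preemphasize is a hypothetical helper, and passing the first sample through unchanged (zero initial state) is an assumption.

```r
# Hypothetical helper: first-order FIR pre-emphasis,
# y[n] = x[n] - a * x[n-1], with the first sample passed through unchanged.
preemphasize <- function(x, a = 0.97) {
  if (a == 0 || length(x) < 2) return(x)
  c(x[1], x[-1] - a * x[-length(x)])
}

x <- c(1, 1, 1, 1)
y <- preemphasize(x)   # a constant signal is strongly attenuated
```

The filter suppresses slowly varying content and boosts high frequencies relative to low ones.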

dither

Add offset to spectrum as if dither noise.

minfreq

Lowest band edge of mel filters (Hz).

maxfreq

Highest band edge of mel filters (Hz).

nbands

Number of warped spectral bands to use.

bwidth

Width of spectral bands in Bark/Mel.

dcttype

Type of DCT used: "t1" or "t2" (or "t3" for HTK or "t4" for feacalc).

fbtype

Auditory frequency scale to use: "mel", "bark", "htkmel", "fcmel".
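To give a feel for what such a warping does, the sketch below uses the widely cited HTK-style mel formula, mel = 2595 * log10(1 + f / 700), and spaces band edges evenly on that scale. The helper names are hypothetical, and the other scales ("mel", "fcmel", "bark") use different formulas that are not shown here.

```r
# Hypothetical helpers: HTK-style mel warping and its inverse
hz2mel_htk <- function(f) 2595 * log10(1 + f / 700)
mel2hz_htk <- function(m) 700 * (10^(m / 2595) - 1)

# Edges for nbands overlapping filters: nbands + 2 points, equally spaced
# in mel between the lowest and highest band edge, mapped back to Hz.
nbands  <- 20
minfreq <- 0
maxfreq <- 8000
edges_mel <- seq(hz2mel_htk(minfreq), hz2mel_htk(maxfreq),
                 length.out = nbands + 2)
edges_hz  <- mel2hz_htk(edges_mel)
```

Because the scale is roughly logarithmic at high frequencies, the bands become wider in Hz as frequency increases.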

usecmp

Apply equal-loudness weighting and cube-root compression (PLP instead of LPC).

modelorder

If modelorder > 0, fit a linear prediction (autoregressive) model of this order and calculate the cepstra from the resulting linear prediction coefficients (lpcas).

spec_out

Should the matrices of the power spectrum and the auditory spectrum be returned?

frames_in_rows

Return time frames in rows; if FALSE, time frames are in columns as in the original Matlab code.

Details

Calculation of the MFCCs includes the following steps:

  1. Preemphasis filtering

  2. Take the absolute value of the STFT (using a Hamming window)

  3. Warp to auditory frequency scale (Mel/Bark)

  4. Take the DCT of the log-auditory-spectrum

  5. Return the first ‘numcep’ components
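The last two steps can be sketched in base R. The example assumes an orthonormal type II DCT and uses random positive numbers as a stand-in for a real auditory spectrum with bands in rows and frames in columns; dct2_matrix is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: orthonormal DCT-II matrix with ncep rows and
# nbands columns, so that keeping ncep < nbands rows truncates the cepstrum.
dct2_matrix <- function(ncep, nbands) {
  k <- 0:(ncep - 1)                      # cepstral index
  n <- seq(1, 2 * nbands - 1, by = 2)    # odd multiples 1, 3, 5, ...
  d <- cos(outer(k, n) * pi / (2 * nbands)) * sqrt(2 / nbands)
  d[1, ] <- d[1, ] / sqrt(2)             # scale the zeroth row for orthonormality
  d
}

set.seed(1)
nbands  <- 40
numcep  <- 12
nframes <- 5
# Stand-in auditory spectrum (strictly positive so that log is defined)
aspectrum <- matrix(runif(nbands * nframes, 0.1, 1), nrow = nbands)

# Step 4 and 5: DCT of the log spectrum, keeping the first numcep components
cepstra <- dct2_matrix(numcep, nbands) %*% log(aspectrum)
```

Each column of cepstra then holds the numcep coefficients of one frame; transposing gives the frames-in-rows layout.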

Value

cepstra

Cepstral coefficients of the input signal (one time frame per row/column)

aspectrum

Auditory spectrum (spectrum after transformation to Mel/Bark scale) of the signal

pspectrum

Power spectrum of the input signal.

lpcas

If modelorder > 0, the linear prediction coefficients (LPC/PLP).

Note

The following non-default values nearly duplicate Malcolm Slaney's mfcc (i.e.

melfcc(d, 16000, wintime=0.016, lifterexp=0, minfreq=133.33, 
       maxfreq=6855.6, sumpower=FALSE)

=~= log(10) * 2 * mfcc(d, 16000) in the Auditory toolbox for Matlab).

The following non-default values nearly duplicate HTK's MFCC (i.e.

melfcc(d, 16000, lifterexp=22, htklifter=TRUE, nbands=20, maxfreq=8000, 
    sumpower=FALSE, fbtype="htkmel", dcttype="t3")

=~= 2 * htkmelfcc(:,[13,[1:12]]) where HTK config has ‘PREEMCOEF = 0.97’, ‘NUMCHANS = 20’, ‘CEPLIFTER = 22’, ‘NUMCEPS = 12’, ‘WINDOWSIZE = 250000.0’, ‘USEHAMMING = T’, ‘TARGETKIND = MFCC_0’).

For more detail on reproducing other programs' outputs, see http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/mfccs.html

Author(s)

Sebastian Krey krey@statistik.tu-dortmund.de

References

Daniel P. W. Ellis: http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/

Examples

  testsound <- normalize(sine(400) + sine(1000) + square(250), "16")
  m1 <- melfcc(testsound)

  #Use PLP features to calculate cepstra and output the matrices like the
  #original Matlab code (note: modelorder limits the number of cepstra)
  m2 <- melfcc(testsound, numcep=9, usecmp=TRUE, modelorder=8, 
    spec_out=TRUE, frames_in_rows=FALSE)
