extract_features: Extract audio features

View source: R/extract_features.R


Extract audio features

Description

Extracts features from WAV audio files.

Usage

extract_features(
  x,
  features = c("f0", "fmt", "gain"),
  filesRange = NULL,
  sex = "u",
  windowShift = 10,
  numFormants = 8,
  numcep = 12,
  dcttype = c("t2", "t1", "t3", "t4"),
  fbtype = c("mel", "htkmel", "fcmel", "bark"),
  resolution = 40,
  usecmp = FALSE,
  mc.cores = 1,
  full.names = TRUE,
  recursive = FALSE,
  check.mono = FALSE,
  stereo2mono = FALSE,
  overwrite = FALSE,
  freq = 44100,
  round.to = NULL,
  verbose = FALSE,
  pycall = "~/miniconda3/envs/pyvoice/bin/python"
)

Arguments

x

A vector containing either files or directories of audio files in WAV format.

features

Vector of features to be extracted. (Default: 'f0', 'fmt', 'gain'). Available features: 'f0', 'f0_mhs', 'f0_praat', 'fmt', 'fmt_praat', 'zcr', 'rms', 'gain', 'rfc', 'ac', 'cep', 'dft', 'css', 'lps', 'mfcc', 'df', 'pf', 'rf', 'rcf', 'rpf'.

filesRange

The desired range of directory files (Default: NULL, i.e., all files). Should only be used when all the WAV files are in the same folder.

sex

Sex-specific parameter setting: 'f' (female), 'm' (male) or 'u' (unknown) (Default: 'u'). Passed as 'gender' to wrassp::ksvF0, wrassp::forest and wrassp::mhsF0.

windowShift

Analysis window shift, in ms (Default: 10). Used by wrassp::ksvF0, wrassp::forest, wrassp::mhsF0, wrassp::zcrana, wrassp::rfcana, wrassp::acfana, wrassp::cepstrum, wrassp::dftSpectrum, wrassp::cssSpectrum and wrassp::lpsSpectrum.

numFormants

Number of formants to estimate (Default: 8). Used by wrassp::forest.
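
For example, the sketch below (not part of the original examples; it assumes the example audio shipped with the wrassp package) combines the sex, windowShift and numFormants arguments:

path2wav <- list.files(system.file('extdata', package = 'wrassp'),
                       pattern = glob2rx('*.wav'), full.names = TRUE)
# pitch and formant tracks with female-specific settings and a 10 ms shift
M_fmt <- extract_features(path2wav, features = c('f0', 'fmt'),
                          sex = 'f', windowShift = 10, numFormants = 8)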

numcep

Number of Mel-frequency cepstral coefficients (cepstra) to return (Default: 12). Used by tuneR::melfcc.

dcttype

Type of DCT used: 't1' or 't2' ('t3' for HTK, 't4' for feacalc) (Default: 't2'). Used by tuneR::melfcc.

fbtype

Auditory frequency scale to use: 'mel', 'bark', 'htkmel', 'fcmel' (Default: 'mel'). Used by tuneR::melfcc.

resolution

Sets the FFT length to the smallest value that yields a frequency resolution of the given value in Hz or better (Default: 40.0). Used by wrassp::cssSpectrum, wrassp::dftSpectrum and wrassp::lpsSpectrum.

usecmp

Logical. Apply equal-loudness weighting and cube-root compression (PLP instead of LPC) (Default: FALSE). Used by tuneR::melfcc.
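
A sketch of how the melfcc-related arguments (numcep, dcttype, fbtype and usecmp) can be combined, using the path2wav vector built above:

# 13 cepstra on the HTK mel scale with PLP-style compression
M_mfcc <- extract_features(path2wav, features = 'mfcc',
                           numcep = 13, dcttype = 't2',
                           fbtype = 'htkmel', usecmp = TRUE)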

mc.cores

Number of cores to be used in parallel processing. (Default: 1)

full.names

Logical. If TRUE, the directory path is prepended to the file names to give a relative file path. If FALSE, the file names (rather than paths) are returned. (Default: TRUE) Used by base::list.files.

recursive

Logical. Should the listing recurse into directories? (Default: FALSE) Used by base::list.files.

check.mono

Logical. Check if the WAV file is mono. (Default: FALSE)

stereo2mono

(Experimental) Logical. Should files be converted from stereo to mono? (Default: FALSE)

overwrite

(Experimental) Logical. Should converted files be overwritten? If not, the file gets the suffix _mono. (Default: FALSE)

freq

Sampling frequency, in Hz, used to write the converted files when stereo2mono = TRUE. (Default: 44100)
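
A sketch of the mono-handling arguments (check.mono, stereo2mono, overwrite and freq; stereo conversion is experimental):

# check channels and, if needed, write mono copies at 44100 Hz with a '_mono' suffix
M_mono <- extract_features(path2wav, check.mono = TRUE,
                           stereo2mono = TRUE, overwrite = FALSE,
                           freq = 44100)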

round.to

Number of decimal places to round to. (Default: NULL)

verbose

Logical. Should the running status be shown? (Default: FALSE)

pycall

Path to the Python binary to be called. See https://github.com/filipezabala/voice for details.
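
A sketch of pointing pycall at another Python binary; the path below is illustrative only and must be adapted to your own installation:

# hypothetical system Python instead of the default miniconda environment
M_py <- extract_features(path2wav, pycall = '/usr/bin/python3')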

Details

The feature 'df' corresponds to 'formant dispersion' (df2:df8) by Fitch (1997), 'pf' to 'formant position' (pf1:pf8) by Puts, Apicella & Cárdenas (2012), 'rf' to 'formant removal' (rf1:rf8) by Zabala (2023), 'rcf' to 'formant cumulated removal' (rcf2:rcf8) by Zabala (2023) and 'rpf' to 'formant position removal' (rpf2:rpf8) by Zabala (2023). The 'fmt_praat' feature may take a long time to process. The following features may contain a variable number of columns: 'cep', 'dft', 'css' and 'lps'.
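
A sketch extracting the formant-derived measures described above:

# formants plus dispersion, position and removal measures
M_df <- extract_features(path2wav,
                         features = c('fmt', 'df', 'pf', 'rf', 'rcf', 'rpf'))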

Value

A Media data frame containing the selected features.

References

Levinson N. (1946). "The Wiener (root mean square) error criterion in filter design and prediction." Journal of Mathematics and Physics, 25(1-4), 261–278. (doi:10.1002/SAPM1946251261)

Durbin J. (1960). “The fitting of time-series models.” Revue de l’Institut International de Statistique, pp. 233–244. (https://www.jstor.org/stable/1401322)

Cooley J.W., Tukey J.W. (1965). “An algorithm for the machine calculation of complex Fourier series.” Mathematics of computation, 19(90), 297–301. (https://www.ams.org/journals/mcom/1965-19-090/S0025-5718-1965-0178586-1/)

Wasson D., Donaldson R. (1975). “Speech amplitude and zero crossings for automated identification of human speakers.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(4), 390–392. (https://ieeexplore.ieee.org/document/1162690)

Allen J. (1977). "Short term spectral analysis, synthesis, and modification by discrete Fourier transform." IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(3), 235–238. (https://ieeexplore.ieee.org/document/1162950)

Schäfer-Vincent K. (1982). "Significant points: Pitch period detection as a problem of segmentation." Phonetica, 39(4-5), 241–253. (doi:10.1159/000261665)

Schäfer-Vincent K. (1983). "Pitch period detection and chaining: Method and evaluation." Phonetica, 40(3), 177–202. (doi:10.1159/000261691)

Ephraim Y., Malah D. (1984). "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator." IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121. (https://ieeexplore.ieee.org/document/1164453)

Delsarte P., Genin Y. (1986). "The split Levinson algorithm." IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(3), 470–478. (https://ieeexplore.ieee.org/document/1164830)

Jackson J.C. (1995). "The Harmonic Sieve: A Novel Application of Fourier Analysis to Machine Learning Theory and Practice." Technical report, Carnegie Mellon University, Pittsburgh, PA, School of Computer Science.

Fitch W.T. (1997). "Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques." J. Acoust. Soc. Am., 102, 1213–1222. (doi:10.1121/1.421048)

Boersma P., van Heuven V. (2001). Praat, a system for doing phonetics by computer. Glot. Int., 5(9/10), 341–347. (https://www.fon.hum.uva.nl/paul/papers/speakUnspeakPraat_glot2001.pdf)

Ellis D.P.W. (2005). "PLP and RASTA (and MFCC, and inversion) in Matlab." Online web resource.

Puts D.A., Apicella C.L., Cárdenas R.A. (2012). "Masculine voices signal men's threat potential in forager and industrial societies." Proc. R. Soc. B Biol. Sci., 279, 601–609. (doi:10.1098/rspb.2011.0829)

Examples

library(voice)

# get path to audio file
path2wav <- list.files(system.file('extdata', package = 'wrassp'),
                       pattern = glob2rx('*.wav'), full.names = TRUE)

# minimal usage
M1 <- extract_features(path2wav)
M2 <- extract_features(dirname(path2wav))
identical(M1,M2)
table(basename(M1$wav_path))

# limiting filesRange
M3 <- extract_features(path2wav, filesRange = 3:6)
table(basename(M3$wav_path))
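
# selecting features, rounding and parallel processing
# (a sketch; adjust mc.cores to your machine)
M4 <- extract_features(path2wav, features = c('f0', 'gain', 'mfcc'),
                       round.to = 4, mc.cores = 2, verbose = TRUE)
table(basename(M4$wav_path))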
