View source: R/extract_features.R
extract_features | R Documentation |
Extracts features from WAV audio files.
extract_features(
  x,
  features = c("f0", "fmt", "gain"),
  filesRange = NULL,
  sex = "u",
  windowShift = 10,
  numFormants = 8,
  numcep = 12,
  dcttype = c("t2", "t1", "t3", "t4"),
  fbtype = c("mel", "htkmel", "fcmel", "bark"),
  resolution = 40,
  usecmp = FALSE,
  mc.cores = 1,
  full.names = TRUE,
  recursive = FALSE,
  check.mono = FALSE,
  stereo2mono = FALSE,
  overwrite = FALSE,
  freq = 44100,
  round.to = NULL,
  verbose = FALSE,
  pycall = "~/miniconda3/envs/pyvoice/bin/python"
)
x |
A vector containing either files or directories of audio files in WAV format. |
features |
Vector of features to be extracted. (Default: c("f0", "fmt", "gain")) |
filesRange |
The desired range of directory files (Default: NULL) |
sex |
|
windowShift |
|
numFormants |
|
numcep |
Number of Mel-frequency cepstral coefficients (cepstra) to return (Default: 12) |
dcttype |
Type of DCT used. |
fbtype |
Auditory frequency scale to use: "mel", "htkmel", "fcmel" or "bark". |
resolution |
|
usecmp |
Logical. Apply equal-loudness weighting and cube-root compression (PLP instead of LPC) (Default: FALSE) |
mc.cores |
Number of cores to be used in parallel processing. (Default: 1) |
full.names |
Logical. If |
recursive |
Logical. Should the listing recurse into directories? (Default: FALSE) |
check.mono |
Logical. Check if the WAV file is mono. (Default: FALSE) |
stereo2mono |
(Experimental) Logical. Should files be converted from stereo to mono? (Default: FALSE) |
overwrite |
(Experimental) Logical. Should converted files be overwritten? If not, the file gets the suffix |
freq |
Frequency in Hz to write the converted files when |
round.to |
Number of decimal places to round to. (Default: NULL) |
verbose |
Logical. Should the running status be shown? (Default: FALSE) |
pycall |
Python call. See https://github.com/filipezabala/voice for details. |
The feature 'df' corresponds to 'formant dispersion' (df2:df8) by
Fitch (1997), 'pf' to 'formant position' (pf1:pf8) by Puts, Apicella & Cárdenas
(2012), 'rf' to 'formant removal' (rf1:rf8) by Zabala (2023), 'rcf' to
'formant cumulated removal' (rcf2:rcf8) by Zabala (2023) and 'rpf' to
'formant position removal' (rpf2:rpf8) by Zabala (2023).
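As a rough illustration of the formant dispersion idea, the sketch below computes the average spacing between adjacent formants over the first k formants of a single frame, following the definition in Fitch (1997). The helper name formant_dispersion and the formant values are hypothetical, and the computation the package uses internally for df2:df8 may differ.

```r
# Hypothetical helper, not the package's internal code:
# average spacing between adjacent formants, using the first k
# formants of one analysis frame (Fitch, 1997).
formant_dispersion <- function(f, k = length(f)) {
  stopifnot(is.numeric(f), k >= 2, k <= length(f))
  mean(diff(f[seq_len(k)]))  # equals (F_k - F_1) / (k - 1)
}

# Made-up formant frequencies in Hz for a single frame (F1..F4)
f <- c(500, 1500, 2500, 3500)

# Dispersion over the first 2, 3 and 4 formants,
# analogous in spirit to the columns df2:df4
df <- sapply(2:4, function(k) formant_dispersion(f, k))
names(df) <- paste0("df", 2:4)
print(df)
```

Because the adjacent differences telescope, the value depends only on the lowest and the k-th formant, so a mis-tracked intermediate formant does not change it.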
The 'fmt_praat' feature may take a long time to process. The following
features may contain a variable number of columns: 'cep', 'dft', 'css' and 'lps'.
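Since some features can expand into a variable number of columns, downstream code is usually safer selecting them by name pattern than by position. The sketch below uses a hand-made data frame; the column names cep1, cep2, cep3 are assumptions made for illustration and may not match the names the function actually produces.

```r
# Hypothetical result: values and the 'cep*' column names are
# assumptions for illustration only.
M <- data.frame(
  wav_path = c("a.wav", "a.wav", "b.wav"),
  f0   = c(110.2, 111.0, 205.7),
  cep1 = c(1.1, 1.3, 0.9),
  cep2 = c(-0.4, -0.2, 0.1),
  cep3 = c(0.05, 0.07, -0.02)
)

# Pick every column named "cep" followed by digits,
# however many of them there are
cep_cols <- grep("^cep[0-9]+$", names(M), value = TRUE)
M_cep <- M[, cep_cols, drop = FALSE]
print(ncol(M_cep))
```

Using drop = FALSE keeps the result a data frame even when only one matching column is present.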
A Media data frame containing the selected features.
Levinson N. (1946). “The Wiener (root mean square) error criterion in filter design and prediction.” Journal of Mathematics and Physics, 25(1-4), 261–278. (doi:10.1002/SAPM1946251261)
Durbin J. (1960). “The fitting of time-series models.” Revue de l’Institut International de Statistique, pp. 233–244. (https://www.jstor.org/stable/1401322)
Cooley J.W., Tukey J.W. (1965). “An algorithm for the machine calculation of complex Fourier series.” Mathematics of computation, 19(90), 297–301. (https://www.ams.org/journals/mcom/1965-19-090/S0025-5718-1965-0178586-1/)
Wasson D., Donaldson R. (1975). “Speech amplitude and zero crossings for automated identification of human speakers.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(4), 390–392. (https://ieeexplore.ieee.org/document/1162690)
Allen J. (1977). “Short term spectral analysis, synthesis, and modification by discrete Fourier transform.” IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(3), 235–238. (https://ieeexplore.ieee.org/document/1162950)
Schäfer-Vincent K. (1982). “Significant points: Pitch period detection as a problem of segmentation.” Phonetica, 39(4-5), 241–253. (doi:10.1159/000261665)
Schäfer-Vincent K. (1983). “Pitch period detection and chaining: Method and evaluation.” Phonetica, 40(3), 177–202. (doi:10.1159/000261691)
Ephraim Y., Malah D. (1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator.” IEEE Transactions on acoustics, speech, and signal processing, 32(6), 1109–1121. (https://ieeexplore.ieee.org/document/1164453)
Delsarte P., Genin Y. (1986). “The split Levinson algorithm.” IEEE transactions on acoustics, speech, and signal processing, 34(3), 470–478. (https://ieeexplore.ieee.org/document/1164830)
Jackson J.C. (1995). “The Harmonic Sieve: A Novel Application of Fourier Analysis to Machine Learning Theory and Practice.” Technical report, Carnegie-Mellon University, Pittsburgh, PA, School of Computer Science.
Fitch W.T. (1997). “Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques.” J. Acoust. Soc. Am., 102, 1213–1222. (doi:10.1121/1.421048)
Boersma P., van Heuven V. (2001). Praat, a system for doing phonetics by computer. Glot. Int., 5(9/10), 341–347. (https://www.fon.hum.uva.nl/paul/papers/speakUnspeakPraat_glot2001.pdf)
Ellis DPW (2005). “PLP and RASTA (and MFCC, and inversion) in Matlab.” Online web resource.
Puts D.A., Apicella C.L., Cárdenas R.A. (2012). “Masculine voices signal men's threat potential in forager and industrial societies.” Proc. R. Soc. B Biol. Sci., 279, 601–609. (doi:10.1098/rspb.2011.0829)
library(voice)
# get path to audio file
path2wav <- list.files(system.file('extdata', package = 'wrassp'),
                       pattern = glob2rx('*.wav'), full.names = TRUE)
# minimal usage
M1 <- extract_features(path2wav)
M2 <- extract_features(dirname(path2wav))
identical(M1, M2)
table(basename(M1$wav_path))
# limiting filesRange
M3 <- extract_features(path2wav, filesRange = 3:6)
table(basename(M3$wav_path))