forest: Estimate formant frequencies and bandwidths
In humlab-speech/superassp: Speech signal processing using various frameworks through a wrassp-like interface

Description

Estimates formants for the signal(s) in listOfFiles. Raw resonance frequency and bandwidth values are obtained by root-solving the Linear Prediction polynomial computed with the autocorrelation method and the Split-Levinson Algorithm (SLA). Resonances are then classified as formants using the so-called Pisarenko frequencies (a by-product of the SLA) and a formant frequency range table derived from the nominal F₁. The nominal F₁ may have to be increased by about 12% for female voices (see the nominalF1 and gender parameters). This function uses the libassp C library for the DSP work.
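
As a rough illustration of the root-solving step, the following base R sketch builds a linear prediction polynomial from three hypothetical resonances and recovers their frequencies and bandwidths from the polynomial roots. The sampling rate, the resonance values and the helper names (`poly_mult`, `roots_to_resonances`) are assumptions made for this example. The sketch does not reproduce the Split-Levinson algorithm, the Pisarenko-based classification, or any libassp code.

```r
# Illustrative only: base R, not the libassp implementation.
fs        <- 16000               # assumed sampling rate (Hz)
true_freq <- c(500, 1500, 2500)  # hypothetical resonance frequencies (Hz)
true_bw   <- c(60, 90, 120)      # hypothetical bandwidths (Hz)

# Multiply two polynomials given as coefficient vectors.
poly_mult <- function(p, q) {
  out <- numeric(length(p) + length(q) - 1)
  for (i in seq_along(p)) {
    idx <- i:(i + length(q) - 1)
    out[idx] <- out[idx] + p[i] * q
  }
  out
}

# Build A(z) as a product of second-order sections
# 1 - 2 r cos(theta) z^-1 + r^2 z^-2, one per resonance.
r     <- exp(-pi * true_bw / fs)   # pole radius from bandwidth
theta <- 2 * pi * true_freq / fs   # pole angle from frequency
a <- 1
for (k in seq_along(r)) {
  a <- poly_mult(a, c(1, -2 * r[k] * cos(theta[k]), r[k]^2))
}

# Solve A(z) = 0 and convert each upper-half-plane root to Hz.
roots_to_resonances <- function(a, fs) {
  z <- polyroot(rev(a))            # polyroot() expects ascending powers of z
  z <- z[Im(z) > 0]                # keep one root per conjugate pair
  res <- data.frame(
    freq = Arg(z) * fs / (2 * pi), # root angle -> frequency in Hz
    bw   = -log(Mod(z)) * fs / pi  # root radius -> bandwidth in Hz
  )
  res[order(res$freq), ]
}

est <- roots_to_resonances(a, fs)
print(round(est, 2))
```

The recovered values should match `true_freq` and `true_bw` up to numerical precision. With real speech the roots also include non-formant resonances, which is why the function applies a separate classification step.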

Usage

forest(
  listOfFiles = NULL,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 20,
  effectiveLength = TRUE,
  nominalF1 = 500,
  gender = "m",
  estimate = FALSE,
  order = 0,
  incrOrder = 0,
  numFormants = 4,
  window = "BLACKMAN",
  preemphasis = -0.8,
  toFile = TRUE,
  explicitExt = "fms",
  outputDirectory = NULL,
  assertLossless = NULL,
  logToFile = FALSE,
  convertOverwrites = FALSE,
  keepConverted = FALSE,
  verbose = TRUE
)

Arguments

`listOfFiles`	vector of file paths to be processed by function
`beginTime`	the time point (in seconds) of the start of the analysed interval. A NULL or 0 is interpreted as the start of the signal file. If a vector of time points is supplied, the length of that vector needs to correspond with the length of `listOfFiles`.
`endTime`	the time point (in seconds) of the end of the analysed interval. A NULL or 0 is interpreted as the end of the signal file. If a vector of time points is supplied, the length of that vector needs to correspond with the length of `listOfFiles`.
`windowShift`	the amount of time (in ms) that the analysis window will be shifted between analysis frames
`effectiveLength`	make window size effective rather than exact
`nominalF1`	the nominal (assumed) F₁ frequency (default: 500.0 Hz)
`gender`	use gender-specific parameters? Permitted codes are "f" (female), "m" (male) or "u" (unknown). When "f", the effective window length is set to 12.5 ms and the nominal F₁ to 560 Hz.
`estimate`	insert rough frequency estimates of missing formants? By default, the frequency is set to zero.
`order`	decrease default LPC filter order by 2 (one resonance less)
`incrOrder`	increase default LPC filter order by 2 (one resonance more)
`numFormants`	the number of formants to identify. Defaults to 4; the maximum value is 8 or half the LPC filter order.
`window`	the analysis window function (default: BLACKMAN)
`preemphasis`	the pre-emphasis factor, between -1 and 0 inclusive (default: dependent on sample rate and nominal F₁)
`toFile`	should the results be written to a file with the (default) file extension (`TRUE`), or returned as a list of AsspDataObj objects (`FALSE`)?
`explicitExt`	the file extension used when result files are written (`toFile=TRUE`). Set this argument to use an extension other than the default.
`outputDirectory`	directory in which output files are stored. Defaults to NULL which means that the result file will be stored in the same directory as the input file.
`assertLossless`	an optional list of file extensions that the user asserts contain losslessly encoded signal data.
`logToFile`	whether to log commands to a separate logfile in the `outputDirectory`. Logging will otherwise be in the function-specific logging namespace of logger and will be put wherever this namespace is defined to place its output. See logger::log_appender for details.
`verbose`	display verbose information about processing steps taken, as well as progress bars.
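
The length rule for `beginTime` and `endTime` can be checked before a batch is submitted. The helper below, `check_interval_args()`, is a hypothetical name introduced for this sketch and is not part of superassp. It assumes that a single value (such as the default 0) is acceptable for any number of files. The file names are placeholders and the files are never opened.

```r
# Hypothetical pre-flight check; not a superassp function.
check_interval_args <- function(listOfFiles, beginTime = 0, endTime = 0) {
  n <- length(listOfFiles)
  # Assumption: NULL, a single value, or one value per file is acceptable.
  ok_length <- function(x) is.null(x) || length(x) == 1 || length(x) == n
  if (!ok_length(beginTime) || !ok_length(endTime)) {
    stop("beginTime and endTime must be scalar or match length(listOfFiles)")
  }
  invisible(TRUE)
}

files <- c("a1.wav", "a2.wav", "a3.wav")   # placeholder paths

# One interval per file; 0 keeps the start or end of the signal file.
check_interval_args(files,
                    beginTime = c(0.1, 0.2, 0),
                    endTime   = c(1.0, 1.5, 0))

# A vector whose length does not match the file list is rejected.
mismatch <- tryCatch(
  check_interval_args(files, beginTime = c(0.1, 0.2)),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```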

Details

Input signals not in a natively supported file format will be converted before the autocorrelation functions are computed. The conversion process will display warnings about input files that are not in known losslessly encoded formats.

Default output is in SSFF binary format, with tracks containing the estimated center frequency of each formant (track F[Hz], one column per formant) and the associated formant bandwidth (track B[Hz], one column per formant). If toFile is TRUE, the results will be written to a file with the same name as the input file, but with the extension .fms.
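
To make the track layout concrete, the sketch below builds a small mock object with a frequency track and a bandwidth track and computes a time stamp for every analysis frame. The mock is a plain list, not a real AsspDataObj. The track name "F[Hz]" follows the Examples section, "B[Hz]" is assumed by analogy, and the attribute names `sampleRate` and `startTime` as well as all numeric values are assumptions chosen for illustration.

```r
windowShift <- 5                    # ms, the documented default
frame_rate  <- 1000 / windowShift   # analysis frames per second

# Mock result: 3 frames (rows) and 2 formants (columns) per track.
mock <- list(
  "F[Hz]" = matrix(c(510, 520, 515, 1480, 1500, 1490), ncol = 2),
  "B[Hz]" = matrix(c(60, 65, 62, 90, 95, 92), ncol = 2)
)
attr(mock, "sampleRate") <- frame_rate
attr(mock, "startTime")  <- 0.0025  # hypothetical time of the first frame (s)

# Time of frame i = startTime + (i - 1) / frame rate.
n_frames <- nrow(mock[["F[Hz]"]])
time_s   <- attr(mock, "startTime") +
  (seq_len(n_frames) - 1) / attr(mock, "sampleRate")

# Long-form table for the first formant.
f1 <- data.frame(time = time_s,
                 F1   = mock[["F[Hz]"]][, 1],
                 B1   = mock[["B[Hz]"]][, 1])
print(f1)
```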

The function is a rewrite of the wrassp::forest function, but with media pre-conversion, better checking of preconditions such as the existence of the input file, structured logging, and the use of a more modern framework for user feedback.

The file types natively supported by this function are "wav" files (in "pcm_s16le" format) and Sun's "au", NIST, and CSL formats (kay or NSP extension). Input signal conversion, when needed, is done by libavcodec and the excellent av wrapper package.

Value

If toFile is FALSE, the function returns a list of AsspDataObj objects. If toFile is TRUE, the number (integer) of successfully processed and stored output files is returned.

Note

This function is not considered computationally expensive enough to require caching of results when applied to many signals. However, if the number of signals it will be applied to is very large, caching of results may be warranted.
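
Should caching become worthwhile, one simple approach is to memoise results in an environment keyed by file path. The sketch below is an assumption about how this could be done, not something superassp provides: `cached_apply()` is a hypothetical helper, and `toy_analysis()` stands in for an actual call such as `forest(path, toFile = FALSE)`.

```r
cache   <- new.env(parent = emptyenv())  # in-memory store, one entry per path
n_calls <- 0                             # counts how often the analysis runs

# Stand-in for an expensive analysis; returns the length of the path string.
toy_analysis <- function(path) {
  n_calls <<- n_calls + 1
  nchar(path)
}

# Run `fun` on `path` only if no result is stored for that path yet.
cached_apply <- function(path, fun) {
  if (!exists(path, envir = cache, inherits = FALSE)) {
    assign(path, fun(path), envir = cache)
  }
  get(path, envir = cache, inherits = FALSE)
}

first  <- cached_apply("a1.wav", toy_analysis)
second <- cached_apply("a1.wav", toy_analysis)  # served from the cache
```

Keying on the path alone ignores changes to the file contents or to the analysis parameters, so a real cache would need to include those in the key.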

Author(s)

Raphael Winkelmann

Lasse Bombien

Fredrik Nylén

Examples

# get path to audio file
path2wav <- list.files(system.file("samples","sustained", package = "superassp"), pattern = glob2rx("a1.wav"), full.names = TRUE)


# calculate formant values
res <- forest(path2wav, toFile=FALSE)

# plot formant values
matplot(seq(0,numRecs.AsspDataObj(res) - 1) / rate.AsspDataObj(res) + 
          attr(res, 'startTime'), 
        res[["F[Hz]"]], 
        type='l', 
        xlab='time (s)', 
        ylab='Formant frequency (Hz)')

humlab-speech/superassp documentation built on June 10, 2025, 3:02 p.m.