pyin: Estimate pitch using the probabilistic YIN algorithm

pyin R Documentation

Estimate pitch using the probabilistic YIN algorithm

Description

The probabilistic YIN (pYIN) algorithm (Mauch and Dixon 2014) is an extension of YIN (de Cheveigné and Kawahara 2002) that considers multiple pitch candidates in a hidden Markov model, which is then Viterbi-decoded to obtain the final pitch estimate. The function also returns a track encoding whether each analysis frame was considered voiced, and a track containing the probability of voicing in that frame.
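
To make the YIN step that the algorithm builds on more concrete, the following base R sketch computes the cumulative mean normalized difference function for a single frame of a synthetic 100 Hz sine and picks the lag with the smallest value. It is an illustration only: the sampling rate, the frame length and all variable names are assumptions, and it does not reproduce the multiple-threshold, hidden Markov model stage or the code that librosa actually runs.

```r
fs <- 8000                                  # assumed sampling rate in Hz
t  <- seq(0, 0.05, by = 1 / fs)             # 50 ms of signal
x  <- sin(2 * pi * 100 * t)                 # synthetic 100 Hz tone

W       <- 200                              # integration window in samples
min_lag <- ceiling(fs / 200)                # shortest period considered (maxF = 200)
max_lag <- floor(fs / 70)                   # longest period considered (minF = 70)

# Squared difference between the frame and a copy of itself delayed by tau
d <- sapply(seq_len(max_lag), function(tau) {
  sum((x[1:W] - x[(1 + tau):(W + tau)])^2)
})

# Cumulative mean normalization: d'(tau) = d(tau) / mean(d(1), ..., d(tau))
d_norm <- d * seq_along(d) / cumsum(d)

lags     <- min_lag:max_lag
best_lag <- lags[which.min(d_norm[lags])]   # lag of the deepest trough
f0       <- fs / best_lag                   # period in samples -> frequency in Hz
```

Plain YIN keeps only the first trough below one fixed threshold. The probabilistic variant instead evaluates many thresholds and passes all resulting candidates on to the decoder.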

Usage

pyin(
  listOfFiles,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 30,
  minF = 70,
  maxF = 200,
  max_transition_rate = 35.92,
  beta_parameters = c(2, 18),
  center = TRUE,
  boltzmann_parameter = 2,
  resolution = 0.1,
  thresholds = 100,
  switch_probability = 0.01,
  no_trough_probability = 0.01,
  pad_mode = "constant",
  explicitExt = "pyp",
  outputDirectory = NULL,
  toFile = TRUE
)

Arguments

listOfFiles

A vector of file paths to wav files.

beginTime

The start time of the section of the sound file that should be processed.

endTime

The end time of the section of the sound file that should be processed.

windowShift

The measurement interval (frame duration), in milliseconds.

minF

Candidate f0 values below this frequency (in Hz) will not be considered.

maxF

Candidate f0 values above this frequency (in Hz) will not be considered.

max_transition_rate

The maximum pitch transition rate in octaves per second.
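
As a rough illustration of what this rate means per analysis frame, the sketch below converts octaves per second into semitones per frame. It assumes that windowShift is expressed in milliseconds, and the final rounding step reflects how librosa is understood to discretize the rate. Both points are assumptions rather than documented behavior of this function.

```r
max_transition_rate <- 35.92    # octaves per second (default)
window_shift_ms     <- 5        # assumed to be milliseconds

# 12 semitones per octave, scaled by the frame duration in seconds
semitones_per_frame <- max_transition_rate * 12 * window_shift_ms / 1000

# Whole number of semitones the pitch may move between adjacent frames
max_semitone_step <- round(semitones_per_frame)
```

Under these assumptions the defaults allow a change of roughly two semitones from one frame to the next.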

beta_parameters

The shape parameters for the beta distribution prior over thresholds.
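
The prior can be inspected in base R. The sketch below spreads a Beta(2, 18) distribution over 100 equally wide threshold bins on the interval from 0 to 1, which is how the prior is understood to be discretized. The binning scheme and the variable names are assumptions made for illustration.

```r
beta_parameters <- c(2, 18)     # default shape parameters
n_thresholds    <- 100          # default number of thresholds

# Bin edges on [0, 1]; the mass of each bin is the difference in the beta CDF
edges           <- seq(0, 1, length.out = n_thresholds + 1)
threshold_prior <- diff(pbeta(edges, beta_parameters[1], beta_parameters[2]))

# Mean of a Beta(a, b) distribution is a / (a + b)
prior_mean <- beta_parameters[1] / sum(beta_parameters)
```

With the default shape parameters the prior mean is 0.1, so most of the weight falls on low thresholds.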

center

Should each analysis window be centered on its time stamp (TRUE, the default), or should the window be considered to start at the time stamp (FALSE)?

boltzmann_parameter

The shape parameter for the Boltzmann distribution prior over troughs. Larger values will assign more mass to smaller periods.
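
The effect of this parameter can be seen by evaluating a truncated Boltzmann distribution directly. The helper boltzmann_pmf below is hypothetical and written for this illustration. It uses the usual truncated form over the indices 0 to n - 1, where index 0 is taken to be the trough with the shortest period, and the choice of four troughs is arbitrary.

```r
# Truncated Boltzmann probability mass function over k = 0, ..., n - 1
boltzmann_pmf <- function(k, lambda, n) {
  (1 - exp(-lambda)) * exp(-lambda * k) / (1 - exp(-lambda * n))
}

n_troughs    <- 4                       # arbitrary number of troughs in one frame
trough_index <- 0:(n_troughs - 1)       # 0 = shortest period

prior_default <- boltzmann_pmf(trough_index, lambda = 2,   n = n_troughs)
prior_flatter <- boltzmann_pmf(trough_index, lambda = 0.5, n = n_troughs)
```

Comparing the two vectors shows that the larger parameter concentrates more probability on the first trough.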

resolution

The resolution of the pitch bins, as a fraction of a semitone. A value of 0.01 corresponds to one cent.
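
To give a sense of scale, the sketch below counts the pitch bins implied by the default settings. The formula follows the way librosa is understood to lay out its pitch grid between the lowest and highest frequency. Treat it as an assumption made for illustration, not as a specification of this function.

```r
resolution <- 0.1    # default: one tenth of a semitone
minF       <- 70     # Hz
maxF       <- 200    # Hz

bins_per_semitone <- ceiling(1 / resolution)

# 12 semitones per octave times the number of octaves between minF and maxF
n_pitch_bins <- floor(12 * bins_per_semitone * log2(maxF / minF)) + 1
```

A finer resolution increases the number of bins, and with it the size of the state space that has to be decoded.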

thresholds

The number of thresholds for peak estimation.

switch_probability

The probability of switching from voiced to unvoiced or vice versa.
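
One way to read this parameter is as the off-diagonal entry of a two-state transition matrix between voiced and unvoiced frames. The sketch below builds that matrix and derives the expected number of consecutive frames spent in one state. The matrix layout and the names are assumptions made for illustration.

```r
switch_probability <- 0.01      # default

states <- c("voiced", "unvoiced")
voicing_transition <- matrix(
  c(1 - switch_probability, switch_probability,
    switch_probability,     1 - switch_probability),
  nrow = 2, byrow = TRUE,
  dimnames = list(from = states, to = states)
)

# Run lengths in one state are geometric, so their mean is 1 / p frames
expected_run_frames <- 1 / switch_probability
```

With the default value a voiced or unvoiced stretch lasts 100 frames on average before the state switches.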

no_trough_probability

The maximum probability to add to global minimum if no trough is below threshold.

pad_mode

The mode in which padding occurs. Ignored if center is not TRUE. Padding is performed by the Python library librosa, which relies on numpy.pad, so the user should consult the NumPy documentation of that function for other options.

explicitExt

The file extension that should be used for the output file.

outputDirectory

An explicit directory in which the signal file will be written. If not defined, the file will be written to the same directory as the sound file.

toFile

Should the output be written to a file? The file will be written to outputDirectory, if defined, or otherwise to the same directory as the sound file.

Details

This function calls the Python library librosa (McFee et al. 2022) to load the audio data and make pitch-related estimates.

Value

An SSFF track object containing two tracks (f0 and pitch), which is either returned (toFile == FALSE) or stored on disk.
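
A call that returns the result instead of writing it to disk might look like the sketch below. The file path is hypothetical, the widened frequency range is an arbitrary choice, and the example assumes that pyin is exported by an installed superassp package with a working Python and librosa setup. The call is skipped when the package or the file is missing.

```r
wav_files <- c("recordings/speaker1.wav")    # hypothetical path

pyin_args <- list(
  listOfFiles = wav_files,
  minF        = 60,       # widen the default 70-200 Hz search range
  maxF        = 400,
  toFile      = FALSE     # return the track object instead of writing a file
)

if (requireNamespace("superassp", quietly = TRUE) && all(file.exists(wav_files))) {
  pitch <- do.call(superassp::pyin, pyin_args)
  str(pitch)              # inspect the returned object
}
```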

References

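de Cheveigné, A. and Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917-1930. doi:10.1121/1.1458024

Mauch, M. and Dixon, S. (2014). pYIN: A fundamental frequency estimator using probabilistic threshold distributions. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). doi:10.1109/icassp.2014.6853678

McFee, B. et al. (2022). librosa/librosa. Zenodo. doi:10.5281/zenodo.6097378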

humlab-speech/superassp documentation built on May 8, 2024, 2:27 p.m.