yaapt | R Documentation |
The Yet Another Algorithm for Pitch Tracking (YAAPT) \insertCite{Kasi.2002.10.1109/icassp.2002.5743729}{superassp} computes f0 using the Normalized Cross-Correlation Function (NCCF). It builds on Talkin's work on the RAPT algorithm \insertCite{talkin1995robust}{superassp}.
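The following base R sketch shows the kind of NCCF computation referred to above. It computes a RAPT-style normalized cross-correlation for a single frame and takes the lag of the highest peak as the period. It is a simplification and not the package's implementation. The helper `nccf()` is hypothetical. It does no band-pass filtering, no nonlinear processing, and no multi-candidate peak picking (compare `nccf_thresh1` and `nccf_maxcands`). The 8 kHz synthetic tone is made up for the example.

```r
# Hypothetical helper: NCCF of one frame over a range of lags (in samples).
# phi(k) = sum(x[j] * x[j + k]) / sqrt(e0 * ek), summed over the frame.
nccf <- function(x, start, frame_len, min_lag, max_lag) {
  idx <- start:(start + frame_len - 1)
  ref <- x[idx]
  e0 <- sum(ref^2)                      # energy of the reference frame
  lags <- min_lag:max_lag
  vals <- vapply(lags, function(k) {
    seg <- x[idx + k]                   # frame shifted by k samples
    denom <- sqrt(e0 * sum(seg^2))
    if (denom == 0) 0 else sum(ref * seg) / denom
  }, numeric(1))
  data.frame(lag = lags, nccf = vals)
}

fs <- 8000
t <- seq(0, 0.2, by = 1 / fs)
x <- sin(2 * pi * 100 * t)              # synthetic 100 Hz tone
frame_len <- round(0.035 * fs)          # 35 ms frame (windowSize default)
# Search periods between 1/maxF and 1/minF (defaults 200 Hz and 70 Hz)
res <- nccf(x, start = 1, frame_len = frame_len,
            min_lag = floor(fs / 200), max_lag = ceiling(fs / 70))
best <- res$lag[which.max(res$nccf)]    # lag of the highest NCCF peak
f0_est <- fs / best                     # period in samples -> Hz
```

For this periodic tone the peak falls at a lag of 80 samples, which corresponds to an estimate of 100 Hz.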
yaapt(
listOfFiles,
beginTime = 0,
endTime = 0,
windowShift = 5,
windowSize = 35,
minF = 70,
maxF = 200,
tda_frame_length = 35,
fft_length = 8192,
bp_forder = 150,
bp_low = 50,
bp_high = 1500,
nlfer_thresh1 = 0.75,
nlfer_thresh2 = 0.1,
shc_numharms = 3,
shc_window = 40,
shc_maxpeaks = 4,
shc_pwidth = 50,
shc_thresh1 = 5,
shc_thresh2 = 1.25,
f0_double = 150,
f0_half = 150,
dp5_k1 = 11,
dec_factor = 1,
nccf_thresh1 = 0.3,
nccf_thresh2 = 0.9,
nccf_maxcands = 3,
nccf_pwidth = 5,
merit_boost = 0.2,
merit_pivot = 0.99,
merit_extra = 0.4,
median_value = 7,
dp_w1 = 0.15,
dp_w2 = 0.5,
dp_w3 = 0.1,
dp_w4 = 0.9,
explicitExt = "yf0",
outputDirectory = NULL,
toFile = TRUE
)
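Once the sampling rate is known, the time-based defaults above can be translated into samples. This sketch assumes a hypothetical 16 kHz recording. It reads `windowShift` and `windowSize` as milliseconds and `minF`/`maxF` as Hz. It then computes the frame hop, the frame length, and the range of lags (periods) that the pitch search covers.

```r
# Hypothetical sampling rate; the parameter values are the documented defaults.
fs <- 16000
window_shift_ms <- 5     # windowShift
window_size_ms <- 35     # windowSize
min_f <- 70              # minF (Hz)
max_f <- 200             # maxF (Hz)

shift_samples <- round(window_shift_ms / 1000 * fs)  # hop between frames
size_samples  <- round(window_size_ms / 1000 * fs)   # samples per frame
min_lag <- floor(fs / max_f)     # shortest period searched (highest f0)
max_lag <- ceiling(fs / min_f)   # longest period searched (lowest f0)
frames_per_second <- 1000 / window_shift_ms
```

With these values a one-second file yields about 200 frames, each 560 samples long. The search covers periods from 80 to 229 samples.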
listOfFiles |
A vector of file paths to wav files. |
beginTime |
The start time of the section of the sound file that should be processed. |
endTime |
The end time of the section of the sound file that should be processed. |
windowShift |
The measurement interval (frame shift, i.e. the time between the starts of successive frames), in milliseconds (default: 5 ms).
windowSize |
length of each analysis frame (default: 35 ms) |
minF |
Candidate f0 frequencies below this frequency will not be considered. |
maxF |
Candidates above this frequency will be ignored. |
tda_frame_length |
The frame length used in the time-domain analysis (default: 35 ms, the same as the default windowSize).
fft_length |
FFT length (default: 8192 samples) |
bp_forder |
order of band-pass filter (default: 150) |
bp_low |
low frequency of filter passband (default: 50 Hz) |
bp_high |
high frequency of filter passband (default: 1500 Hz) |
nlfer_thresh1 |
NLFER (Normalized Low Frequency Energy Ratio) boundary for voiced/unvoiced decisions (default: 0.75) |
nlfer_thresh2 |
NLFER threshold below which a frame is considered definitely unvoiced (default: 0.1) |
shc_numharms |
number of harmonics in SHC (Spectral Harmonics Correlation) calculation (default: 3) |
shc_window |
SHC window length (default: 40 Hz) |
shc_maxpeaks |
maximum number of SHC peaks to be found (default: 4) |
shc_pwidth |
window width in SHC peak picking (default: 50 Hz) |
shc_thresh1 |
threshold 1 for SHC peak picking (default: 5) |
shc_thresh2 |
threshold 2 for SHC peak picking (default: 1.25) |
f0_double |
pitch doubling decision threshold (default: 150 Hz) |
f0_half |
pitch halving decision threshold (default: 150 Hz) |
dp5_k1 |
weight used in dynamic program (default: 11) |
dec_factor |
factor for signal resampling (default: 1) |
nccf_thresh1 |
threshold for considering a peak in NCCF (Normalized Cross Correlation Function) (default: 0.3) |
nccf_thresh2 |
threshold for terminating search in NCCF (default: 0.9) |
nccf_maxcands |
maximum number of NCCF candidates found per frame (default: 3) |
nccf_pwidth |
window width in NCCF peak picking (default: 5) |
merit_boost |
merit boost (default: 0.2) |
merit_pivot |
merit assigned to unvoiced candidates in definitely unvoiced frames (default: 0.99) |
merit_extra |
merit assigned to extra candidates in reducing pitch doubling/halving errors (default: 0.4) |
median_value |
order of median filter (default: 7) |
dp_w1 |
DP (Dynamic Programming) weight factor for voiced-voiced transitions (default: 0.15) |
dp_w2 |
DP weight factor for voiced-unvoiced or unvoiced-voiced transitions (default: 0.5) |
dp_w3 |
DP weight factor for unvoiced-unvoiced transitions (default: 0.1) |
dp_w4 |
DP weight factor for local costs (default: 0.9) |
explicitExt |
the file extension of the output file (default: "yf0"). |
outputDirectory |
set an explicit directory for where the signal file will be written. If not defined, the file will be written to the same directory as the sound file. |
toFile |
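Whether to write the output to a file (TRUE) or return it (FALSE). The file will be written in outputDirectory if it is set, and otherwise in the same directory as the sound file. |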
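`median_value` sets the order of a median filter. The sketch below is only an illustration, and the package's internal smoothing may be applied at a different stage. It uses base R's `stats::runmed()` with `k = 7` on a made-up f0 track to remove an isolated pitch-doubling spike.

```r
# Made-up f0 track (Hz) with an isolated doubling error at frame 6.
f0_track <- c(120, 121, 122, 121, 123, 244, 122, 121, 120, 119, 121, 120)

# 7-point running median (k must be odd); "median" end rule smooths the edges.
smoothed <- runmed(f0_track, k = 7, endrule = "median")
```

The spike at frame 6 is replaced by the median of its seven-frame neighbourhood (122 Hz). Because 244 Hz is the only outlier in every window, it never survives the filter.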
The YAAPT algorithm processes both the original acoustic signal and a nonlinearly processed version of it, which partially restores very weak f0 components. It uses intelligent peak picking to select multiple f0 candidates and assign them merit factors. It also incorporates highly robust pitch contours obtained from smoothed versions of the low-frequency portions of spectrograms. Dynamic programming is then used to find the “best” pitch track among all the candidates, using both local and transition costs.
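The following sketch illustrates the dynamic-programming step described above. It is a simplified Viterbi search that picks one f0 candidate per frame by minimizing the sum of local and transition costs. The cost formulas are assumptions made for the illustration and are not the package's exact definitions. The local cost is `w_local * (1 - merit)`. The transition cost is `w_trans * |Δf0| / mean f0`. The default weights borrow the values of `dp_w4` and `dp_w1`. Unvoiced candidates are not modelled. The helper `dp_track()` and the candidate and merit matrices are hypothetical.

```r
# Hypothetical helper: choose one candidate per frame by dynamic programming.
# cand, merit: frames x candidates matrices (f0 in Hz, merit in [0, 1]).
dp_track <- function(cand, merit, w_trans = 0.15, w_local = 0.9) {
  n_frames <- nrow(cand)
  n_cand <- ncol(cand)
  mean_f0 <- mean(cand)
  cost <- matrix(Inf, n_frames, n_cand)          # best cumulative cost
  back <- matrix(NA_integer_, n_frames, n_cand)  # best predecessor index
  cost[1, ] <- w_local * (1 - merit[1, ])
  for (i in 2:n_frames) {
    for (j in seq_len(n_cand)) {
      # Transition cost from every candidate of the previous frame
      trans <- w_trans * abs(cand[i - 1, ] - cand[i, j]) / mean_f0
      total <- cost[i - 1, ] + trans
      back[i, j] <- which.min(total)
      cost[i, j] <- min(total) + w_local * (1 - merit[i, j])
    }
  }
  # Trace the cheapest path backwards from the last frame
  path <- integer(n_frames)
  path[n_frames] <- which.min(cost[n_frames, ])
  for (i in n_frames:2) path[i - 1] <- back[i, path[i]]
  cand[cbind(seq_len(n_frames), path)]
}

# Frame 3 has a doubled candidate (240 Hz) with slightly higher merit.
cand  <- rbind(c(120, 240), c(122, 244), c(240, 121), c(123, 246))
merit <- rbind(c(0.9, 0.5), c(0.9, 0.4), c(0.55, 0.5), c(0.9, 0.5))

greedy <- cand[cbind(1:4, apply(merit, 1, which.max))]  # per-frame best merit
track <- dp_track(cand, merit)                           # DP path
```

A frame-by-frame choice by merit alone jumps to 240 Hz in frame 3. The transition costs make the DP path keep the smooth contour 120, 122, 121, 123 Hz.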
An SSFF track object containing two tracks, "f0" and "voiced". These hold the computed pitch values and a binary (0 or 1) indication of whether each frame was considered voiced (1) or not (0). The tracks are either returned (toFile == FALSE) or stored on disk (toFile == TRUE).
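The two tracks are typically used together, for example to summarize pitch over voiced frames only. The sketch below uses a hypothetical stand-in for the returned object. It assumes, as with wrassp's AsspDataObj, that each track is a one-column matrix with one row per frame. Check the structure of the real object before relying on this.

```r
# Hypothetical stand-in for a returned object (structure assumed).
res <- list(
  f0     = matrix(c(0, 118, 121, 125, 0, 0, 130), ncol = 1),
  voiced = matrix(c(0, 1, 1, 1, 0, 0, 1), ncol = 1)
)

is_voiced <- res$voiced[, 1] == 1           # logical mask of voiced frames
mean_f0 <- mean(res$f0[is_voiced, 1])       # mean f0 over voiced frames only
voiced_ratio <- mean(is_voiced)             # proportion of voiced frames
```

Masking by the "voiced" track keeps unvoiced frames from pulling the average f0 down.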