kaldi_pitch | R Documentation |
The algorithm used is a version of the RAPT algorithm that considers voicing also in voiceless frames and computes a Normalized Cross Correlation Function (NCCF) that can be used to estimate the probability of voicing (Ghahremani et al. 2014).
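To make the NCCF idea concrete, here is a minimal sketch in base R. It is a simplification for illustration and not the Kaldi or torchaudio implementation: the helper `nccf_at_lag()` is hypothetical, and the `ballast` term is added to the denominator unscaled, whereas the real algorithm is assumed to scale it by the signal energy.

```r
# Simplified NCCF of one analysis frame against a copy shifted by `lag` samples.
# Hypothetical helper for illustration only.
nccf_at_lag <- function(x, start, n, lag, ballast = 0) {
  a <- x[start:(start + n - 1)]
  b <- x[(start + lag):(start + lag + n - 1)]
  sum(a * b) / sqrt(sum(a^2) * sum(b^2) + ballast)
}

sr <- 4000                       # sample rate in Hz (same value as the resample_frequency default)
t  <- seq(0, 0.1, by = 1 / sr)   # 100 ms of signal
x  <- sin(2 * pi * 100 * t)      # synthetic 100 Hz tone, period = 40 samples

n    <- 100                      # 25 ms window at 4 kHz
lags <- 20:57                    # lags covering roughly 200 Hz down to 70 Hz
nccf <- sapply(lags, function(l) nccf_at_lag(x, 1, n, l))

best_lag <- lags[which.max(nccf)]  # lag with the highest correlation
f0       <- sr / best_lag          # corresponding frequency in Hz

# A positive ballast lowers the NCCF value at the same lag.
with_ballast <- nccf_at_lag(x, 1, n, best_lag, ballast = 7000)
```

For this synthetic tone the peak falls at a lag of one period, so `f0` recovers the frequency of the sine. With real speech the peak height also carries information about how periodic (voiced) the frame is.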
kaldi_pitch(
listOfFiles,
beginTime = 0,
endTime = 0,
windowShift = 5,
windowSize = 25,
minF = 70,
maxF = 200,
softMinF0 = 10,
voiced_voiceless_cost = 0.1,
resample_frequency = 4000,
deltaChange = 0.005,
nccfBallast = 7000,
lowpass_cutoff = 1000,
lowpass_filter_width = 1,
upsample_filter_width = 5,
max_frames_latency = 0,
frames_per_chunk = 0,
simulate_first_pass_online = FALSE,
recompute_frame = 500,
snip_edges = TRUE,
explicitExt = "kap",
outputDirectory = NULL,
toFile = TRUE,
conda.env = NULL
)
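A minimal call might look like the sketch below. The file path is a hypothetical placeholder, and the call is guarded so that it only runs when the package is installed and the file exists. It also assumes a working Python environment with torchaudio.

```r
# Hypothetical input file; replace with the path to a real recording.
wav <- file.path(tempdir(), "example.wav")

if (requireNamespace("superassp", quietly = TRUE) && file.exists(wav)) {
  # Return the tracks to the R session instead of writing them to disk.
  res <- superassp::kaldi_pitch(
    listOfFiles = wav,
    minF = 70,
    maxF = 400,       # raised from the default of 200 for higher-pitched voices
    toFile = FALSE
  )
}
```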
listOfFiles |
A vector of file paths to wav files. |
beginTime |
The start time of the section of the sound file that should be processed. |
endTime |
The end time of the section of the sound file that should be processed. |
windowShift |
The measurement interval (frame duration), in milliseconds. |
minF |
Candidate f0 frequencies below this frequency will not be considered. |
maxF |
Candidates above this frequency will be ignored. |
resample_frequency |
Frequency that we down-sample the signal to. Must be more than twice |
lowpass_cutoff |
Cutoff frequency of the low-pass filter, in Hz. (default: 1000) |
lowpass_filter_width |
Integer that determines the filter width of the low-pass filter; a larger value gives a sharper filter. (default: 1) |
max_frames_latency |
Maximum number of frames of latency that we allow pitch tracking to introduce into the feature processing (affects output only if |
frames_per_chunk |
The number of frames used for energy normalization. (default: 0) |
simulate_first_pass_online |
If true, the function will output features that correspond to what an online decoder would see in the first pass of decoding, not the final version of the features, which is the default. (default: FALSE) |
recompute_frame |
Only relevant for compatibility with online pitch extraction. A non-critical parameter; the frame at which we recompute some of the forward pointers, after revising our estimate of the signal energy. Relevant if |
snip_edges |
If this is set to false, the incomplete frames near the ending edge won’t be snipped, so that the number of frames is the file size divided by the |
explicitExt |
The file extension that should be used. |
outputDirectory |
Set an explicit directory for where the signal file will be written. If not defined, the file will be written to the same directory as the sound file. |
toFile |
write the output to a file? The file will be written in |
conda.env |
The name of the conda environment in which Python and its required packages are stored. Please make sure that you know what you are doing if you change this. |
soft_min_f0 |
Minimum f0, applied in a soft way; must not exceed minF. (default: 10.0) |
penalty_factor |
Cost factor for f0 change. (default: 0.1) |
delta_pitch |
Smallest relative change in pitch that our algorithm measures. (default: 0.005) |
nccf_ballast |
Increasing this factor reduces NCCF for quiet frames (default: 7000) |
upsample_filter_width |
Integer that determines filter width when upsampling NCCF. (default: 5) |
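Some of the arguments constrain each other. The sketch below collects those constraints in a small helper, `check_pitch_args()`, which is hypothetical and not part of the package. It assumes that `softMinF0` in the usage corresponds to `soft_min_f0` above, and that the down-sampling frequency has to exceed twice the low-pass cutoff (a Nyquist-style reading of the `resample_frequency` description).

```r
# Hypothetical helper: returns a character vector describing violated constraints.
check_pitch_args <- function(minF = 70, maxF = 200, softMinF0 = 10,
                             lowpass_cutoff = 1000, resample_frequency = 4000) {
  problems <- character(0)
  if (minF >= maxF) {
    problems <- c(problems, "minF must be below maxF")
  }
  if (softMinF0 > minF) {
    problems <- c(problems, "softMinF0 must not exceed minF")
  }
  # Assumption: the resampling rate must be more than twice the low-pass cutoff.
  if (resample_frequency <= 2 * lowpass_cutoff) {
    problems <- c(problems, "resample_frequency must exceed 2 * lowpass_cutoff")
  }
  problems
}

ok  <- check_pitch_args()             # defaults: no problems reported
bad <- check_pitch_args(minF = 300)   # minF above maxF: one problem reported
```

Running a check like this before a long batch job can catch an inconsistent argument combination before any audio is processed.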
The function calls the torchaudio library (Yang et al. 2021) to compute the pitch estimates and therefore relies on it being present in a properly set up Python environment. Please refer to the torchaudio manual for further information.
An SSFF track object containing two tracks (f0 and nccf) that are either returned (toFile == FALSE) or stored on disk.
rapt