kaldi_pitch: Estimate pitch using the Kaldi-modified version of RAPT
In humlab-speech/superassp: Speech signal processing using various frameworks through a wrassp-like interface

kaldi_pitch

R Documentation

Estimate pitch using the Kaldi-modified version of RAPT

Description

The algorithm is a version of RAPT that assigns a pitch also to voiceless frames and computes a Normalized Cross Correlation Function (NCCF) that can be used to estimate the probability of voicing (Ghahremani et al., 2014).

Usage

kaldi_pitch(
  listOfFiles,
  beginTime = 0,
  endTime = 0,
  windowShift = 5,
  windowSize = 25,
  minF = 70,
  maxF = 200,
  softMinF0 = 10,
  voiced_voiceless_cost = 0.1,
  owpass_cutoff = 1000,
  resample_frequency = 4000,
  deltaChange = 0.005,
  nccfBallast = 7000,
  lowpass_cutoff = 1000,
  lowpass_filter_width = 1,
  upsample_filter_width = 5,
  max_frames_latency = 0,
  frames_per_chunk = 0,
  simulate_first_pass_online = FALSE,
  recompute_frame = 500,
  snip_edges = TRUE,
  explicitExt = "kap",
  outputDirectory = NULL,
  toFile = TRUE,
  conda.env = NULL
)

Arguments

`listOfFiles`	A vector of file paths to wav files.
`beginTime`	The start time of the section of the sound file that should be processed.
`endTime`	The end time of the section of the sound file that should be processed.
`windowShift`	The measurement interval (frame shift), in milliseconds.
`minF`	Candidate f0 frequencies below this frequency will not be considered.
`maxF`	Candidates above this frequency will be ignored.
`resample_frequency`	Frequency that we down-sample the signal to. Must be more than twice `lowpass_cutoff`. (default: 4000)
`lowpass_cutoff`	Cutoff frequency for LowPass filter (Hz) (default: 1000)
`lowpass_filter_width`	Integer that determines filter width of lowpass filter, more gives sharper filter. (default: 1)
`max_frames_latency`	Maximum number of frames of latency that we allow pitch tracking to introduce into the feature processing (affects output only if `frames_per_chunk` > 0 and `simulate_first_pass_online`=`TRUE`) (default: 0)
`frames_per_chunk`	The number of frames used for energy normalization. (default: 0)
`simulate_first_pass_online`	If true, the function will output features that correspond to what an online decoder would see in the first pass of decoding – not the final version of the features, which is the default. (default: `FALSE`) Relevant if `frames_per_chunk > 0`.
`recompute_frame`	Only relevant for compatibility with online pitch extraction. A non-critical parameter; the frame at which we recompute some of the forward pointers, after revising our estimate of the signal energy. Relevant if `frames_per_chunk > 0`. (default: 500)
`snip_edges`	If set to `FALSE`, the incomplete frames near the end of the file are not snipped, so that the number of frames is the file duration divided by `windowShift`. This makes different types of features give the same number of frames. (default: `TRUE`)
`explicitExt`	The file extension that should be used.
`outputDirectory`	An explicit directory where the signal file will be written. If not defined, the file will be written to the same directory as the sound file.
`toFile`	Whether to write the output to a file. The file will be written to `outputDirectory`, if defined, or to the same directory as the sound file.
`conda.env`	The name of the conda environment in which Python and its required packages are stored. Please make sure that you know what you are doing if you change this.
`softMinF0`	Minimum f0, applied in a soft way; must not exceed `minF`. (default: 10)
`penalty_factor`	Cost factor for f0 change. (default: 0.1)
`deltaChange`	Smallest relative change in pitch that the algorithm measures. (default: 0.005)
`nccfBallast`	Increasing this factor reduces the NCCF for quiet frames. (default: 7000)
`upsample_filter_width`	Integer that determines the filter width when upsampling the NCCF. (default: 5)
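
The argument descriptions state two constraints (the resampling frequency must be more than twice the low-pass cutoff, and the soft minimum f0 must not exceed the minimum f0) and describe how `snip_edges` affects the number of frames. The sketch below illustrates both in plain R. The helper names `check_pitch_args` and `expected_frames` are hypothetical and not part of superassp; the framing rule is the usual Kaldi convention and is assumed here rather than taken from the package, as is the interpretation of `windowShift` and `windowSize` in milliseconds.

```r
# Hypothetical helper (not part of superassp): checks the two constraints
# named in the argument descriptions before any audio is processed.
check_pitch_args <- function(minF = 70, softMinF0 = 10,
                             lowpass_cutoff = 1000,
                             resample_frequency = 4000) {
  if (resample_frequency <= 2 * lowpass_cutoff) {
    stop("resample_frequency must be more than twice lowpass_cutoff")
  }
  if (softMinF0 > minF) {
    stop("softMinF0 must not exceed minF")
  }
  invisible(TRUE)
}

# Hypothetical helper: expected number of frames for a signal of the given
# duration. Assumes the usual Kaldi framing rule and millisecond units.
expected_frames <- function(duration_ms, windowShift = 5, windowSize = 25,
                            snip_edges = TRUE) {
  if (snip_edges) {
    # Only frames that fit completely inside the signal are kept.
    if (duration_ms < windowSize) {
      return(0)
    }
    1 + floor((duration_ms - windowSize) / windowShift)
  } else {
    # Frame count depends only on the duration and the shift (rounded).
    floor(duration_ms / windowShift + 0.5)
  }
}

check_pitch_args()                            # the documented defaults pass
expected_frames(1000)                         # snip_edges = TRUE
expected_frames(1000, snip_edges = FALSE)     # snip_edges = FALSE
```

Under these assumptions, a 1000 ms signal with the default shift and window size gives 196 frames with `snip_edges = TRUE` and 200 frames with `snip_edges = FALSE`.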

Details

The function calls the torchaudio library (Yang et al., 2021) to compute the pitch estimates and therefore requires torchaudio to be installed in a properly set up Python environment. Please refer to the torchaudio manual for further information.
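
Because the function depends on an external Python installation, it can be useful to check the environment before processing files. The following is a minimal sketch, assuming that the Python bridge is the reticulate package (this page does not say which bridge superassp uses). The helper name `torchaudio_available` is hypothetical.

```r
# Hypothetical helper (not part of superassp): TRUE only when reticulate is
# installed and the selected Python environment can import torchaudio.
torchaudio_available <- function(conda_env = NULL) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  ok <- tryCatch({
    if (!is.null(conda_env)) {
      # Bind the session to the named conda environment;
      # this raises an error if the environment cannot be found.
      reticulate::use_condaenv(conda_env, required = TRUE)
    }
    reticulate::py_module_available("torchaudio")
  }, error = function(e) FALSE)
  isTRUE(ok)
}
```

Calling `torchaudio_available()` checks the default Python environment; passing an environment name checks the same environment that would be given to `conda.env`.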

Value

An SSFF track object containing two tracks (f0 and nccf) that are either returned (toFile == FALSE) or stored on disk.
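
A minimal usage sketch follows. The file path is hypothetical, and the assumption that the returned object can be indexed by track name (as with a wrassp `AsspDataObj`) is not confirmed by this page. The call is guarded so that the code only runs when the package and the file are actually available.

```r
# Hypothetical file path; replace it with a real wav file.
wav <- file.path(tempdir(), "example.wav")

# Run only when superassp is installed and the file exists.
can_run <- requireNamespace("superassp", quietly = TRUE) && file.exists(wav)

pitch <- NULL
if (can_run) {
  pitch <- superassp::kaldi_pitch(
    listOfFiles = wav,
    minF = 70,
    maxF = 200,
    toFile = FALSE  # return the track object instead of writing a file
  )
  # Assumption: tracks are accessible by name, one matrix per track.
  str(pitch$f0)
  str(pitch$nccf)
}
```

With `toFile = TRUE` (the default), the result is instead written to disk with the extension given by `explicitExt`.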

References

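Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., and Khudanpur, S. (2014). A pitch extraction algorithm tuned for automatic speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). doi:10.1109/icassp.2014.6854049

Yang, Y.-Y., et al. (2021). TorchAudio: Building Blocks for Audio and Speech Processing. arXiv:2110.15018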
