functional_vad | R Documentation |
Voice Activity Detector. Similar to the SoX implementation. Attempts to trim silence and quiet background sounds from the ends of recordings of speech. The algorithm currently uses a simple cepstral power measurement to detect voice, so it may be fooled by other sounds, especially music.
functional_vad( waveform, sample_rate, trigger_level = 7, trigger_time = 0.25, search_time = 1, allowed_gap = 0.25, pre_trigger_time = 0, boot_time = 0.35, noise_up_time = 0.1, noise_down_time = 0.01, noise_reduction_amount = 1.35, measure_freq = 20, measure_duration = NULL, measure_smooth_time = 0.4, hp_filter_freq = 50, lp_filter_freq = 6000, hp_lifter_freq = 150, lp_lifter_freq = 2000 )
waveform |
(Tensor): Tensor of audio of dimension (..., time). |
sample_rate |
(int): Sample rate of audio signal. |
trigger_level |
(float, optional): The measurement level used to trigger activity detection. This may need to be changed depending on the noise level, signal level, and other characteristics of the input audio. (Default: 7.0) |
trigger_time |
(float, optional): The time constant (in seconds) used to help ignore short bursts of sound. (Default: 0.25) |
search_time |
(float, optional): The amount of audio (in seconds) to search for quieter/shorter bursts of audio to include prior to the detected trigger point. (Default: 1.0) |
allowed_gap |
(float, optional): The allowed gap (in seconds) between quieter/shorter bursts of audio to include prior to the detected trigger point. (Default: 0.25) |
pre_trigger_time |
(float, optional): The amount of audio (in seconds) to preserve before the trigger point and any found quieter/shorter bursts. (Default: 0.0) |
boot_time |
(float, optional): The algorithm internally uses adaptive noise estimation/reduction to detect the start of the wanted audio. This option sets the time (in seconds) for the initial noise estimate. (Default: 0.35) |
noise_up_time |
(float, optional): Time constant (in seconds) used by the adaptive noise estimator when the noise level is increasing. (Default: 0.1) |
noise_down_time |
(float, optional): Time constant (in seconds) used by the adaptive noise estimator when the noise level is decreasing. (Default: 0.01) |
noise_reduction_amount |
(float, optional): Amount of noise reduction to use in the detection algorithm (e.g. 0, 0.5, ...). (Default: 1.35) |
measure_freq |
(float, optional): Frequency (in Hz) of the algorithm's processing/measurements. (Default: 20.0) |
measure_duration |
(float, optional): Measurement duration (in seconds). (Default: twice the measurement period, i.e. with overlap.) |
measure_smooth_time |
(float, optional): Time constant (in seconds) used to smooth spectral measurements. (Default: 0.4) |
hp_filter_freq |
(float, optional): "Brick-wall" frequency of the high-pass filter applied at the input to the detector algorithm. (Default: 50.0) |
lp_filter_freq |
(float, optional): "Brick-wall" frequency of the low-pass filter applied at the input to the detector algorithm. (Default: 6000.0) |
hp_lifter_freq |
(float, optional): "Brick-wall" frequency of the high-pass lifter used in the detector algorithm. (Default: 150.0) |
lp_lifter_freq |
(float, optional): "Brick-wall" frequency of the low-pass lifter used in the detector algorithm. (Default: 2000.0) |
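Several of the arguments above are expressed in seconds or as a measurement frequency, so it can help to see what they amount to at a given sample rate. The following is a minimal base R sketch, not the package's internal code: the helper `vad_timing()` is hypothetical, and the use of `round()` to convert to whole samples and measurement steps is an assumption. The only rule taken from this page is that `measure_duration` defaults to twice the measurement period.

```r
# Hypothetical helper (not part of torchaudio): converts the time-based
# arguments of functional_vad() into samples and measurement steps.
# The rounding used here is an assumption made for illustration.
vad_timing <- function(sample_rate,
                       measure_freq = 20,
                       measure_duration = NULL,
                       trigger_time = 0.25,
                       search_time = 1,
                       allowed_gap = 0.25,
                       boot_time = 0.35) {
  # Documented default: twice the measurement period, i.e. with overlap.
  if (is.null(measure_duration)) {
    measure_duration <- 2 / measure_freq
  }

  list(
    measure_period_s       = 1 / measure_freq,
    measure_duration_s     = measure_duration,
    measure_period_samples = round(sample_rate / measure_freq),
    measure_window_samples = round(sample_rate * measure_duration),
    trigger_measurements   = round(trigger_time * measure_freq),
    search_measurements    = round(search_time * measure_freq),
    gap_measurements       = round(allowed_gap * measure_freq),
    boot_measurements      = round(boot_time * measure_freq)
  )
}

# Defaults at a 16 kHz sample rate.
timing <- vad_timing(sample_rate = 16000)
str(timing)
```

Under these assumptions, the default `measure_freq` of 20 gives one measurement every 0.05 s over a 0.1 s window, so consecutive windows overlap by half.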
The effect can trim only from the front of the audio, so in order to trim from the back, the reverse effect must also be used.
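The reverse-trim-reverse pattern described here can be sketched in base R on a plain numeric vector. This is an illustration only: `trim_front()` is a hypothetical stand-in that drops leading samples below an amplitude threshold, and it is not the VAD algorithm. `trim_both_ends()` is likewise a hypothetical wrapper.

```r
# Stand-in front trimmer (NOT the VAD algorithm): drops leading samples
# whose absolute value is below `threshold`.
trim_front <- function(x, threshold = 0.1) {
  active <- which(abs(x) >= threshold)
  if (length(active) == 0) {
    return(x[0])  # nothing above the threshold: return an empty vector
  }
  x[active[1]:length(x)]
}

# Trim both ends with a front-only trimmer:
# trim the front, reverse, trim the (new) front, then reverse back.
trim_both_ends <- function(x, trim = trim_front) {
  front_trimmed <- trim(x)
  rev(trim(rev(front_trimmed)))
}

signal  <- c(0, 0.01, 0.5, -0.7, 0.02, 0.6, 0, 0)
trimmed <- trim_both_ends(signal)
print(trimmed)
```

With tensors the same idea would apply along the time dimension: call `functional_vad()`, flip the result in time (for example with `torch::torch_flip()`, assuming the last dimension is time), call `functional_vad()` again, and flip back.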
Tensor: Tensor of audio of dimension (..., time).