annotate_voiceactivity: Perform voice activity detection across a database
In humlab-speech/reindeer: This package extends the capabilities of emuR to work with speech data in a nordic climate


Description

Voice activity detection is applied to a database to find the portions of the speech signals where spoken communication is likely to have occurred. This makes transcription work more efficient in databases that contain silences or many small utterances that should be disregarded. The segmentation is intended for indexing and easy navigation of the database only, and should not be inserted into a hierarchy of levels. Instead, the intended use of this function is to supply the result of a "VAD == SPEECH" query call to serve or reindeer:write_bundleList, so that the annotations can be used for efficient navigation of the database. If the result is not helpful for the recording settings used, the user can rerun this function with more suitable thresholds, overwriting the previously generated labels.

Usage

annotate_voiceactivity(
  emuDBhandle,
  auth_key,
  levelname = "VAD",
  speech_probability_threshold = 0.6,
  nospeech_probability_threshold = 0.4,
  minimum_speech_duration = 0.2,
  minimum_nonspeech_duration = 0.1
)

Arguments

`emuDBhandle`	An emuR database handle.
`auth_key`	A Hugging Face 'User Access Token' for a user which has activated access to the pyannote/segmentation model.
`levelname`	The name of the segmentation level (and attribute) to create to hold the annotations of speech.
`speech_probability_threshold`	The probability threshold above which the model will perceive the signal to contain speech.
`nospeech_probability_threshold`	The probability threshold below which the model will perceive the signal to contain non-speech.
`minimum_speech_duration`	The minimum duration of a section of speech to consider (in seconds).
`minimum_nonspeech_duration`	The minimum duration of a portion that could be non-speech (in seconds).
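
How the two probability thresholds and the two minimum durations may interact can be illustrated with a small base R sketch. This is a toy hysteresis binarizer written for illustration only: the function names binarize_vad and relabel_short_runs, the frame-wise probability vector, and the order of the post-processing steps are all assumptions, and the actual processing done by pyannote-audio may differ in its details.

```r
# Illustrative only: NOT the code used by reindeer or pyannote-audio.

# Hypothetical helper: flip runs of `value` that last less than `min_dur` seconds.
relabel_short_runs <- function(x, value, min_dur, frame_dur,
                               interior_only = FALSE) {
  runs <- rle(x)
  n <- length(runs$values)
  short <- runs$values == value & runs$lengths * frame_dur < min_dur
  # Optionally leave the first and last run untouched (they are not gaps).
  if (interior_only && n > 0) short[c(1, n)] <- FALSE
  runs$values[short] <- !value
  inverse.rle(runs)
}

# Hypothetical binarizer: frame-wise speech probabilities -> SPEECH segments.
binarize_vad <- function(prob, frame_dur,
                         speech_probability_threshold = 0.6,
                         nospeech_probability_threshold = 0.4,
                         minimum_speech_duration = 0.2,
                         minimum_nonspeech_duration = 0.1) {
  # 1. Hysteresis: switch to speech above the upper threshold and back to
  #    non-speech only below the lower threshold.
  active <- logical(length(prob))
  state <- FALSE
  for (i in seq_along(prob)) {
    if (!state && prob[i] > speech_probability_threshold) {
      state <- TRUE
    } else if (state && prob[i] < nospeech_probability_threshold) {
      state <- FALSE
    }
    active[i] <- state
  }
  # 2. Fill non-speech gaps between speech runs that are too short.
  active <- relabel_short_runs(active, FALSE, minimum_nonspeech_duration,
                               frame_dur, interior_only = TRUE)
  # 3. Drop speech runs that are too short.
  active <- relabel_short_runs(active, TRUE, minimum_speech_duration, frame_dur)
  # 4. Convert the remaining speech runs to start and end times in seconds.
  runs <- rle(active)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- runs$values
  data.frame(start = (starts[keep] - 1) * frame_dur,
             end = ends[keep] * frame_dur,
             label = rep("SPEECH", sum(keep)))
}

# Made-up probabilities, one value per 0.1 s frame.
probs <- c(0.1, 0.1, 0.7, 0.5, 0.5, 0.8, 0.3, 0.1, 0.9, 0.1, 0.1, 0.1)
segments <- binarize_vad(probs, frame_dur = 0.1)
print(segments)
```

With the default values, the frames at 0.5 stay inside the speech segment because they never fall below the lower threshold, while the isolated frame at 0.9 is discarded because it lasts only 0.1 s. The result is a single segment from roughly 0.2 s to 0.6 s.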

Details

Sections thought to contain speech will be marked in the levelname level by a SEGMENT with the label SPEECH. If this function is applied to the database again, the levelname level will be cleared before the new labels are inserted. The speech segmentation model of the pyannote-audio framework is used for the speech segmentation \insertCite{Bredin.2019,Bredin.2021}{reindeer}.
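
A typical workflow, following the intended use stated in the Description, might look like the sketch below. The helper name review_speech_segments, the database path and the HF_TOKEN environment variable are placeholders chosen for this example. It also assumes that the emuR functions load_emuDB, query and serve are available, and that the default level name "VAD" is used.

```r
# Hypothetical helper wrapping the workflow; not part of reindeer or emuR.
review_speech_segments <- function(db_path, token = Sys.getenv("HF_TOKEN")) {
  # The Hugging Face token is required; stop early if it is missing.
  stopifnot(nzchar(token))

  # Load the database and (re)create the VAD level with the default thresholds.
  db <- emuR::load_emuDB(db_path)
  reindeer::annotate_voiceactivity(db, auth_key = token, levelname = "VAD")

  # Collect the detected speech segments and use them for navigation.
  speech <- emuR::query(db, "VAD == SPEECH")
  emuR::serve(db, seglist = speech)

  invisible(speech)
}

# Example call (placeholder path):
# review_speech_segments("~/corpora/my_emuDB")
```

If the detected segments do not suit the recordings, the annotate_voiceactivity call can be repeated with other threshold values, since the level is cleared before new labels are inserted.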