annotate_voiceactivity: Perform voice activity detection across a database

View source: R/reindeeR_annotate.R

annotate_voiceactivity R Documentation

Perform voice activity detection across a database

Description

Voice activity detection is applied to a database to find portions of the speech signals where spoken communication is likely to have occurred. This makes transcription work more efficient in databases that contain silences or many small utterances that should be disregarded. The segmentation is intended for indexing and easy navigation of the database only and should not be inserted into a hierarchy of levels. Instead, the intended use of this function is to supply the result of a "VAD == SPEECH" query call to serve or reindeer:write_bundleList, so that the annotations can be used for efficient navigation of the database. If the result is not helpful for the recording settings used, the user can rerun this function with more applicable thresholds, overwriting the previously generated labels.

Usage

annotate_voiceactivity(
  emuDBhandle,
  auth_key,
  levelname = "VAD",
  speech_probability_threshold = 0.6,
  nospeech_probability_threshold = 0.4,
  minimum_speech_duration = 0.2,
  minimum_nonspeech_duration = 0.1
)

Arguments

emuDBhandle

An emuR database handle.

auth_key

A Hugging Face 'User Access Token' for a user who has activated access to the pyannote/segmentation model.

levelname

The name of the segmentation level (and attribute) to create to hold the annotations of speech.

speech_probability_threshold

The probability threshold above which the model will perceive the signal to contain speech.

nospeech_probability_threshold

The probability threshold below which the model will perceive the signal to contain non-speech.

minimum_speech_duration

The minimum duration of a section of speech to consider (in seconds).

minimum_nonspeech_duration

The minimum duration of a portion that could be non-speech (in seconds).
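
How the two thresholds and the two minimum durations interact can be illustrated with a small base R sketch. It assumes a hysteresis scheme over frame-wise speech probabilities: a region starts when the probability rises above the upper threshold and ends when it falls below the lower one, after which short gaps are filled and short regions are dropped. The helper detect_speech, its prob and frame_step arguments, and the order of the steps are hypothetical and only serve the illustration; the actual processing is done by the pyannote-audio model and may differ.

```r
# Hypothetical helper, NOT part of reindeer: converts frame-wise speech
# probabilities into speech intervals (in seconds) under the assumed
# hysteresis scheme described above.
detect_speech <- function(prob, frame_step,
                          speech_probability_threshold = 0.6,
                          nospeech_probability_threshold = 0.4,
                          minimum_speech_duration = 0.2,
                          minimum_nonspeech_duration = 0.1) {
  # 1. Hysteresis: switch on above the upper threshold,
  #    switch off only below the lower threshold.
  active <- logical(length(prob))
  state <- FALSE
  for (i in seq_along(prob)) {
    if (!state && prob[i] > speech_probability_threshold) {
      state <- TRUE
    } else if (state && prob[i] < nospeech_probability_threshold) {
      state <- FALSE
    }
    active[i] <- state
  }

  # 2. Turn runs of active frames into start/end times.
  runs <- rle(active)
  ends <- cumsum(runs$lengths) * frame_step
  starts <- ends - runs$lengths * frame_step
  seg <- data.frame(start = starts[runs$values], end = ends[runs$values])

  # 3. Merge neighbouring regions separated by a non-speech gap
  #    shorter than minimum_nonspeech_duration.
  if (nrow(seg) > 1) {
    gap <- seg$start[-1] - seg$end[-nrow(seg)]
    group <- cumsum(c(TRUE, gap >= minimum_nonspeech_duration))
    seg <- data.frame(start = as.numeric(tapply(seg$start, group, min)),
                      end = as.numeric(tapply(seg$end, group, max)))
  }

  # 4. Drop speech regions shorter than minimum_speech_duration.
  seg <- seg[seg$end - seg$start >= minimum_speech_duration, , drop = FALSE]
  rownames(seg) <- NULL
  seg
}

# Made-up probabilities, one value per 0.1 s frame
prob <- c(0.1, 0.7, 0.5, 0.5, 0.3, 0.1, 0.1, 0.1,
          0.9, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.2)
detect_speech(prob, frame_step = 0.1)
```

In this sketch, frames with a probability between the two thresholds keep the state of the preceding frame, so widening the distance between the thresholds makes the segmentation less sensitive to brief fluctuations.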

Details

Sections thought to contain speech will be marked in the levelname level by a SEGMENT with the label SPEECH. The levelname level will be cleared before inserting labels if this function is applied again to the database. The speech segmentation model of the pyannote-audio framework is used for the speech segmentation \insertCite{Bredin.2019,Bredin.2021}{reindeer}.
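
The intended workflow could look roughly like the sketch below. The wrapper navigate_speech and its arguments db_path and token are hypothetical placeholders. The sketch assumes that the emuR and reindeer packages are installed, that annotate_voiceactivity is exported by reindeer, and that the default level name "VAD" is kept so that the query string matches.

```r
# Hypothetical wrapper, not part of reindeer: annotate a database and
# open the detected speech segments for navigation.
navigate_speech <- function(db_path, token) {
  # Load an existing emuDB from disk
  db <- emuR::load_emuDB(db_path)

  # Create (or overwrite) the "VAD" level with SPEECH segments
  reindeer::annotate_voiceactivity(db, auth_key = token, levelname = "VAD")

  # Retrieve the SPEECH segments as a segment list
  speech <- emuR::query(db, "VAD == SPEECH")

  # Serve the database restricted to the retrieved segments
  emuR::serve(db, seglist = speech)
}

# Usage (requires a database on disk and a valid Hugging Face token):
# navigate_speech("path/to/my_emuDB", Sys.getenv("HF_TOKEN"))
```

Reading the token from an environment variable, as in the commented call, is one way to avoid storing it in a script; HF_TOKEN is an assumed variable name.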

Value

A tibble

References

\insertAllCited

humlab-speech/reindeer documentation built on May 21, 2023, 4:43 p.m.