get_vihi_annotations: Load VIHI annotation data

View source: R/get_vihi_annotations.R

get_vihi_annotations    R Documentation

Load VIHI annotation data

Description

Clone the BLAB-private vihi_annotations repo to ~/BLAB_DATA once before using this function.

Usage

get_vihi_annotations(
  version = NULL,
  subset = c("random", "everything", "VI+TD-VI"),
  table = c("annotations", "intervals", "merged", "all"),
  include_all_tier_types = FALSE,
  allow_annotation_errors = FALSE,
  include_pi = FALSE
)

Arguments

version

Version tag to check out.

subset

Which pre-defined subset of the data should be loaded?

  • 'random' (the default) loads the annotations from the 15 randomly sampled intervals from all recordings in the corpus.

  • 'VI+TD-VI' loads the annotations from the random and the top-5 high-volubility intervals from VI recordings and their TD matches.

  • 'everything' loads all annotations from all tiers. Exercise caution with this option: the data will include incomplete and unchecked annotations.

table

Which table to return: annotations (the default) or intervals. If merged, returns the annotations table with the interval information merged in; intervals without annotations are not included. If all, returns a named list of both tables.

include_all_tier_types

Should all tier types be included in the output? If FALSE (the default), only tiers that are relevant to the subset are returned. For the 'random' and 'VI+TD-VI' subsets, the relevant tier types are: transcription, vcm, lex, mwu, xds. For the 'everything' subset, this parameter is ignored as all tier types are returned.

allow_annotation_errors

If errors are found in the annotations, should the function throw an error (FALSE, the default) or add error_n columns to the annotations table (TRUE)? Use TRUE only to inspect the errors, not to ignore them.

include_pi

Should annotations marked as PI be included in the output? If FALSE (the default), they are filtered out.
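As a rough illustration of how the error_n columns added by allow_annotation_errors = TRUE might be inspected, the sketch below uses a toy data frame rather than real output. The column names error_1 and error_2, the error message, and the transcriptions are all made up for illustration; the actual columns and messages may differ.

```r
# Toy stand-in for an annotations table loaded with
# allow_annotation_errors = TRUE. The error_1/error_2 names and the
# message text are hypothetical, not taken from the package.
annotations <- data.frame(
  participant   = c("CHI", "FA1", "MA1"),
  transcription = c("ba ba", "look at that", "xxx"),
  error_1       = c(NA, "hypothetical error message", NA),
  error_2       = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Find the error columns by their name prefix.
error_cols <- grep("^error_", names(annotations), value = TRUE)

# Keep only the rows that have at least one recorded error.
has_error <- rowSums(!is.na(annotations[, error_cols, drop = FALSE])) > 0
flagged <- annotations[has_error, ]
print(flagged)
```

Rows selected this way are the ones to fix in the source annotations before loading the data with the default allow_annotation_errors = FALSE.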

Details

The speaker TIER is identified by the participant column. Each of the other tiers is in its own column.

Notes:

  • Annotations are checked for errors for the standard ACLEW tiers only. Interval-level checks aren't currently performed at all.

  • Annotations marked as PI are filtered out by default. Set include_pi = TRUE if you want them included.

  • The transcribed utterance can be empty (""). Normally, that means that a code interval has been segmented but not annotated. But there might be other stray utterance segments like that.

  • (relevant for non-speaker TIERs only) Currently, there is no way to tell whether an annotation is missing because it was not segmented or because it was segmented but not yet annotated: both are represented as NA. This will change in the future: missing segment will still be NA, but missing annotation will be "".
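The notes on empty utterances and NA values can be made concrete with a small sketch. It runs on a toy data frame, not on real VIHI data. The participant column is documented above; the transcription and vcm column names are assumed from the tier types listed under include_all_tier_types, and all values are invented.

```r
# Toy annotations table; column names are assumptions and values are invented.
annotations <- data.frame(
  participant   = c("CHI", "FA1", "CHI"),
  transcription = c("ba", "", "mama"),
  vcm           = c("C", NA, NA),
  stringsAsFactors = FALSE
)

# Utterances that were segmented but have an empty transcription.
# %in% is used instead of == so that an NA would not produce NA rows.
empty_utterances <- annotations[annotations$transcription %in% "", ]

# Missing values on a non-speaker tier. Per the note above, NA currently
# covers both "not segmented" and "segmented but not yet annotated".
missing_vcm <- annotations[is.na(annotations$vcm), ]

print(empty_utterances)
print(missing_vcm)
```

Checking both conditions before analysis helps avoid counting unannotated segments as real data.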

Value

A table or a list of tables depending on the table parameter.
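To illustrate what the merged and all options of the table parameter return, here is a minimal sketch on toy tables. It is not the package's implementation: the interval_id and sampling_type columns are hypothetical, and base merge() stands in for whatever join the function actually performs.

```r
# Toy tables; interval_id and sampling_type are hypothetical column names.
intervals <- data.frame(
  interval_id   = c(1, 2, 3),
  sampling_type = c("random", "random", "high-volubility"),
  stringsAsFactors = FALSE
)
annotations <- data.frame(
  interval_id   = c(1, 1, 3),
  transcription = c("hi", "ba", "look"),
  stringsAsFactors = FALSE
)

# An inner join (merge() with its default all = FALSE) drops interval 2,
# which has no annotations, mirroring the documented behaviour of "merged".
merged <- merge(annotations, intervals, by = "interval_id")

# With table = "all", the result is described as a named list of both tables.
all_tables <- list(annotations = annotations, intervals = intervals)

print(merged)
print(names(all_tables))
```

If intervals without annotations matter for an analysis, request the intervals table (or table = "all") rather than relying on the merged table.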

Examples

vihi_annotations <- get_vihi_annotations(version='0.0.0.9006-dev.2')

vitd_annotations <- get_vihi_annotations(version='0.0.0.9006-dev.2',
                                         subset='VI+TD-VI')

vitd <- get_vihi_annotations(version='0.0.0.9006-dev.2', subset='VI+TD-VI',
                             table='all')
# Preview the first rows of each table (base R, no pipe operator needed)
head(vitd$annotations)
head(vitd$intervals)


BergelsonLab/blabr documentation built on April 19, 2024, 7:21 p.m.