get_vihi_annotations: Load VIHI annotation data
In BergelsonLab/blabr: a toolbox for working in the BLAB

get_vihi_annotations

R Documentation

Load VIHI annotation data

Description

Clone the BLAB-private vihi_annotations repo to ~/BLAB_DATA once before using this function.

Usage

get_vihi_annotations(
  version = NULL,
  subset = c("random", "everything", "VI+TD-VI"),
  table = c("annotations", "intervals", "merged", "all"),
  include_all_tier_types = FALSE,
  allow_annotation_errors = FALSE,
  include_pi = FALSE
)

Arguments

`version`	version tag to checkout
`subset`	Which pre-defined subset of the data should be loaded? 'random' (the default) loads the annotations from the 15 randomly sampled intervals from all recordings in the corpus. 'VI+TD-VI' loads the annotations from the random and the top-5 high-volubility intervals from VI recordings and their TD matches. 'everything' loads all annotations from all tiers. Exercise caution with this option: the data will include incomplete and unchecked annotations.
`table`	Which table to return: `annotations` (the default), `intervals`, `merged`, or `all`. `merged` returns the `annotations` table with the interval information merged in; intervals without annotations are not included. `all` returns a named list of both tables.
`include_all_tier_types`	Should all tier types be included in the output? If `FALSE` (the default), only tiers that are relevant to the subset are returned. For the 'random' and 'VI+TD-VI' subsets, the relevant tier types are: transcription, vcm, lex, mwu, xds. For the 'everything' subset, this parameter is ignored as all tier types are returned.
`allow_annotation_errors`	In case errors are found in the annotations, should the function throw an error (`FALSE`, the default) or add `error_n` columns to the `annotations` table? Use only as a way to inspect the errors, not as a way to ignore them.
`include_pi`	Should annotations marked as PI be included in the output? If `FALSE` (the default), they are filtered out.
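
As a rough illustration of what `table = 'merged'` describes, the sketch below joins two toy data frames with base R's `merge()`. The column names (`interval_id`, `participant`, `transcription`, `onset`, `offset`) and all values are made up for illustration and are not necessarily the columns blabr returns; the function itself may build the merged table differently.

```r
# Toy stand-ins for the two tables; all column names and values are hypothetical.
annotations <- data.frame(
  interval_id = c(1, 1, 3),
  participant = c("CHI", "FA1", "CHI"),
  transcription = c("ba", "hello", "mama"),
  stringsAsFactors = FALSE
)
intervals <- data.frame(
  interval_id = c(1, 2, 3),
  onset = c(0, 120000, 240000),
  offset = c(120000, 240000, 360000)
)

# merge() performs an inner join by default: each annotation row gains the
# columns of its interval. Interval 2 has no annotations, so it is dropped.
merged <- merge(annotations, intervals, by = "interval_id")
```

If intervals without annotations are needed as well, requesting `table = 'all'` and working with the `intervals` table directly is probably the safer route.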

Details

The speaker tier is identified by the `participant` column. All other tiers are stored in their own columns.

Notes:

Annotations are checked for errors for the standard ACLEW tiers only. Interval-level checks aren't currently performed at all.
Annotations marked as PI are filtered out by default. Set `include_pi = TRUE` if you want them included.
The transcribed utterance can be empty (""). Normally, that means that a code interval has been segmented but not annotated. But there might be other stray utterance segments like that.
(relevant for non-speaker tiers only) Currently, there is no way to tell whether an annotation is missing because it was not segmented or because it was segmented but not yet annotated: both are represented as NA. This will change in the future: a missing segment will still be NA, but a missing annotation will be "".
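
The distinctions in the notes above can be checked along these lines. This is a minimal sketch on toy data: the column names `transcription` and `xds` are assumed for illustration (they match tier types named in the arguments, but the actual column layout is not confirmed here), and all values are invented.

```r
# Toy annotations table; column names and values are hypothetical.
annotations <- data.frame(
  participant = c("CHI", "FA1", "FA1", "MA1"),
  transcription = c("ba", "", "look at that", "hi"),
  xds = c(NA, NA, "C", NA),
  stringsAsFactors = FALSE
)

# Utterances that were segmented but carry no transcription.
empty_utterances <- annotations[annotations$transcription == "", ]

# Rows with a non-empty transcription.
transcribed <- annotations[annotations$transcription != "", ]

# For non-speaker tiers, NA currently covers both "not segmented" and
# "segmented but not yet annotated", so this count cannot tell them apart.
n_missing_xds <- sum(is.na(annotations$xds))
```

Whether empty utterances should be dropped or kept depends on the analysis; inspecting `empty_utterances` first seems prudent, since some of them may be stray segments.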

Value

A table or a list of tables depending on the table parameter.

Examples

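library(blabr)
library(magrittr)  # provides %>%

# Load the annotations table (the default) for the VI+TD-VI subset
vitd_annotations <- get_vihi_annotations(version='0.0.0.9006-dev.5',
                                         subset='VI+TD-VI')

# Load both tables as a named list
vitd <- get_vihi_annotations(version='0.0.0.9006-dev.5', subset='VI+TD-VI',
                             table='all')
vitd$annotations %>% head()
vitd$intervals %>% head()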

BergelsonLab/blabr documentation built on Dec. 22, 2024, 9:32 p.m.