corpus_import: Import annotation files into corpus object
In oliverehmer/act: Aligned Corpus Toolkit

corpus_import

R Documentation

Import annotation files into corpus object

Description

Scans all paths specified in x@paths.annotation.files for annotation files. Files in a supported format will be loaded as transcript objects into the corpus object. All previously loaded transcript objects will be deleted.

Usage

corpus_import(x, createFullText = TRUE, assignMedia = TRUE)

Arguments

`x`	Corpus object.
`createFullText`	Logical; if `TRUE` full text will be created.
`assignMedia`	Logical; if `TRUE` the folder(s) specified in `@paths.media.files` of your corpus object will be scanned for media.

Details

If assignMedia=TRUE, the paths defined in x@paths.media.files will be scanned for media files. Media files and annotation files are matched based on their file names. Only the file types set in options()$act.fileformats.audio and options()$act.fileformats.video will be recognized. You can modify these options to recognize other media types.
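
As a minimal sketch of modifying these options: the snippet below prints the current settings and appends one extra audio extension. It assumes that the options hold character vectors of file extensions; the extension "flac" is purely illustrative and is not taken from the package documentation, so check the printed values first to see which notation is actually used.

```r
library(act)

# Inspect the file types that are currently recognized
getOption("act.fileformats.audio")
getOption("act.fileformats.video")

# Assumption: the option is a character vector of file extensions.
# "flac" is an illustrative value only.
options(act.fileformats.audio = unique(c(getOption("act.fileformats.audio"), "flac")))

# Confirm the modified setting
getOption("act.fileformats.audio")
```

The new setting should then apply the next time corpus_import() is called with assignMedia=TRUE.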

See @import.results of the corpus object to check the results of importing the files. To get a detailed overview of the corpus object, use act::info(x); for a summary, use act::info_summarized(x).
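
A short sketch of these checks, using the example corpus that ships with the package. It only uses the slot and functions named above; the structure of the printed results is not described here, so the snippet simply prints them.

```r
library(act)

# Point the example corpus at the example files shipped with the package
path <- system.file("extdata", "examplecorpus", package="act")
examplecorpus@paths.annotation.files <- path
examplecorpus@paths.media.files <- path

# Re-import the annotation files
examplecorpus <- act::corpus_import(x=examplecorpus)

# Check the results of the import
examplecorpus@import.results

# Detailed overview and summary of the corpus object
act::info(examplecorpus)
act::info_summarized(examplecorpus)
```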

Value

Corpus object.

Examples

library(act)

# The example files that come with the act library are located here:
path <- system.file("extdata", "examplecorpus", package="act")

# This is the examplecorpus object that comes with the library
examplecorpus

# Make sure that the input folder of the example corpus object is set correctly
examplecorpus@paths.annotation.files <- path
examplecorpus@paths.media.files <- path

# Load annotation files into the corpus object (again)
examplecorpus <- act::corpus_import(x=examplecorpus)

# Creating the full texts may take a long time.
# If you do NOT want to create the full texts immediately use the following command:
examplecorpus <- act::corpus_import(x=examplecorpus, createFullText=FALSE)

oliverehmer/act documentation built on March 11, 2023, 1:30 p.m.