corpus_import: Import annotation files into corpus object

View source: R/corpus_import.R

corpus_import    R Documentation

Import annotation files into corpus object

Description

Scans all paths specified in x@paths.annotation.files for annotation files. Files in a supported format are loaded as transcript objects into the corpus object. All previously loaded transcript objects are deleted.

Usage

corpus_import(x, createFullText = TRUE, assignMedia = TRUE)

Arguments

x

Corpus object.

createFullText

Logical; if TRUE, the full text will be created.

assignMedia

Logical; if TRUE, the folder(s) specified in @paths.media.files of your corpus object will be scanned for media files.

Details

If assignMedia=TRUE, the paths defined in x@paths.media.files will be scanned for media files. Media files and annotation files are matched based on their file names. Only the file types set in options()$act.fileformats.audio and options()$act.fileformats.video will be recognized. You can modify these options to recognize other media types.
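As a rough illustration of the two points above, the following sketch first extends the audio format option and then pairs files by name. The extension "flac", the assumption that the option stores bare extensions without a dot, the helper match_by_basename() and all file names are hypothetical; the helper only shows the general idea of name-based matching and is not the matching rule act itself applies.

```r
# Add one more extension to the recognized audio formats.
# "flac" is only an example; getOption() returns NULL if the option
# has not been set yet (for instance when act is not loaded).
audio_formats <- unique(c(getOption("act.fileformats.audio"), "flac"))
options(act.fileformats.audio = audio_formats)

# Hypothetical helper (not part of act): pair files whose names agree
# once the directory and the file extension are removed.
match_by_basename <- function(annotation_files, media_files) {
  ann_names   <- tools::file_path_sans_ext(basename(annotation_files))
  media_names <- tools::file_path_sans_ext(basename(media_files))
  lapply(stats::setNames(ann_names, ann_names), function(n) {
    media_files[media_names == n]
  })
}

# Made-up file names, for illustration only
matches <- match_by_basename(
  annotation_files = c("corpus/interview1.eaf", "corpus/interview2.TextGrid"),
  media_files      = c("media/interview1.wav", "media/interview1.mp4",
                       "media/other.flac")
)
str(matches)
```

In this sketch an annotation file can end up with several media files or with none, so it is worth checking the import results after running corpus_import().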

See @import.results of the corpus object to check the results of importing the files. To get a detailed overview of the corpus object use act::info(x), for a summary use act::info_summarized(x).
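A minimal sketch of such a check, assuming the act package is installed and that the example corpus is reachable as act::examplecorpus. The structure of @import.results and the printed output of the info functions are not shown here because they depend on the installed version.

```r
# Only runs if the act package is available
if (requireNamespace("act", quietly = TRUE)) {
  # Folder with the example files shipped with the package
  path <- system.file("extdata", "examplecorpus", package = "act")

  x <- act::examplecorpus
  x@paths.annotation.files <- path
  x@paths.media.files      <- path

  # Re-import; skip full text creation to keep this quick
  x <- act::corpus_import(x = x, createFullText = FALSE)

  # Results of the import, as stored in the corpus object
  print(x@import.results)

  # Summary of the corpus object; act::info(x) gives the detailed overview
  act::info_summarized(x)
}
```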

Value

Corpus object.

See Also

corpus_new, examplecorpus

Examples

library(act)

# The example files that come with the act library are located here:
path <- system.file("extdata", "examplecorpus", package="act")

# This is the examplecorpus object that comes with the library
examplecorpus

# Make sure that the input folder of the example corpus object is set correctly
examplecorpus@paths.annotation.files <- path
examplecorpus@paths.media.files <- path

# Load annotation files into the corpus object (again)
examplecorpus <- act::corpus_import(x=examplecorpus)

# Creating the full texts may take a long time.
# If you do NOT want to create the full texts immediately, use the following command:
examplecorpus <- act::corpus_import(x=examplecorpus, createFullText=FALSE)

act documentation built on June 7, 2023, 6:16 p.m.