importTexts: Import a collection of texts into R


View source: R/importTexts.R

Description

Imports a collection of documents into R and performs basic text processing.

Usage

importTexts(dl, normalize = TRUE)

Arguments

dl

The docList object that contains the index with the paths to the files for each text.

normalize

A logical value. If TRUE, the text is converted to lower case and stopwords are removed. In addition, every long s (ſ) is converted to s, vv is converted to w, 'd and 'ring are converted to 'ed' and 'ering' respectively, and all numeric and special characters are removed.
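
The normalization steps listed for this argument can be illustrated with a minimal base R sketch. This is not the package's own code: the function name normalize_sketch, the stopword list, the order of the steps, and the choice to return a vector of words are all assumptions made for illustration.

```r
# Hypothetical sketch of the normalization described above (not tei2r's code).
normalize_sketch <- function(text, stopwords = c("the", "was", "a")) {
  text <- gsub("\u017f", "s", text, fixed = TRUE)      # long s -> s
  text <- tolower(text)                                # lower case
  text <- gsub("vv", "w", text, fixed = TRUE)          # vv -> w
  text <- gsub("'ring\\b", "ering", text, perl = TRUE) # 'ring -> ering
  text <- gsub("'d\\b", "ed", text, perl = TRUE)       # 'd -> ed
  text <- gsub("[0-9]+", "", text)                     # drop numeric characters
  text <- gsub("[^a-z ]", " ", text)                   # drop special characters
  words <- unlist(strsplit(text, " +"))                # split into words
  words <- words[nzchar(words)]                        # discard empty strings
  words[!words %in% stopwords]                         # remove stopwords
}

# Made-up sample line; the long s is written as the escape \u017f
sample_line <- "The VVorld was chang'd, 1642: a wand'ring \u017foul"
cleaned <- normalize_sketch(sample_line)
print(cleaned)
```

Lower-casing before the vv replacement lets the sketch also catch "VV" at the start of a word. Only the straight apostrophe is handled here; typographic apostrophes would need an extra substitution.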

Value

dl The docList object that contains the texts of the corpus, the path to the indexFile, and the original directory from which the docList object was built.

What it does

This function collects, cleans up, and stores the text of the collection's documents in a single object. Essentially, it runs the cleanup function over a folder of documents. The texts are held as vectors in a single list, labeled by the corresponding id in the index file. The id is usually the filename or the Text Creation Partnership number.
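
The collect-and-label step described above can be sketched in base R roughly as follows. The helper import_sketch, the index columns id and path, the sample ids, and the stand-in cleaning function are hypothetical; the real function works on a docList object and calls the package's cleanup function instead.

```r
# Hypothetical sketch: read every file listed in an index into a named list.
import_sketch <- function(index, clean = function(x) x) {
  texts <- lapply(index$path, function(p) {
    raw <- paste(readLines(p, warn = FALSE), collapse = " ")  # whole file as one string
    clean(raw)                                                # stand-in for cleanup
  })
  names(texts) <- index$id   # label each text by its id from the index
  texts
}

# Build a tiny throwaway corpus in a temporary folder (made-up ids and content)
corpus_dir <- tempfile("corpus")
dir.create(corpus_dir)
ids <- c("A00001", "A00002")
paths <- file.path(corpus_dir, paste0(ids, ".txt"))
writeLines("Sing heavenly muse", paths[1])
writeLines("Whan that Aprille", paths[2])

index <- data.frame(id = ids, path = paths, stringsAsFactors = FALSE)

# A deliberately simple cleaner: lower-case, then split on non-letters
simple_clean <- function(x) strsplit(tolower(x), "[^a-z]+")[[1]]

texts <- import_sketch(index, clean = simple_clean)
print(names(texts))
print(texts[["A00002"]])
```

Keeping the texts in a list named by id means a single document can be retrieved by the same identifier used in the index file.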

See Also

cleanup

Examples


michaelgavin/tei2r documentation built on May 22, 2019, 9:50 p.m.