cleanup: Convert a plain-text or TEI document into a character vector


View source: R/cleanup.R

Description

Convert a plain-text or TEI document into a character vector

Usage

cleanup(filepath, stopwords = c(), normalize = TRUE)

Arguments

filepath

A path to the file that will be converted.

stopwords

A vector of words to remove from the text when normalize is TRUE. Defaults to an empty vector, c().

normalize

A logical value. If TRUE, the text is converted to lower case and stopwords are removed. In addition, all instances of '∫' and 'ſ' are converted to 's', 'vv' is converted to 'w', ''d' and ''ring' are converted to 'ed' and 'ering' respectively, and all numeric and special characters are removed.
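The normalization steps listed above can be sketched in base R roughly as follows. This is an illustration only, not the tei2r source: the function name normalize_sketch is hypothetical, and the order of the steps, the treatment of whitespace, and the splitting into one word per element are assumptions.

```r
# Hypothetical helper, NOT part of tei2r: approximates the documented
# behaviour of normalize = TRUE on a single character string.
normalize_sketch <- function(text, stopwords = c()) {
  # Replace the integral sign (U+222B) and the long s (U+017F) with 's'
  text <- gsub("\u222b", "s", text, fixed = TRUE)
  text <- gsub("\u017f", "s", text, fixed = TRUE)
  # Lower-case everything so later patterns only need one case
  text <- tolower(text)
  # Early-modern spellings: 'vv' -> 'w', "'ring" -> "ering", "'d" -> "ed"
  text <- gsub("vv", "w", text, fixed = TRUE)
  text <- gsub("'ring", "ering", text, fixed = TRUE)
  text <- gsub("'d", "ed", text, fixed = TRUE)
  # Drop digits, then anything that is not a lower-case letter or whitespace
  text <- gsub("[0-9]", "", text)
  text <- gsub("[^a-z[:space:]]", "", text)
  # Split on whitespace (assumed: one word per element) and drop stopwords
  words <- unlist(strsplit(text, "[[:space:]]+"))
  words <- words[nzchar(words)]
  words[!(words %in% stopwords)]
}

sample_text <- "The Vvorld was lou'd and suffe'ring 42 times; \u017fo it \u222beems"
result <- normalize_sketch(sample_text, stopwords = c("the", "and"))
print(result)
```

The sketch works on a string already held in memory; reading the file at filepath and handling TEI markup are left out because the page does not describe how cleanup does either.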

Examples

# Normalize the text; the default empty stopword vector removes no words
locke.path <- "~/Desktop/locke2ndTreatise.txt"
cleanup(locke.path, stopwords = c(), normalize = TRUE)

michaelgavin/tei2r documentation built on May 22, 2019, 9:50 p.m.