cleanDocument: cleanDocument

Description

cleanDocument cleans an HC Corpus document.

Usage

cleanDocument(rawDocument)

Arguments

rawDocument

- the metadata and content for the file to be analyzed

Details

This function reads a corpus document and performs a series of conditioning, reshaping, normalization, and clean up tasks. The conditioning tasks include:

The reshaping task uses the quanteda package (https://cran.r-project.org/web/packages/quanteda/quanteda.pdf) to reshape the corpus documents into sentences.
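As a rough illustration of sentence reshaping with quanteda, the sketch below splits a small text into sentences and flattens the result into a plain character vector. It is not taken from the cleanDocument source: the sample text and the variable names are hypothetical, and the use of tokens() with what = "sentence" is an assumption about how such a step could be written.

```r
# Minimal sketch, assuming quanteda is installed.
# The sample text and all variable names are hypothetical.
library(quanteda)

rawText <- c(doc1 = "This is one sentence. Here is another! And a third?")

# Segment each document into sentences rather than words.
sentenceTokens <- tokens(rawText, what = "sentence")

# Flatten the per-document list into a single unnamed character vector.
sentences <- unlist(as.list(sentenceTokens), use.names = FALSE)

print(sentences)
```

Flattening with unlist() yields one sentence per element, which matches the unlisted vector format described under Value.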

The normalization tasks include the following:

Finally, the clean up tasks include:

Value

cleanDocument - the cleaned text document, in unlisted vector format.

Author(s)

John James, j2sdatalab@gmail.com

See Also

Other text processing functions: analyzeCorpus, cleanCorpus, extractLines, getCorpus, getStats, summarizeAnalysis


j2scode/predictifyR documentation built on May 14, 2019, 10:34 a.m.