cleanTextData: Clean and tokenize string data

View source: R/ANLP.R

Description

Applies a series of cleaning steps to corpus text data.

Usage

cleanTextData(data)

Arguments

data

Text data as read by readTextFile.

Details

This function removes non-English characters, numbers, extra white space, brackets, and punctuation. It also replaces abbreviations and contractions, and converts the entire text to lower case.
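
The steps above can be approximated in base R. The sketch below is illustrative only and is not the ANLP implementation: the function name clean_text_sketch, the tiny contraction lookup table, and the order of the steps are all assumptions made for this example. The package itself relies on the tm and qdap functions listed under See Also.

```r
# Illustrative base-R approximation of the cleaning steps described above.
# NOT the ANLP source: name, lookup table, and step order are assumptions.
clean_text_sketch <- function(x) {
  # Hypothetical, very small contraction table (for illustration only).
  contractions <- c("can't" = "cannot", "won't" = "will not", "it's" = "it is")

  # Replace bytes that cannot be represented in ASCII with a space.
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = " ")

  # Lower-case first so the lookup table only needs lower-case keys.
  x <- tolower(x)

  # Expand contractions using literal (fixed) matching.
  for (k in names(contractions)) {
    x <- gsub(k, contractions[[k]], x, fixed = TRUE)
  }

  # Drop text enclosed in round or square brackets (assumed behaviour,
  # chosen to resemble bracket removal; adjust if only the brackets should go).
  x <- gsub("\\([^)]*\\)|\\[[^]]*\\]", " ", x)

  # Remove digits and punctuation.
  x <- gsub("[0-9]+", " ", x)
  x <- gsub("[[:punct:]]", " ", x)

  # Collapse runs of white space and trim both ends.
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

clean_text_sketch("It's 2017 (really), and we CAN'T stop!")
# "it is and we cannot stop"
```

Contractions are expanded before punctuation is stripped, because removing the apostrophe first would leave fragments such as "can t" that the lookup table could no longer match.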

Value

A list containing the cleaned text data.

See Also

tm_map, iconv, content_transformer, removeNumbers, replace_contraction, replace_abbreviation, bracketX, removePunctuation, tolower, stripWhitespace


ANLP documentation built on May 30, 2017, 4:42 a.m.