cleanTextData: Clean and tokenize string data

Description

Applies a sequence of cleaning steps to text corpus data.

Usage

cleanTextData(data)

Arguments

data

Text data as read by readTextFile.

Details

This function removes non-English characters, numbers, extra whitespace, brackets, and punctuation. It also replaces abbreviations and contractions, and converts the entire text to lower case.
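
The steps described above can be approximated in base R. The sketch below is only an illustration, not the package's implementation: the helper name clean_text_sketch, the small contraction table, and the choice to drop bracketed text together with its brackets are all assumptions made for this example. The function itself presumably relies on the tm and qdap helpers listed under See Also.

```r
# Hypothetical helper, NOT part of AchalNLP: a base-R approximation of the
# cleaning steps listed in Details. Works on a character vector.
clean_text_sketch <- function(x) {
  # Drop characters that cannot be represented in ASCII
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

  # Expand a few contractions (illustrative table only, an assumption)
  contractions <- c("can't" = "cannot", "won't" = "will not", "it's" = "it is")
  for (pat in names(contractions)) {
    x <- gsub(pat, contractions[[pat]], x, ignore.case = TRUE)
  }

  # Remove bracketed text such as "(aside)" or "[note]" (assumed behaviour)
  x <- gsub("\\([^)]*\\)", "", x)
  x <- gsub("\\[[^]]*\\]", "", x)

  # Remove digits and punctuation, then lower-case the text
  x <- gsub("[[:digit:]]+", "", x)
  x <- gsub("[[:punct:]]+", "", x)
  x <- tolower(x)

  # Collapse runs of whitespace and trim both ends
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

clean_text_sketch("It's 20 degrees (approx.),   can't  wait!")
# "it is degrees cannot wait"
```

Contractions are expanded before punctuation is removed, since stripping the apostrophe first would leave fragments such as "cant" that no longer match the table.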

Value

A list containing the sampled text data.

See Also

tm_map, iconv, content_transformer, removeNumbers, replace_contraction, replace_abbreviation, bracketX, removePunctuation, tolower, stripWhitespace


achalshah20/AchalNLP documentation built on May 10, 2019, 5:10 a.m.