FeatureExtraction: Feature extraction.
In M3SOulu/NLoN: Natural Language or Not

Computes a set of simple text-based features.

FeatureExtraction(text)

text: The text from which the features are computed.

The following features are computed:

ratio.caps: The ratio of uppercase letters.
ratio.specials: The ratio of special characters.
ratio.numbers: The ratio of number characters.
length.words: The average word length.
stopwords: The ratio of English stopwords (using the first tokenizer).
stopwords2: The ratio of English stopwords (using the second tokenizer).
last.char.nl: Boolean indicating whether the text ends with a natural-language (NL) character.
last.char.code: Boolean indicating whether the text ends with a code character.
first.3.chars.letters: Number of letters among the first three characters.
emoticons: Number of emoticons.
first.char.at: Boolean indicating whether the line begins with the @ character.
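
To make the definitions above concrete, the following base R sketch approximates a few of the character-level features. It is not the package's implementation: the helper name sketch_features is hypothetical, and the choices it makes (dividing by the total number of characters, treating anything that is not an ASCII letter, digit, or whitespace as a special character, and splitting words on whitespace) are assumptions made only for illustration.

```r
# Illustrative approximation only; NOT the NLoN implementation.
# sketch_features is a hypothetical helper name.
sketch_features <- function(text) {
  chars <- strsplit(text, "")  # one vector of single characters per input string
  n <- nchar(text)             # assumed denominator: total number of characters

  # Count the characters in each string that match a regular expression.
  count <- function(pattern, x = chars) {
    vapply(x, function(ch) sum(grepl(pattern, ch, perl = TRUE)), numeric(1))
  }

  # Assumed tokenization: split on runs of whitespace, drop empty tokens.
  words <- lapply(strsplit(text, "[[:space:]]+"), function(w) w[nzchar(w)])

  # Keep at most the first three characters of each string.
  first3 <- lapply(chars, function(ch) ch[seq_len(min(3, length(ch)))])

  data.frame(
    ratio.caps            = count("[A-Z]") / n,
    ratio.specials        = count("[^A-Za-z0-9[:space:]]") / n,
    ratio.numbers         = count("[0-9]") / n,
    length.words          = vapply(words, function(w) mean(nchar(w)), numeric(1)),
    first.3.chars.letters = count("[A-Za-z]", first3),
    first.char.at         = startsWith(text, "@"),
    stringsAsFactors      = FALSE
  )
}

# One row per input string, one column per sketched feature.
sketch_features(c("Hello World 42!", "@user x = 1;"))
```

The stopword, emoticon, and last-character features are left out because they depend on word lists and character sets that this page does not specify. With the package installed, the actual features are obtained by calling FeatureExtraction(text).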