View source: R/txt.to.features.R
txt.to.features    R Documentation
Converts a vector of words into a vector of either words or characters, optionally combining them into n-grams.
txt.to.features(tokenized.text, features = "w", ngram.size = 1)
tokenized.text | a vector of tokenized words. |
features | the desired type of feature: "w" for words (the default) or "c" for characters. |
ngram.size | an optional integer giving n, the size of the n-grams to be created. If this argument is omitted, the default value of 1 is used. |
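To illustrate what the `features` argument selects, here is a minimal base-R sketch. It is not stylo's implementation: the helper `to_units` is hypothetical, and it assumes that for `features = "c"` the words are rejoined with single spaces before being split into individual characters.

```r
# Hypothetical helper (not part of stylo): turns a vector of words into
# the unit type selected by `features`.
to_units <- function(tokenized.text, features = "w") {
  if (features == "w") {
    # words are kept as they are
    return(tokenized.text)
  }
  if (features == "c") {
    # assumption: words are rejoined with single spaces, then split into
    # individual characters (the spaces are kept as characters too)
    joined <- paste(tokenized.text, collapse = " ")
    return(strsplit(joined, "")[[1]])
  }
  stop("features must be \"w\" or \"c\"")
}

words <- c("quousque", "tandem")
to_units(words, "w")   # the two words, unchanged
to_units(words, "c")   # 15 single characters, including one space
```

Whether and how stylo marks word boundaries in character features is not stated on this page, so the space handling above should be read as an assumption.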
Carries out the preprocessing steps needed for feature selection: converts a tokenized text into the required type of sequence (n-grams, etc.) and returns a new vector of items. To combine single units into pairs, triplets, or longer n-grams, the function invokes make.ngrams; see help(make.ngrams) for details.
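The n-gram step can be pictured as a sliding window over the units. The following base-R sketch is only an approximation of that idea: `make_ngrams_sketch` is a hypothetical name, not stylo's `make.ngrams`, and joining the units of each n-gram with a single space is an assumption made here for readability.

```r
# Hypothetical helper (not stylo's make.ngrams): builds n-grams by sliding
# a window of `ngram.size` units and pasting each window with a space.
make_ngrams_sketch <- function(units, ngram.size = 1) {
  n <- length(units)
  if (ngram.size <= 1 || n == 0) {
    return(units)           # n = 1: the units themselves
  }
  if (ngram.size > n) {
    return(character(0))    # text too short for a single n-gram
  }
  starts <- seq_len(n - ngram.size + 1)
  vapply(starts, function(i) {
    paste(units[i:(i + ngram.size - 1)], collapse = " ")
  }, character(1))
}

words <- c("quousque", "tandem", "abutere", "catilina")
make_ngrams_sketch(words, 2)
# returns c("quousque tandem", "tandem abutere", "abutere catilina")
```

A text of k units yields k - n + 1 n-grams, so word 2-grams of four words give three items.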
Author(s): Maciej Eder, Mike Kestemont
See Also: txt.to.words, txt.to.words.ext, make.ngrams
library(stylo)
# consider the string my.text:
my.text = "Quousque tandem abutere, Catilina, patientia nostra?"
# split it into a vector of consecutive words:
my.vector.of.words = txt.to.words(my.text)
# build a vector of word 2-grams:
txt.to.features(my.vector.of.words, ngram.size = 2)
# or produce character n-grams (in this case, character tetragrams):
txt.to.features(my.vector.of.words, features = "c", ngram.size = 4)