Turns text into data: converts a character vector of texts into a matrix of feature counts.
texts |
a character vector of texts. |
sparse |
maximum feature sparsity for inclusion (1 = include all features) |
wstem |
character. Which words should be stemmed? |
ngrams |
numeric vector of ngram sizes (max = 1:3) |
language |
character. What language are you parsing? |
vocabmatch |
matrix. A previous feature matrix; the new matrix is created with features identical to it. |
stop.words |
logical. Should stop words be included? Default is TRUE. |
punct |
logical. Should exclamation points and question marks be included as features? |
POS |
logical. Should features have part-of-speech tags appended? Default is FALSE. |
dependency |
logical. Should features have dependency relations appended? Default is FALSE. |
tag.sub |
numeric. What fraction of features should be replaced by POS tags? Default is 0 (no features); fractional values are not supported yet. |
overlap |
numeric. How dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set? |
group.conc |
character. Group IDs used for removing group-specific words. |
group.conc.cutoff |
numeric. Threshold for group-specificity of words, as a proportion of occurrences in the main group. |
TPformat |
logical. Return in stm::textProcessor() format? |
verbose |
logical. Report interim steps during processing? |
Feature counts, as a matrix (or in stm format)
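
To make the idea of a feature-count matrix and the sparse threshold concrete, here is a minimal base-R sketch. It does not call the documented function and may differ from what the package does internally: the toy texts, the whitespace tokenizer, and the inclusive comparison against the threshold are all assumptions made for illustration.

```r
# Hypothetical toy input; not taken from the package documentation.
texts <- c("the cat sat", "the dog sat", "a bird flew")

# Lowercase each text and split on whitespace to get unigram tokens.
tokens <- strsplit(tolower(texts), "\\s+")

# Vocabulary: every distinct token, sorted for a stable column order.
vocab <- sort(unique(unlist(tokens)))

# Count matrix: one row per text, one column per feature.
counts <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(counts) <- vocab

# Sparsity of a feature: share of texts in which it never occurs.
sparsity <- colMeans(counts == 0)

# Keep features at or below the threshold (assumed inclusive here),
# so a threshold of 1 keeps every feature.
sparse <- 0.5
filtered <- counts[, sparsity <= sparse, drop = FALSE]

print(filtered)
```

With a threshold of 1 every column survives, which matches the "1 = include all features" note in the argument list; lower thresholds drop features that appear in too few texts.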