terms_dfm takes a text source in which each text is associated with a unique
document identifier and creates a document-feature matrix that can be used
as input for an stm topic modeller.
textData | a data frame containing the text to be processed, with each row representing a distinct document |
textColumn | the name of the column in textData that contains the text |
documentIdColumn | the name of the column in textData that contains the unique document identifiers |
removeStopwords | a Boolean indicating whether standard stopwords (see tidytext) should be removed from the result; default is FALSE |
removeNumbers | a Boolean indicating whether numbers should be removed from the result; default is FALSE |
wordStemming | a Boolean indicating whether words in the text should be reduced to their word stem; default is FALSE. If TRUE, the Porter stemmer is applied. |
customStopwords | a character vector specifying additional stopwords that should be removed from the result |
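The removeNumbers and customStopwords options can be pictured as a filter applied to tokens after tokenization. The base-R sketch below is hypothetical: filter_tokens is not part of the package, and treating a "number" as a token made only of digits is an assumption. With tidytext installed, its stop_words data frame (column word) could be appended to customStopwords to mimic standard stopword removal.

```r
# Hypothetical helper: a sketch of how the removeNumbers and customStopwords
# options might filter an already tokenized text. It is NOT the code used
# inside terms_dfm.
filter_tokens <- function(tokens, removeNumbers = FALSE,
                          customStopwords = character(0)) {
  if (removeNumbers) {
    # Assumption: a "number" is a token consisting only of digits
    tokens <- tokens[!grepl("^[0-9]+$", tokens)]
  }
  # Drop every token that appears in the user-supplied stopword vector
  tokens[!(tokens %in% customStopwords)]
}

tokens <- c("the", "model", "has", "20", "topics")
filter_tokens(tokens, removeNumbers = TRUE, customStopwords = c("the", "has"))
# returns c("model", "topics")
```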
Text input (textColumn) is split into tokens by a word tokenizer, and the
tokens are then processed and filtered according to the function's
options. Since the result is primarily intended as input for a topic
modeller, stopwords (see tidytext) are not removed by default.
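The tokenization step can be approximated in base R to show how each token keeps a link to its document identifier. This is a sketch under assumptions: tokenize_docs is a hypothetical name, lowercasing is assumed, and splitting on runs of non-alphanumeric characters is a crude stand-in for the word tokenizer terms_dfm actually uses, which this page does not name.

```r
# Hypothetical sketch: turn a data frame of documents into one row per token,
# keeping the document identifier. Not the package's implementation.
tokenize_docs <- function(textData, textColumn, documentIdColumn) {
  texts <- tolower(textData[[textColumn]])
  # Crude word tokenizer: split on runs of non-alphanumeric characters
  pieces <- strsplit(texts, "[^[:alnum:]]+")
  # Repeat each document id once per token produced from its text
  ids <- rep(textData[[documentIdColumn]], lengths(pieces))
  tokens <- unlist(pieces, use.names = FALSE)
  # Drop empty strings left by leading punctuation or whitespace
  keep <- nzchar(tokens)
  data.frame(doc_id = ids[keep], token = tokens[keep],
             stringsAsFactors = FALSE)
}

docs <- data.frame(id = c("d1", "d2"),
                   text = c("Topic models, topic words.", "Words!"),
                   stringsAsFactors = FALSE)
tokenize_docs(docs, "text", "id")
# token column: "topic", "models", "topic", "words", "words"
```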
A document-feature matrix of class quanteda::dfm (similar to a
document-term matrix), in which each document is identified by its value
in the documentIdColumn of the text source (textData), and each feature
(term) is a character sequence obtained after tokenization and all other
selected processing options have been applied to that document's text.
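The layout of the return value can be illustrated with a small dense matrix: documents as rows named by their identifiers, features as columns, and token counts in the cells. This base-R sketch (to_doc_feature_matrix is a hypothetical name) only mirrors that layout; the actual result is a sparse quanteda::dfm. If quanteda is installed, a matrix like this can be converted with quanteda::as.dfm().

```r
# Hypothetical sketch of the document-by-feature layout using base R.
# Rows are document ids, columns are features, cells are token counts.
to_doc_feature_matrix <- function(doc_ids, tokens) {
  counts <- table(document = doc_ids, feature = tokens)
  # Drop the "table" class to get a plain integer matrix with dimnames
  unclass(counts)
}

m <- to_doc_feature_matrix(c("d1", "d1", "d1", "d2"),
                           c("topic", "model", "topic", "word"))
m["d1", "topic"]  # 2: "topic" occurs twice in document d1
m["d2", "topic"]  # 0: absent features are stored as zero counts
```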