asSTMCorpus | R Documentation |
Convert a set of document term counts and associated metadata to the form required for processing by the stm function.
asSTMCorpus(documents, vocab, data = NULL, ...)
documents |
A documents-by-term matrix of counts, or a set of
counts in the format returned by |
vocab |
Character vector specifying the words in the corpus in the
order of the vocab indices in documents. Each term in the vocabulary index
must appear at least once in the documents. See |
data |
An optional data frame containing the prevalence and/or content covariates. If unspecified, the variables are taken from the active environment. |
... |
Additional arguments passed to or from other methods. |
A list with components "documents", "vocab", and "data" in the form needed for further processing by the stm function.
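To make the shape of that list concrete, the following base-R sketch builds one by hand from a small dense count matrix. It does not call asSTMCorpus; it assumes the list-of-two-row-matrices layout that stm uses for documents (vocabulary indices in the first row, counts in the second). The toy data, the metadata column, and the helper row_to_doc are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: 3 documents over a 4-term vocabulary.
vocab <- c("econom", "immigr", "job", "tax")
dtm <- matrix(
  c(2L, 0L, 1L, 0L,
    0L, 3L, 0L, 1L,
    1L, 1L, 0L, 0L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), vocab)
)

# Every vocabulary term must occur at least once across the documents.
stopifnot(all(colSums(dtm) > 0))

# Hypothetical helper (not part of stm): turn one row of counts into a
# two-row integer matrix holding (vocab index, count) for nonzero terms.
row_to_doc <- function(counts) {
  idx <- which(counts > 0L)
  unname(rbind(idx, counts[idx]))
}

documents <- lapply(seq_len(nrow(dtm)), function(i) row_to_doc(dtm[i, ]))
names(documents) <- rownames(dtm)

# Hypothetical metadata, one row per document.
meta <- data.frame(treatment = c(0, 1, 0))

# Same three component names as the list described above.
corpus_list <- list(documents = documents, vocab = vocab, data = meta)
str(corpus_list, max.level = 1)
```

The indices in the first row of each document matrix are 1-based positions in vocab, which is why the order of vocab has to match the column order of the count matrix.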
prepDocuments, stm
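library(stm)       # provides asSTMCorpus() and the gadarian data
library(quanteda)
gadarian_corpus <- corpus(gadarian, text_field = "open.ended.response")
# Tokenize, drop English stopwords, and stem before building the dfm;
# recent quanteda releases no longer take these steps as dfm() arguments.
gadarian_toks <- tokens(gadarian_corpus)
gadarian_toks <- tokens_remove(gadarian_toks, stopwords("english"))
gadarian_toks <- tokens_wordstem(gadarian_toks)
gadarian_dfm <- dfm(gadarian_toks)
asSTMCorpus(gadarian_dfm)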