View source: R/sparse_to_stm.R
sparse_to_stm | R Documentation |
stm has a readCorpus function that does the same, however, it may choke on large matrices. Hence, this function is simply a more memory efficent version
for sparseMatrix input using text2vec::as.lda_c
for conversion with slight adaptions to make output fit to stm requirements in terms of document indices.
sparse_to_stm(x, keep_rownames = TRUE)
x |
A |
keep_rownames |
By default TRUE, documents are named according to the rownames of |
A list y of 2 items, y$documents are documents represented similar to lda_c format, but vocabulary indices start with 1 instead of 0)
and y$vocab containing the vocabulary (i.e. orignal colnames of x
).
library(text2vec) library(stm) data("movie_review") it = itoken(substr(movie_review$review[1:3], 1, 50), preprocess_function = tolower, tokenizer = word_tokenizer) v = create_vocabulary(it) vectorizer = vocab_vectorizer(v) it = itoken(movie_review$review[1:3], preprocess_function = tolower, tokenizer = word_tokenizer) dtm = create_dtm(it, vectorizer) all.equal(textility::sparse_to_stm(dtm), stm::readCorpus(dtm)) #[1] TRUE
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.