| atom_dtm | R Documentation |
atom_dtm takes a corpus, tokenized or not, and creates the
corresponding DocumentTermMatrix (DTM), stored as a sparse
simple_triplet_matrix (see Details).
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
ncores = parallel::detectCores() - 1)
## S3 method for class 'list'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
ncores = parallel::detectCores() - 1)
## S3 method for class 'VCorpus'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
ncores = parallel::detectCores() - 1)
## S3 method for class 'character'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
ncores = parallel::detectCores() - 1, docs_or_tokens = c("docs",
"tokens"))
## Default S3 method:
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
ncores = parallel::detectCores() - 1)
corpus |
(list) of documents, or a list of character vectors, each element reporting tokens from a document |
step |
(num) integer value (default is 500L) used to break the
procedure into parts of at maximum |
parallel |
(lgl) if |
... |
further options passed to the function |
ncores |
(int) number of cores to use in the parallel computation (default is the number of machine cores minus one) |
docs_or_tokens |
(chr) if |
The simple triplet matrix representation uses three vectors, i, j
and v: the indices i and j give, respectively, the row (document)
and the column (term/token) coordinates of an entry, and v
represents its weight (commonly the frequency).
Moreover, for compatibility with some R implementations of machine
learning algorithms that use a different convention for representing
sparse matrices, the indices are ordered with priority i, j.
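The triplet layout described above can be illustrated with a minimal
base R sketch. It is not the package's implementation: the names docs,
terms and triplets are hypothetical, and the sketch only shows how
non-zero term counts map to (i, j, v) entries ordered with priority i, j.

```r
# Hypothetical tokenized corpus: one character vector of tokens per document.
docs <- list(
  c("one"),
  c("two"),
  c("one", "two", "two")
)

# The sorted unique tokens define the column index j (an assumption of
# this sketch, not necessarily the ordering used by atom_dtm).
terms <- sort(unique(unlist(docs)))

# For each document, count its tokens and keep only the non-zero entries.
triplets <- do.call(rbind, lapply(seq_along(docs), function(i) {
  counts  <- table(factor(docs[[i]], levels = terms))
  nonzero <- counts[counts > 0]
  data.frame(
    i = rep(i, length(nonzero)),        # row: document
    j = match(names(nonzero), terms),   # column: term
    v = as.integer(nonzero)             # value: term frequency
  )
}))

# Order the entries with priority i, then j.
triplets <- triplets[order(triplets$i, triplets$j), ]
rownames(triplets) <- NULL
print(triplets)
```

Only the non-zero cells of the document-term matrix are stored, which
is what keeps the representation sparse.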
An object of classes DocumentTermMatrix and
simple_triplet_matrix, weighted with simple term frequencies,
representing a document-term matrix in which each row represents a
document, each column a term (or token), and each entry the
frequency of that term in the document.
data(liu_4h28)
corpus <- data2corpus(liu_4h28)
atom_dtm(corpus)
atom_dtm(c('one', 'two', 'one two')) # three documents, two tokens
atom_dtm(c('one', 'two', 'one two'), docs_or_tokens = 'tokens') # one document
## Not run:
atom_dtm(corpus, parallel = TRUE) # parallel computation
atom_dtm(c(1, 2, 3)) # error
## End(Not run)