| atom_dtm | R Documentation | 
atom_dtm takes a corpus, tokenized or not, and creates the
corresponding DocumentTermMatrix (DTM), stored as a sparse
simple_triplet_matrix (see Details).
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)
## S3 method for class 'list'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)
## S3 method for class 'VCorpus'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)
## S3 method for class 'character'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1, docs_or_tokens = c("docs",
  "tokens"))
## Default S3 method:
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)
corpus | 
 (list) a list of documents, or a list of character vectors, each element holding the tokens of one document  | 
step | 
 (num) integer value (default is 500L) used to break the
procedure into parts of at maximum   | 
parallel | 
 (lgl) if   | 
... | 
 further options passed to the function  | 
ncores | 
 (int) number of cores to use in the parallel computation (default is the number of machine cores minus one)  | 
docs_or_tokens | 
 (chr) if   | 
The simple triplet matrix representation uses three vectors i,
j, v, in which the indices i and j give,
respectively, the row (document) and the column (term/token) coordinates of an
entry, and v represents its weight (commonly the frequency).
Moreover, for compatibility with some R implementations of machine
learning algorithms that use a different convention for the
representation of sparse matrices, the indices are ordered with priority
i, then j.
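The triplet layout described above can be illustrated with a minimal base R sketch. It does not reproduce the internals of atom_dtm; the toy documents and the names docs, terms, triplets and dense are assumptions made only for this illustration.

```r
# Three toy documents, already tokenized (hypothetical data).
docs <- list(c("one"), c("two"), c("one", "two", "one"))

# Vocabulary: the sorted unique tokens become the columns.
terms <- sort(unique(unlist(docs)))

# Count each (document, term) pair, one document at a time.
triplets <- do.call(rbind, lapply(seq_along(docs), function(d) {
  counts <- table(docs[[d]])
  data.frame(
    i = d,                              # row index: document
    j = match(names(counts), terms),    # column index: term
    v = as.integer(counts)              # weight: term frequency
  )
}))

# Order the entries with priority i, then j.
triplets <- triplets[order(triplets$i, triplets$j), ]
rownames(triplets) <- NULL

# Rebuild the dense matrix from the triplets to check the result.
dense <- matrix(0L, nrow = length(docs), ncol = length(terms),
                dimnames = list(NULL, terms))
dense[cbind(triplets$i, triplets$j)] <- triplets$v
```

Only the non-zero entries are stored in triplets, which is what makes the representation sparse; dense is rebuilt here purely as a sanity check.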
An object of both classes DocumentTermMatrix and
simple_triplet_matrix, weighted with simple
term frequencies, representing a document-term matrix in which each
row represents a document, each column a term (or token), and each
entry the simple frequency of the term in the document.
data(liu_4h28)
corpus <- data2corpus(liu_4h28)
atom_dtm(corpus)
atom_dtm(c('one', 'two', 'one two'))             # three documents, two tokens
atom_dtm(c('one', 'two', 'one two'), docs_or_tokens = 'tokens')    # one document
## Not run: 
  atom_dtm(corpus, parallel = TRUE)                    # parallel computation
  atom_dtm(c(1, 2, 3))                                 # error
## End(Not run)