atom_dtm: Create a dtm from a corpus (tf weights)
In UBESP-DCTV/costumer: COmprehensive Searches ThroUgh Machine learning for systEmatic Reviews

atom_dtm

R Documentation

Create a dtm from a corpus (tf weights)

Description

atom_dtm takes a corpus, tokenized or not, and creates the corresponding DocumentTermMatrix (DTM), stored as a sparse simple_triplet_matrix (see Details).

Usage

atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'list'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'VCorpus'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)

## S3 method for class 'character'
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1, docs_or_tokens = c("docs",
  "tokens"))

## Default S3 method:
atom_dtm(corpus, step = 500L, parallel = FALSE, ...,
  ncores = parallel::detectCores() - 1)

Arguments

`corpus`	(list) a list of documents, or a list of character vectors in which each element holds the tokens of one document
`step`	(num) integer value (default is 500L) used to split the procedure into parts of at most `step` documents each. This helps to avoid exhausting the RAM.
`parallel`	(lgl) if `TRUE` (default is `FALSE`), run the computation in parallel using a `makePSOCKcluster` backend, by default on all but one of the machine's cores (see `ncores`).
`...`	further options passed to the function
`ncores`	(int) number of cores to use in the parallel computation (default is the number of machine cores minus one)
`docs_or_tokens`	(chr) if `docs` (default), each element of the character vector represents one document; if `tokens`, the elements together represent the sequence of tokens of one single document
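
The batching described for `step` can be pictured with a minimal base-R sketch. This is an illustration only, not the package's implementation: `chunk_indices` is a hypothetical helper, and the document count of 1200 is made up for the example.

```r
# Hypothetical helper (not part of the package): split the indices of
# `n_docs` documents into consecutive batches of at most `step` documents.
chunk_indices <- function(n_docs, step = 500L) {
  idx <- seq_len(n_docs)
  # ceiling(idx / step) assigns batch number 1 to the first `step`
  # documents, 2 to the next `step`, and so on.
  unname(split(idx, ceiling(idx / step)))
}

# 1200 documents with step = 500 give two full batches and a shorter one.
batches <- chunk_indices(1200L, step = 500L)
lengths(batches)
```

Each batch can then be processed on its own and the partial results combined, so that only `step` documents need to be handled at a time.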

Details

The simple triplet matrix representation stores three vectors i, j, v, in which i and j give respectively the row (document) and the column (term/token) coordinates of an entry, and v represents its weight (commonly the frequency).

Moreover, for compatibility reasons (some R implementations of machine learning algorithms use a different convention for representing sparse matrices), the indices are ordered with priority i, j.
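
As an illustration of the representation described above, the following base-R sketch builds the (i, j, v) triplets for three toy documents and orders them with priority i, j. It does not call atom_dtm or reproduce its internals; the toy data and the names `docs`, `terms` and `triplets` are assumptions made for the example.

```r
# Three toy documents, already tokenized (illustrative data only).
docs <- list(c("one"), c("two"), c("one", "two", "two"))

# Vocabulary: one column per distinct term, in sorted order.
terms <- sort(unique(unlist(docs)))

# For each document i, count every term and keep only non-zero counts,
# giving one (i, j, v) row per term that occurs in the document.
triplets <- do.call(rbind, lapply(seq_along(docs), function(i) {
  counts <- tabulate(match(docs[[i]], terms), nbins = length(terms))
  j <- which(counts > 0)
  data.frame(i = rep(i, length(j)), j = j, v = counts[j])
}))

# Order the entries with priority i (document), then j (term).
triplets <- triplets[order(triplets$i, triplets$j), ]
rownames(triplets) <- NULL
triplets
```

Only the non-zero term frequencies are stored, which is what keeps the matrix sparse.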

Value

an object of both classes DocumentTermMatrix and simple_triplet_matrix, weighted with simple term frequencies. It represents a document-term matrix in which each row represents a document, each column a term (or token), and each entry the simple frequency of the term in the document.

Examples

data(liu_4h28)
corpus <- data2corpus(liu_4h28)
atom_dtm(corpus)
atom_dtm(c('one', 'two', 'one two'))             # three documents, two tokens
atom_dtm(c('one', 'two', 'one two'), docs_or_tokens = 'tokens')    # one document

## Not run: 
  atom_dtm(corpus, parallel = TRUE)                    # parallel computation
  atom_dtm(c(1, 2, 3))                                 # error

## End(Not run)

UBESP-DCTV/costumer documentation built on Feb. 1, 2023, 4:52 a.m.
