dtm_colsums: Column sums and row sums for document-term matrices

View source: R/nlp_flow.R

dtm_colsums    R Documentation

Description

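Computes the column sums (one total per term) or the row sums (one total per document) of a document-term matrix. If groups is provided, the columns (dtm_colsums) or rows (dtm_rowsums) listed in each group are instead summed into a single column or row per group.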

Usage

dtm_colsums(dtm, groups)

dtm_rowsums(dtm, groups)

Arguments

dtm

an object returned by document_term_matrix: a sparse matrix with documents in the rows and terms in the columns

groups

optionally, a list (named, as in the examples) whose elements are column/row names or column/row indexes of the dtm. The columns (dtm_colsums) or rows (dtm_rowsums) listed in each element are combined into one by summing them. See the examples.
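The grouping idea can be sketched on an ordinary dense matrix in base R. This is a minimal illustration, assuming a hypothetical helper sum_groups() that is not part of udpipe and is not its implementation; in particular, silently skipping names that are absent from the matrix (such as "A" below) is an assumption made for the sketch.

```r
# Hypothetical helper, NOT udpipe's implementation: sums the columns
# (margin = 2) or rows (margin = 1) of an ordinary matrix within each
# group and returns one column (or row) per group.
sum_groups <- function(m, groups, margin = 2) {
  labels <- if (margin == 2) colnames(m) else rownames(m)
  sums <- lapply(groups, function(g) {
    # Assumption: character names absent from the matrix are skipped;
    # numeric indexes are used as given
    keep <- if (is.character(g)) intersect(g, labels) else g
    if (margin == 2) {
      rowSums(m[, keep, drop = FALSE])   # collapse columns into one column
    } else {
      colSums(m[keep, , drop = FALSE])   # collapse rows into one row
    }
  })
  # The list names become the new column (or row) names
  if (margin == 2) do.call(cbind, sums) else do.call(rbind, sums)
}

# Small example matrix: 2 documents x 3 terms
m <- matrix(c(2, 0, 1, 0, 1, 3), nrow = 2,
            dimnames = list(c("doc1", "doc2"), c("aa", "b", "bb")))
grouped_cols <- sum_groups(m, list(combinedB = c("b", "bb"), combinedA = c("aa", "A")))
grouped_rows <- sum_groups(m, list(all = c("doc1", "doc2")), margin = 1)
grouped_cols  # combinedB: doc1 = 2, doc2 = 3; combinedA: doc1 = 2, doc2 = 0
grouped_rows  # all: aa = 2, b = 1, bb = 4
```

The sketch uses a dense matrix to keep the arithmetic visible; the real functions work on the sparse matrix produced by document_term_matrix.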

Value

Returns a vector if groups is not provided, or a sparse matrix of class dgCMatrix if groups is provided:

  • groups not provided: a named vector of column sums (dtm_colsums) or row sums (dtm_rowsums)

  • groups provided: a sparse matrix holding the sums over each group of columns/rows
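The shape of the ungrouped result can be reproduced with the Matrix package, which defines the dgCMatrix class. The sketch below builds a small hand-made dgCMatrix and calls Matrix::colSums and Matrix::rowSums on it to show the named-vector form; it only illustrates the return shape and is not a claim about how dtm_colsums or dtm_rowsums are implemented internally.

```r
library(Matrix)

# Small hand-made sparse document-term matrix (class dgCMatrix):
# 2 documents x 3 terms
dtm <- sparseMatrix(i = c(1, 1, 1, 2), j = c(1, 2, 3, 3), x = c(2, 1, 1, 3),
                    dims = c(2, 3),
                    dimnames = list(c("doc1", "doc2"), c("aa", "b", "bb")))

# Without grouping, the sums are plain named numeric vectors
term_totals <- Matrix::colSums(dtm)  # one total per term (column)
doc_lengths <- Matrix::rowSums(dtm)  # one total per document (row)
term_totals  # aa = 2, b = 1, bb = 4
doc_lengths  # doc1 = 4, doc2 = 3
```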

Examples

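x <- data.frame(
  doc_id = c(1, 1, 2, 3, 4),
  term   = c("A", "C", "Z", "X", "G"),
  freq   = c(1, 5, 7, 10, 0))
dtm <- document_term_matrix(x)
## named vector of term totals (one value per column)
colsums <- dtm_colsums(dtm)
colsums
## named vector of document totals (one value per row)
rowsums <- dtm_rowsums(dtm)
head(rowsums)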

## 
## Grouped column summation
## 
x <- list(doc1 = c("aa", "bb", "aa", "b"), doc2 = c("bb", "bb", "BB"))
dtm <- document_term_matrix(x)
dtm
dtm_colsums(dtm, groups = list(combinedB = c("b", "bb"), combinedA = c("aa", "A")))
dtm_colsums(dtm, groups = list(combinedA = c("aa", "A")))
dtm_colsums(dtm, groups = list(
  combinedB = grep(pattern = "b", colnames(dtm), ignore.case = TRUE, value = TRUE), 
  combinedA = c("aa", "A", "ZZZ"),
  test      = character()))
dtm_colsums(dtm, groups = list())

## 
## Grouped row summation
## 
x <- list(doc1 = c("aa", "bb", "aa", "b"), 
          doc2 = c("bb", "bb", "BB"),
          doc3 = c("bb", "bb", "BB"),
          doc4 = c("bb", "bb", "BB", "b"))
dtm <- document_term_matrix(x)
dtm
dtm_rowsums(dtm, groups = list(doc1 = "doc1", combi = c("doc2", "doc3", "doc4")))
dtm_rowsums(dtm, groups = list(unknown = "docUnknown", combi = c("doc2", "doc3", "doc4")))
dtm_rowsums(dtm, groups = list())

udpipe documentation built on Jan. 6, 2023, 5:06 p.m.