tif_is_dtm: Validate Document Term Matrix Object

View source: R/validators.R

tif_is_dtm R Documentation

Validate Document Term Matrix Object

Description

A valid document term matrix is a sparse matrix with rows representing documents and columns representing terms. The row names are a character vector giving the document ids, with no duplicated entries. The column names are a character vector giving the terms of the matrix, with no duplicated entries. The sparse matrix should inherit from the Matrix class dgCMatrix.
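As an illustration of the conditions above, the following sketch builds a small matrix with Matrix::sparseMatrix and checks each property with base R. The document ids, terms, and counts are made up for the example, and the checks are only a reading of the description, not the package's own code.

```r
library(Matrix)

# Toy corpus: three documents and four terms (made-up data for illustration).
doc_ids <- c("doc1", "doc2", "doc3")
terms   <- c("apple", "banana", "cherry", "date")

# Non-zero counts supplied as (row, column, value) triplets.
# With numeric values and the default representation this yields a dgCMatrix.
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 2, 3, 4),
  x = c(2, 1, 3, 1, 1),
  dims = c(length(doc_ids), length(terms)),
  dimnames = list(doc_ids, terms)
)

# The properties named in the description, checked one by one.
inherits(dtm, "dgCMatrix")
is.character(rownames(dtm)) && anyDuplicated(rownames(dtm)) == 0
is.character(colnames(dtm)) && anyDuplicated(colnames(dtm)) == 0
```

Passing the dimnames at construction time keeps the ids and terms attached to the matrix from the start, so they cannot drift out of sync with the counts.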

Usage

tif_is_dtm(dtm, warn = FALSE)

Arguments

dtm

a document term matrix object whose validity is to be tested

warn

logical. Should the function produce a verbose warning describing the condition on which the validation fails? Useful for testing.

Details

The tests are run sequentially and the function returns, with a warning if the warn flag is set, on the first test that fails. We use this implementation because some tests may fail entirely or be meaningless if the prior ones are not passed. For example, if the dtm object is not a matrix, it may not contain row or column names.
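The early-return pattern described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: check_dtm_sketch and its warning messages are hypothetical, and the exact set and order of tests in tif_is_dtm may differ.

```r
library(Matrix)

# Hypothetical helper (not part of tif): run the checks in order and
# return FALSE at the first one that fails, warning only if asked to.
check_dtm_sketch <- function(dtm, warn = FALSE) {
  fail <- function(msg) {
    if (warn) warning(msg, call. = FALSE)
    FALSE
  }

  # Class check first: later checks assume a matrix with dimnames.
  if (!inherits(dtm, "dgCMatrix")) {
    return(fail("object does not inherit from dgCMatrix"))
  }

  ids <- rownames(dtm)
  if (!is.character(ids) || anyDuplicated(ids) > 0) {
    return(fail("row names must be a character vector without duplicates"))
  }

  terms <- colnames(dtm)
  if (!is.character(terms) || anyDuplicated(terms) > 0) {
    return(fail("column names must be a character vector without duplicates"))
  }

  TRUE
}

# Made-up inputs: one valid matrix and one with a duplicated document id.
good <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(1, 2), dims = c(2, 3),
                     dimnames = list(c("d1", "d2"), c("a", "b", "c")))
dup  <- sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(1, 2), dims = c(2, 3),
                     dimnames = list(c("d1", "d1"), c("a", "b", "c")))

check_dtm_sketch(good)    # TRUE
check_dtm_sketch(dup)     # FALSE, stops at the row name check
check_dtm_sketch(list())  # FALSE, stops at the class check
```

Because the class check comes first, the row and column name checks never run on an object that may not support them, which is the motivation given above for testing sequentially.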

Value

a logical vector of length one indicating whether the input is a valid document term matrix

Examples

# Requires the Matrix package
dtm <- Matrix::Matrix(0, ncol = 26, nrow = 5, sparse = TRUE)
colnames(dtm) <- LETTERS
rownames(dtm) <- sprintf("doc%d", 1:5)

tif_is_dtm(dtm)

ropensci/tif documentation built on Nov. 30, 2023, 7:46 p.m.