dtm_remove_terms: Remove terms from a Document-Term-Matrix and keep only...

View source: R/nlp_flow.R

dtm_remove_terms    R Documentation

Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms

Description

Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms

Usage

dtm_remove_terms(dtm, terms, remove_emptydocs = TRUE)

Arguments

dtm

an object returned by document_term_matrix

terms

a character vector of terms which are in colnames(dtm) and which should be removed

remove_emptydocs

logical indicating whether to remove documents which contain no terms any more after the term removal is executed. Defaults to TRUE.

Value

a sparse Matrix as returned by sparseMatrix where the indicated terms are removed and, if remove_emptydocs is TRUE, documents with no terms whatsoever are removed as well
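
The effect of the two steps described above can be illustrated on a toy matrix using only the Matrix package. This is a minimal sketch under stated assumptions, not the udpipe implementation: the document names, the term counts and the variable names kept and nonempty are hypothetical and serve only as illustration.

```r
library(Matrix)

# Toy document-term matrix (hypothetical data): 3 documents, 3 terms
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3),
  j = c(1, 2, 2, 3),
  x = c(2, 1, 4, 1),
  dims = c(3, 3),
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("casa", "centrum", "hotel"))
)
terms <- c("casa", "centrum")

# Step 1: drop the columns of the terms to remove
# (comparable to remove_emptydocs = FALSE: all documents are kept)
kept <- dtm[, !colnames(dtm) %in% terms, drop = FALSE]

# Step 2: drop documents which have no terms left
# (comparable to remove_emptydocs = TRUE)
nonempty <- kept[rowSums(kept) > 0, , drop = FALSE]

dim(kept)
dim(nonempty)
rownames(nonempty)
```

In this toy data only doc3 contains a term other than the removed ones, so the second step keeps a single document while the first step alone keeps all three.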

Examples

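data(brussels_reviews_anno)
## Keep only the tokens tagged as NN (nouns) and build a document-term matrix of lemmas
x <- subset(brussels_reviews_anno, xpos == "NN")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)
dim(dtm)
## Remove the terms and also the documents which have no terms left (default)
x <- dtm_remove_terms(dtm, terms = c("appartement", "casa", "centrum", "ciudad"))
dim(x)
## Remove the terms but keep all documents, including the ones without any terms left
x <- dtm_remove_terms(dtm, terms = c("appartement", "casa", "centrum", "ciudad"),
                      remove_emptydocs = FALSE)
dim(x)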

udpipe documentation built on Jan. 6, 2023, 5:06 p.m.