term_intersect: Combine terms in a dtm

View source: R/feature_preparation.r

term_intersect    R Documentation

Combine terms in a dtm

Description

Given a dtm and a similarity (adjacency) matrix, create a new column for each nonzero cell in the similarity matrix. For term combinations (every cell except those on the diagonal), the column name is the two term names pasted together with the separator given in sep (by default " & ", read as AND).
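The idea can be illustrated with a minimal sketch that uses only the Matrix package. This is not the RNewsflow implementation: the toy data are invented, and the combined value used here (a 0/1 indicator that both terms occur in the document) is an assumption made for illustration, since this page does not state which values the new columns hold.

```r
library(Matrix)

# Toy document-term matrix: 3 documents, 3 terms (counts).
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 2, 1, 3),
  x = c(2, 1, 1, 1, 3),
  dims = c(3, 3),
  dimnames = list(paste0("doc", 1:3), c("cat", "cats", "dog"))
)

# Toy similarity matrix: the diagonal plus one off-diagonal pair (cat, cats).
simmat <- sparseMatrix(
  i = c(1, 2, 3, 1),
  j = c(1, 2, 3, 2),
  x = rep(1, 4),
  dims = c(3, 3),
  dimnames = list(colnames(dtm), colnames(dtm))
)

# One (row, col) index pair per nonzero cell of the similarity matrix.
# as.matrix() is fine for a toy example but would not scale to a real vocabulary.
cells <- which(as.matrix(simmat) != 0, arr.ind = TRUE)
rows <- unname(cells[, "row"])
cols <- unname(cells[, "col"])

# ASSUMPTION: the combined column is 1 where both terms occur, 0 otherwise.
combined <- sapply(seq_along(rows), function(k) {
  as.numeric(dtm[, rows[k]] > 0 & dtm[, cols[k]] > 0)
})

# Diagonal cells keep the term name; other cells get "term1 & term2".
terms <- colnames(dtm)
new_names <- ifelse(
  rows == cols,
  terms[rows],
  paste(terms[rows], terms[cols], sep = " & ")
)
dimnames(combined) <- list(rownames(dtm), new_names)

print(combined)
```

In this toy case only doc1 contains both "cat" and "cats", so it is the only document with a nonzero value in the "cat & cats" column.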

Usage

term_intersect(dtm, simmat, as_dfm = T, verbose = F, sep = " & ", par = NA)

Arguments

dtm

A quanteda dfm or a CsparseMatrix.

simmat

A similarity matrix in CsparseMatrix format, for instance one created with term_char_sim.

as_dfm

If TRUE, return the result as a quanteda dfm.

verbose

If TRUE, report progress.

sep

The separator used for pasting the terms together.

par

If TRUE, add parentheses to the column names before combining. This is mainly for internal use, as it makes explicit how OR (term_union) and AND (term_intersect) operations are combined. If NA, this is decided based on whether parentheses are present.
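A short sketch of why parentheses matter when OR and AND names are combined. The helper add_par is hypothetical and not part of RNewsflow, and the " | " separator for OR-combined names is assumed here for illustration only.

```r
# Hypothetical helper, not part of RNewsflow: wrap each name in parentheses.
add_par <- function(x) paste0("(", x, ")")

left  <- "cat | cats"  # assumed name of an OR-combined column
right <- "dog"

# Without parentheses the combined name is ambiguous.
without_par <- paste(left, right, sep = " & ")

# With parentheses the grouping of the OR part is explicit.
with_par <- paste(add_par(left), add_par(right), sep = " & ")

print(without_par)
print(with_par)
```

The first name can be read as either "cat OR (cats AND dog)" or "(cat OR cats) AND dog"; the second leaves only one reading.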

Value

A quanteda dfm if as_dfm is TRUE, otherwise a CsparseMatrix.


RNewsflow documentation built on May 31, 2023, 6:53 p.m.