View source: R/feature_preparation.r
term_union | R Documentation |
Given a dtm and a similarity (adjacency) matrix, merge each cluster of similar terms (simmat > 0) into a single column. The column names are concatenated with a "|" separator (read as OR).
term_union(dtm, simmat, as_dfm = T, verbose = F, sep = "|", par = NA)
dtm |
A quanteda dfm or a CsparseMatrix. |
simmat |
A similarity matrix in CsparseMatrix format, for instance one created with term_char_sim. |
as_dfm |
If TRUE, return the result as a quanteda dfm. |
verbose |
If TRUE, report progress. |
sep |
The separator used for pasting the terms |
par |
If TRUE, add parentheses to colnames before combining. This is mainly for internal use, as it makes explicit how OR (term_union) and AND (term_intersect) operations are combined. If NA, this is based on whether parentheses are present. |
A CsparseMatrix or quanteda dfm
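# Tokenize first: recent quanteda versions no longer accept a character vector in dfm()
dfm = quanteda::dfm(quanteda::tokens(c('That guy Gadaffi','Do you mean Kadaffi?',
                                       'Nah more like Gadaffel','Not Kadaffel?')))
# Character-based similarity between the column names (terms)
simmat = term_char_sim(colnames(dfm), same_start=0)
# Merge similar terms into single columns whose names are joined with "|"
term_union(dfm, simmat, verbose = FALSE)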
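To make the grouping described above concrete, here is a minimal base-R sketch of the idea on a toy dense matrix. It is not the package's implementation: the helper cluster_terms is hypothetical, the toy data are made up for illustration, and two assumptions are made that this page does not confirm, namely that a "cluster" is a connected component of the graph simmat > 0 and that the counts of merged columns are summed.

```r
# Toy document-term matrix: 3 documents, 4 terms (dense, for illustration only)
terms <- c("gadaffi", "kadaffi", "gadaffel", "guy")
dtm <- matrix(c(1, 0, 0, 0,
                0, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, terms))

# Toy similarity matrix: gadaffi ~ kadaffi and gadaffi ~ gadaffel
simmat <- diag(4)
dimnames(simmat) <- list(terms, terms)
simmat["gadaffi", "kadaffi"] <- simmat["kadaffi", "gadaffi"] <- 1
simmat["gadaffi", "gadaffel"] <- simmat["gadaffel", "gadaffi"] <- 1

# Hypothetical helper (not part of the package): label the connected
# components of the graph defined by simmat > 0
cluster_terms <- function(simmat) {
  adj <- (simmat > 0) | (t(simmat) > 0)
  diag(adj) <- TRUE
  label <- seq_len(nrow(adj))
  repeat {
    # Each term takes the smallest label found among its neighbours
    new_label <- apply(adj, 1, function(nb) min(label[nb]))
    if (all(new_label == label)) break
    label <- new_label
  }
  label
}

label <- cluster_terms(simmat)
groups <- split(terms, label)

# Assumption: merged columns are summed; names are pasted with "|"
merged <- sapply(groups, function(g) rowSums(dtm[, g, drop = FALSE]))
colnames(merged) <- vapply(groups, paste, character(1),
                           collapse = "|", USE.NAMES = FALSE)
merged
```

In this toy case "kadaffi" and "gadaffel" are not directly similar, yet they end up in the same column because both are linked to "gadaffi". Whether term_union chains similarities in this way is part of the assumption stated above.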