dtm_compare: Compare two document term matrices
In corpustools: Managing, Querying and Analyzing Tokenized Text

dtm_compare

R Documentation

Compare two document term matrices

Description

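Compare the terms in a main document-term matrix against a reference document-term matrix, using the ratio of relative term frequencies and a chi^2 statistic.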

Usage

dtm_compare(
  dtm.x,
  dtm.y = NULL,
  smooth = 0.1,
  min_ratio = NULL,
  min_chi2 = NULL,
  select_rows = NULL,
  yates_cor = c("auto", "yes", "no"),
  x_is_subset = F,
  what = c("freq", "docfreq", "cooccurrence")
)
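
As a rough illustration of how a call might look, the sketch below builds one DTM per group and compares them. It assumes the corpustools package is installed, and that its bundled `sotu_texts` data and its `create_tcorpus()` and `get_dtm()` functions are available; the `president` values used for splitting are assumed to occur in that data set. The `requireNamespace()` guard is only there so the snippet does nothing when the package is missing.

```r
comp <- NULL  # stays NULL when corpustools is not installed

if (requireNamespace("corpustools", quietly = TRUE)) {
  library(corpustools)

  # Split the example texts into a main and a reference group (assumed values)
  obama <- sotu_texts[sotu_texts$president == "Barack Obama", ]
  bush  <- sotu_texts[sotu_texts$president == "George W. Bush", ]

  # One tCorpus and one DTM per group
  dtm_obama <- get_dtm(create_tcorpus(obama, doc_column = "id"), "token")
  dtm_bush  <- get_dtm(create_tcorpus(bush,  doc_column = "id"), "token")

  # dtm.x is the main matrix, dtm.y the reference; other arguments keep defaults
  comp <- dtm_compare(dtm_obama, dtm_bush)
  print(head(comp))
}
```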

Arguments

`dtm.x`	the main document-term matrix
`dtm.y`	the 'reference' document-term matrix
`smooth`	Laplace smoothing is used for the calculation of the probabilities. Here you can set the added (pseudocount) value.
`min_ratio`	threshold for the ratio value, which is the ratio of the relative frequency of a term in dtm.x and dtm.y
`min_chi2`	threshold for the chi^2 value
`select_rows`	Alternative to using dtm.y. Has to be a vector with rownames, by which
`yates_cor`	mode for applying Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is applied.
`x_is_subset`	Specify whether dtm.x is a subset of dtm.y. In this case, the term frequencies of dtm.x will be subtracted from the term frequencies in dtm.y
`what`	choose whether to compare the frequency ("freq") of terms, or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N)
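
The quantities these arguments refer to can be sketched in base R. This is not the package's implementation: the toy matrices, the helper names (`term_counts`, `smoothed_ratio`, `chi2_2x2`), the exact smoothing formula, and the reading of Cochran's rule as "any expected cell count below 5" are all assumptions made for illustration.

```r
# Toy document-term matrices (rows = documents, columns = terms).
# All counts are invented for illustration.
terms <- c("economy", "health", "war", "tax")
dtm_x <- matrix(c(20, 10,  5, 0,  0, 0,  2, 0), nrow = 2,
                dimnames = list(c("x1", "x2"), terms))
dtm_y <- matrix(c(10, 10, 20, 5, 15, 0,  1, 0), nrow = 2,
                dimnames = list(c("y1", "y2"), terms))

# "freq" sums term occurrences; "docfreq" counts documents containing the term.
term_counts <- function(dtm, what = c("freq", "docfreq")) {
  what <- match.arg(what)
  if (what == "freq") colSums(dtm) else colSums(dtm > 0)
}

# Ratio of Laplace-smoothed relative frequencies (assumed form of smoothing).
smoothed_ratio <- function(freq_x, freq_y, smooth = 0.1) {
  p_x <- (freq_x + smooth) / sum(freq_x + smooth)
  p_y <- (freq_y + smooth) / sum(freq_y + smooth)
  p_x / p_y
}

# Chi^2 per term from the 2x2 table (term vs. rest) by (x vs. y).
# total_x / total_y default to the summed counts; for "docfreq" one would
# pass the number of documents instead.
chi2_2x2 <- function(freq_x, freq_y,
                     total_x = sum(freq_x), total_y = sum(freq_y),
                     yates_cor = c("auto", "yes", "no")) {
  yates_cor <- match.arg(yates_cor)
  a <- freq_x
  b <- freq_y
  other_x <- total_x - a
  other_y <- total_y - b
  n <- a + b + other_x + other_y
  row_t <- a + b
  row_o <- other_x + other_y
  col_x <- a + other_x
  col_y <- b + other_y

  # Expected cell counts under independence, one row per term
  expected <- cbind(row_t * col_x, row_t * col_y,
                    row_o * col_x, row_o * col_y) / n
  use_yates <- switch(yates_cor,
                      yes  = rep(TRUE, length(a)),
                      no   = rep(FALSE, length(a)),
                      auto = apply(expected, 1, min) < 5)  # assumed rule

  delta <- abs(a * other_y - b * other_x)
  delta <- ifelse(use_yates, pmax(delta - n / 2, 0), delta)
  chi2 <- n * delta^2 / (row_t * row_o * col_x * col_y)
  setNames(as.numeric(chi2), names(freq_x))
}

freq_x <- term_counts(dtm_x, "freq")
freq_y <- term_counts(dtm_y, "freq")
result <- data.frame(
  term  = terms,
  ratio = unname(smoothed_ratio(freq_x, freq_y)),
  chi2  = unname(chi2_2x2(freq_x, freq_y, yates_cor = "auto"))
)
print(result)
```

In this toy data only the rare term "tax" has an expected cell count below 5, so under the assumed "auto" rule it is the only term that receives the correction. Thresholds such as `min_ratio` and `min_chi2` would then amount to filtering rows of `result` on the `ratio` and `chi2` columns.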