View source: R/compare_corpora.r
dtm_compare | R Documentation |
Compare two document term matrices
dtm_compare(
dtm.x,
dtm.y = NULL,
smooth = 0.1,
min_ratio = NULL,
min_chi2 = NULL,
select_rows = NULL,
yates_cor = c("auto", "yes", "no"),
x_is_subset = F,
what = c("freq", "docfreq", "cooccurrence")
)
dtm.x |
the main document-term matrix |
dtm.y |
the 'reference' document-term matrix |
smooth |
Laplace smoothing is used when calculating the probabilities. This sets the value (pseudocount) that is added. |
min_ratio |
threshold for the ratio value, which is the ratio of the relative frequency of a term in dtm.x and dtm.y |
min_chi2 |
threshold for the chi^2 value |
select_rows |
Alternative to using dtm.y. Has to be a vector with rownames, by which |
yates_cor |
mode for using Yates' correction in the chi^2 calculation. Can be turned on ("yes") or off ("no"), or set to "auto", in which case Cochran's rule is used to determine whether Yates' correction is applied. |
x_is_subset |
Specify whether dtm.x is a subset of dtm.y. In this case, the term frequencies of dtm.x are subtracted from the term frequencies in dtm.y, so that the reference counts do not include dtm.x itself. |
what |
choose whether to compare the frequency ("freq") of terms, or the document frequency ("docfreq"). This also affects how chi^2 is calculated, comparing either freq relative to vocabulary size or docfreq relative to corpus size (N) |
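The effect of x_is_subset and of the "freq" versus "docfreq" choice can be illustrated with a small base R sketch. The toy matrix, the term names and the variable names are hypothetical, and the code only mimics the counting described in the arguments above; it is not the package's own implementation.

```r
# Hypothetical toy document-term matrix: 4 documents, 3 terms.
dtm <- matrix(c(2, 0, 1,
                0, 3, 1,
                1, 1, 0,
                4, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), c("tax", "vote", "the")))

# Treat the first two documents as the main corpus (a subset of the whole).
sub <- dtm[c("doc1", "doc2"), , drop = FALSE]

# "freq": total occurrences of each term.
freq_all <- colSums(dtm)
freq_sub <- colSums(sub)

# "docfreq": number of documents in which each term occurs at least once.
docfreq_all <- colSums(dtm > 0)
docfreq_sub <- colSums(sub > 0)

# When x is a subset of y, the reference counts are what remains
# after subtracting the counts of x from the counts of y.
freq_ref    <- freq_all - freq_sub
docfreq_ref <- docfreq_all - docfreq_sub
```

Here freq_ref and docfreq_ref describe only the documents outside the subset, which is the comparison that x_is_subset is described as producing.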
A data frame with rows corresponding to the terms in dtm and the statistics in the columns
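As a rough illustration of the kind of statistics involved, the sketch below computes a smoothed frequency ratio and a chi^2 value per term in base R. The counts, the helper chi2_2x2 and the column names are hypothetical. The "auto" branch assumes that Cochran's rule for a 2x2 table amounts to applying Yates' correction when any expected count is below 5; the package may implement the rule differently.

```r
# Hypothetical toy term counts for two corpora (not taken from the package).
freq_x <- c(tax = 30, vote = 10, the = 200)
freq_y <- c(tax = 10, vote = 12, the = 420)
smooth <- 0.1  # pseudocount, in the spirit of the `smooth` argument

# Smoothed relative frequencies: the pseudocount is added to every term count.
rel_x <- (freq_x + smooth) / (sum(freq_x) + length(freq_x) * smooth)
rel_y <- (freq_y + smooth) / (sum(freq_y) + length(freq_y) * smooth)
ratio <- rel_x / rel_y

# Chi^2 for one term from a 2x2 table:
#   a = term in x, b = term in y, c = other terms in x, d = other terms in y
chi2_2x2 <- function(a, b, c, d, yates_cor = c("auto", "yes", "no")) {
  yates_cor <- match.arg(yates_cor)
  n <- a + b + c + d
  # Expected cell counts under independence.
  expected <- c((a + b) * (a + c), (a + b) * (b + d),
                (c + d) * (a + c), (c + d) * (b + d)) / n
  # Assumed form of Cochran's rule: correct when any expected count is below 5.
  use_yates <- switch(yates_cor,
                      yes = TRUE,
                      no = FALSE,
                      auto = any(expected < 5))
  diff <- abs(a * d - b * c)
  if (use_yates) diff <- max(0, diff - n / 2)
  n * diff^2 / ((a + b) * (c + d) * (a + c) * (b + d))
}

# One chi^2 value per term, comparing it against all other terms.
chi2 <- mapply(function(a, b) chi2_2x2(a, b, sum(freq_x) - a, sum(freq_y) - b),
               freq_x, freq_y)

result <- data.frame(term = names(freq_x), ratio = ratio, chi2 = chi2)
```

A ratio above 1 marks a term that is relatively more frequent in the main corpus, and thresholds such as min_ratio and min_chi2 would then correspond to filtering rows of a table like result.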