dtm_chisq | R Documentation |
Perform a chisq.test to check whether specific terms are more prevalent in one group of documents than in the other.
The function looks at each term in the document term matrix and applies a chisq.test
comparing the frequency of occurrence of that term with the frequency of the other terms in each document group.
dtm_chisq(dtm, groups, correct = TRUE, ...)
dtm |
a document term matrix: an object returned by |
groups |
a logical vector with 2 groups (TRUE / FALSE) where the size of the |
correct |
passed on to |
... |
further arguments passed on to |
a data.frame with columns term, chisq, p.value, freq, freq_true, freq_false indicating for each term in the dtm
how frequently it occurs in each group, the Chi-Square value and its corresponding p-value.
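The per-term test described above can be illustrated with a small base R sketch. This is an assumption about the general approach, not the package's actual implementation: the helper name `term_chisq_sketch`, the dense toy matrix `counts` and its values are all hypothetical. For each term, the sketch builds a 2 x 2 table (this term versus all other terms, TRUE group versus FALSE group) and passes it to `chisq.test`.

```r
# Hypothetical helper for illustration only; not part of any package.
# counts: dense numeric matrix, rows = documents, columns = terms.
# groups: logical vector with one element per row of counts.
term_chisq_sketch <- function(counts, groups, correct = TRUE) {
  stopifnot(is.logical(groups), length(groups) == nrow(counts))
  # Term frequencies within each group of documents
  freq_true  <- colSums(counts[groups, , drop = FALSE])
  freq_false <- colSums(counts[!groups, , drop = FALSE])
  total_true  <- sum(freq_true)
  total_false <- sum(freq_false)
  results <- lapply(colnames(counts), function(term) {
    # 2 x 2 table: (this term, all other terms) by (TRUE group, FALSE group)
    tab <- matrix(c(freq_true[[term]],  total_true  - freq_true[[term]],
                    freq_false[[term]], total_false - freq_false[[term]]),
                  nrow = 2, byrow = TRUE)
    test <- suppressWarnings(chisq.test(tab, correct = correct))
    data.frame(term = term,
               chisq = unname(test$statistic),
               p.value = test$p.value,
               freq = freq_true[[term]] + freq_false[[term]],
               freq_true = freq_true[[term]],
               freq_false = freq_false[[term]],
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, results)
  # Largest Chi-Square values first
  out[order(out$chisq, decreasing = TRUE), ]
}

# Toy data (made up): 4 documents, 3 terms
counts <- matrix(c(30,  5, 10,
                   25, 10,  5,
                    5, 40, 10,
                   10, 35, 15),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4),
                                 c("centre", "hotel", "metro")))
groups <- c(TRUE, TRUE, FALSE, FALSE)
result <- term_chisq_sketch(counts, groups)
print(result)
```

The sketch returns the same column names as listed above so the output is easy to compare, but its ordering and its handling of edge cases (for example terms that never occur) are choices made here for illustration and may differ from what dtm_chisq does.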
data(brussels_reviews_anno)
##
## Which nouns occur in text containing the term 'centre'
##
x <- subset(brussels_reviews_anno, xpos == "NN" & language == "fr")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)
relevant <- dtm_chisq(dtm, groups = dtm[, "centre"] > 0)
head(relevant, 10)

##
## Which adjectives occur in text containing the term 'hote'
##
x <- subset(brussels_reviews_anno, xpos == "JJ" & language == "fr")
x <- x[, c("doc_id", "lemma")]
x <- document_term_frequencies(x)
dtm <- document_term_matrix(x)
group <- subset(brussels_reviews_anno, lemma %in% "hote")
group <- rownames(dtm) %in% group$doc_id
relevant <- dtm_chisq(dtm, groups = group)
head(relevant, 10)

## Not run: 
# do not show scientific notation of the p-values
options(scipen = 100)
head(relevant, 10)

## End(Not run)