proportion_shift | R Documentation |
Shift object for calculating differences in proportions of types across two systems.
proportion_shift(type2freq_1, type2freq_2)
type2freq_1 | A data.frame containing words and their frequencies. |
type2freq_2 | A data.frame containing words and their frequencies. |
The simplest word shift graph that we can construct is a proportion shift. If p_i^{(1)} is the relative frequency of word i in the first text, and p_i^{(2)} is its relative frequency in the second text, then the proportion shift is their difference:
δ p_i = p_i^{(2)} - p_i^{(1)}
If the difference is positive (δ p_i > 0), then the word is relatively more common in the second text. If it is negative (δ p_i < 0), then it is relatively more common in the first text. We can rank words by this difference and plot them as a word shift graph.
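The calculation described above can be sketched by hand in base R. This is only an illustration of the formula, not the package's internal implementation: the toy data, the column names `word` and `freq`, and the choice to treat a word missing from one text as having relative frequency 0 are all assumptions made for the example.

```r
# Toy frequency tables (hypothetical data; column names are illustrative).
text_1 <- data.frame(word = c("a", "b", "c"), freq = c(6, 3, 1),
                     stringsAsFactors = FALSE)
text_2 <- data.frame(word = c("a", "b", "d"), freq = c(2, 6, 2),
                     stringsAsFactors = FALSE)

# Relative frequency of each word within its own text.
text_1$p1 <- text_1$freq / sum(text_1$freq)
text_2$p2 <- text_2$freq / sum(text_2$freq)

# Full outer join so that words appearing in only one text are kept.
both <- merge(text_1[, c("word", "p1")], text_2[, c("word", "p2")],
              by = "word", all = TRUE)

# Assumption: a word absent from a text has relative frequency 0 there.
both$p1[is.na(both$p1)] <- 0
both$p2[is.na(both$p2)] <- 0

# delta p_i = p_i^(2) - p_i^(1)
both$delta_p <- both$p2 - both$p1

# Rank words by the magnitude of the difference, largest first.
both <- both[order(abs(both$delta_p), decreasing = TRUE), ]
print(both)
```

Words with a positive `delta_p` are relatively more common in the second text, and words with a negative `delta_p` are relatively more common in the first. Because each set of relative frequencies sums to 1, the differences sum to 0.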
Returns a list object of class shift.
Other shifts: entropy_shift(), jsdivergence_shift(), kldivergence_shift(), weighted_avg_shift()
library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)

reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  # to move from classes frequency, textstat, and data.frame to data.frame
  as.data.frame() %>%
  select(feature, frequency)

bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>%
  select(feature, frequency)

prop <- proportion_shift(reagan, bush)