proportion_shift: Proportion Shift
In pverspeelt/shifterator: Functionality for Constructing Word Shift Graphs

Description

Shift object for calculating differences in proportions of types across two systems.

Usage

proportion_shift(type2freq_1, type2freq_2)

Arguments

`type2freq_1`	A data.frame containing the words of the first text and their frequencies.
`type2freq_2`	A data.frame containing the words of the second text and their frequencies.

Details

The simplest word shift graph that we can construct is a proportion shift. If p_i^{(1)} is the relative frequency of word i in the first text, and p_i^{(2)} is its relative frequency in the second text, then the proportion shift calculates their difference:

δ p_i = p_i^{(2)} - p_i^{(1)}

If the difference is positive (δ p_i > 0), then the word is relatively more common in the second text. If it is negative (δ p_i < 0), then it is relatively more common in the first text. We can rank words by this difference and plot them as a word shift graph.
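
The calculation above can be sketched in base R on toy data. This is only an illustration of the formula, not the internals of proportion_shift. The data frames text_1 and text_2 are hypothetical, their column names (feature, frequency) mirror the Examples section, and treating a word that is absent from one text as having frequency 0 there is an assumption of this sketch.

```r
# Hypothetical toy data; column names mirror the Examples section.
text_1 <- data.frame(feature = c("freedom", "people", "government"),
                     frequency = c(2, 5, 3))
text_2 <- data.frame(feature = c("freedom", "people", "liberty"),
                     frequency = c(6, 2, 2))

# Full outer join so that words missing from one text are kept.
merged <- merge(text_1, text_2, by = "feature", all = TRUE,
                suffixes = c("_1", "_2"))

# Assumption: a word absent from a text has frequency 0 there.
merged$frequency_1[is.na(merged$frequency_1)] <- 0
merged$frequency_2[is.na(merged$frequency_2)] <- 0

# Relative frequencies p_i^(1) and p_i^(2).
merged$p_1 <- merged$frequency_1 / sum(merged$frequency_1)
merged$p_2 <- merged$frequency_2 / sum(merged$frequency_2)

# delta p_i = p_i^(2) - p_i^(1)
merged$delta_p <- merged$p_2 - merged$p_1

# Rank words by the absolute size of the difference, largest first.
merged <- merged[order(abs(merged$delta_p), decreasing = TRUE), ]
print(merged[, c("feature", "delta_p")])
```

Because both sets of relative frequencies sum to 1, the differences sum to 0: every gain in one word's share is offset by losses elsewhere.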

Value

Returns a list object of class shift.

Examples

library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)

reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>% # to move from classes frequency, textstat, and data.frame to data.frame
  select(feature, frequency)

bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>%
  select(feature, frequency)

prop <- proportion_shift(reagan, bush)

pverspeelt/shifterator documentation built on Oct. 7, 2022, 3:37 a.m.