textplex

The goal of textplex is to calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019). Further validation of the algorithm is available in Tolochko, Song & Boomgaarden (2019).

Caveats:

  1. The current version uses spacyr (Benoit et al.) to maintain compatibility with the original implementation (unlike the initial implementation, which used udpipe). Future versions will add back the option to use udpipe.
  2. It is very slow (although speed has improved), but premature optimization is the root of all evil. See the Benchmark section.
  3. For the algorithm by Benoit, Munger & Spirling (2019), use sophistication.

References:

  1. Tolochko, P., & Boomgaarden, H. G. (2019). Determining Political Text Complexity: Conceptualizations, Measurements, and Application. International Journal of Communication, 13, 21.
  2. Tolochko, P., Song, H., & Boomgaarden, H. (2019). “That Looks Hard!”: Effects of Objective and Perceived Textual Complexity on Factual and Structural Political Knowledge. Political Communication, 1-20.
  3. Benoit, K., Munger, K., & Spirling, A. (2019). Measuring and explaining political sophistication through textual complexity. American Journal of Political Science, 63(2), 491-508.

Installation

You can install the experimental version of textplex from GitHub:

## install.packages("devtools")
devtools::install_github("chainsawriot/textplex")

Please also install and set up a spacyr environment, following the instructions on the spacyr webpage. In essence, this means:

install.packages('spacyr')
library(spacyr)
spacy_install()

Examples

Calculating raw scores for English content.

library(textplex)
library(spacyr)
## Load the English language model
spacy_initialize(model = 'en')

input_text <- c("A spectre is haunting Europe - the spectre of Communism.
All the powers of old Europe have entered into a holy alliance to exorcise
 this spectre; Pope and Czar, Metternich and Guizot, French Radicals
and German police-spies. Where is the party in opposition that has not been
 decried as communistic by its opponents in power? Where the Opposition that
 has not hurled back the branding reproach of Communism, against the more
 advanced opposition parties, as well as against its reactionary adversaries?",
"The greatest improvement in the productive powers of labour, and the greater
 part of the skill, dexterity, and judgment with which it is anywhere
 directed, or applied, seem to have been the effects of the division of
 labour. The effects of the division of labour, in the general business of
 society, will be more easily understood by considering in what manner it
 operates in some particular manufactures.")

calculate_textplex(input_text)

But do you speak German? Calculating raw scores for German content.

spacy_finalize()

## spacy_download_langmodel("de")

spacy_initialize(model = 'de')

de_input_text <- c("Entschuldigung. Ich kann mit Ihnen auf Deutsch nicht
 sprechen, weil mein Deutsch sehr schlecht ist. Man sagt 'deutsche Sprache,
 schwere Sprache'. Ich glaube, dass ich nur Bahnhof verstehe.",
 "In mir drin ist alles rot, das Gegenteil von tot. Mein Herz es schlägt sich
 noch ganz gut. In mir drin ist alles rot und du bist ein Idiot, mein Freund.
 Du verschmähst mein süßes Blut.")

calculate_textplex(de_input_text)

Fit the two-factor model.

library(furrr)
library(psych)
library(sotu)
library(dplyr)

spacy_finalize()

spacy_initialize(model = 'en')

### This is the preferred way to do parallelization. It doesn't use up all your RAM!

## sotu_rawscore <- furrr::future_map_dfr(sotu_text, calculate_textplex, .progress = TRUE)

data(sotu_rawscore)
fit <- fit_two_factor_model(sotu_rawscore)

sotu_meta$syntactic_complexity <- fit$scores[,1]
sotu_meta$semantic_complexity <- fit$scores[,2]
sotu_meta$text <- sotu_text

sotu_meta %>% arrange(desc(syntactic_complexity)) %>% select(president, year, syntactic_complexity, text)
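To make the shape of fit$scores more concrete, here is a minimal sketch of a two-factor exploratory factor analysis on simulated data. It assumes that the fit behaves roughly like psych::fa() with two factors; that is an assumption for illustration, not a description of how fit_two_factor_model() is implemented. The toy data, the variable names (a1 to b3) and the varimax rotation are all hypothetical choices.

```r
library(psych)

set.seed(42)
n <- 200

# Two hypothetical latent dimensions (stand-ins for two kinds of complexity).
f1 <- rnorm(n)
f2 <- rnorm(n)

# Six toy indicators: a1-a3 load on f1, b1-b3 load on f2.
toy <- data.frame(
    a1 = f1 + rnorm(n, sd = 0.5),
    a2 = f1 + rnorm(n, sd = 0.5),
    a3 = f1 + rnorm(n, sd = 0.5),
    b1 = f2 + rnorm(n, sd = 0.5),
    b2 = f2 + rnorm(n, sd = 0.5),
    b3 = f2 + rnorm(n, sd = 0.5)
)

# Assumption: a plain two-factor model with varimax rotation.
toy_fit <- fa(toy, nfactors = 2, rotate = "varimax")

# One row per observation, one column per factor.
dim(toy_fit$scores)
head(toy_fit$scores)
```

Each column of the score matrix holds one factor score per row of the input, which is why the columns can be attached to a metadata table with the same number of rows.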

How to write a syntactically complex SOTU, President Madison?

sotu_meta %>% arrange(desc(syntactic_complexity)) %>%
    select(president, year, syntactic_complexity, text) %>%
    slice(1) %>% pull(text) %>% print

Benchmark

We use this code as the benchmark on an Early 2015 MacBook Air:

start_time <- Sys.time()
sotu_rawscore <- furrr::future_map_dfr(sotu_text, calculate_textplex, .progress = TRUE)
end_time <- Sys.time()
print(end_time - start_time)
## check.names = FALSE keeps the column name "time (min)" as written
knitr::kable(data.frame(version = c("0.0.1"), "time (min)" = c(12.12), check.names = FALSE))
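
Note that furrr only runs in parallel once a future plan has been set; with the default sequential plan, future_map_dfr() processes the inputs one after another. The following is a minimal sketch of setting a plan, assuming a multisession backend with two workers. The function count_words() is a hypothetical stand-in for calculate_textplex() so that the sketch runs without spaCy.

```r
library(future)
library(furrr)

# Assumption: two background R sessions; adjust workers to your machine.
plan(multisession, workers = 2)

# Hypothetical stand-in for calculate_textplex(): returns a one-row data frame per text.
count_words <- function(text) {
    data.frame(n_words = lengths(strsplit(text, "\\s+")))
}

texts <- c("A spectre is haunting Europe.", "The division of labour.")

# Each text is processed by a worker; the one-row results are row-bound.
res <- future_map_dfr(texts, count_words)
print(res)

# Return to sequential processing and shut down the workers.
plan(sequential)
```

Multisession workers are separate R sessions, so any state held in the main session, such as an initialized spaCy session, may need to be set up in each worker; that step is not shown here.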


chainsawriot/textplex documentation built on March 19, 2020, 3:39 a.m.