The goal of textplex is to calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019). Further validation of the algorithm is available in Tolochko, Song & Boomgaarden (2019).
Caveats:
References:
You can install the experimental version of textplex from GitHub:
## install.packages("devtools")
devtools::install_github("chainsawriot/textplex")
Please also install and set up a spacyr environment, following the instructions on the spacyr webpage. In essence, this means:
install.packages('spacyr')
library(spacyr)
spacy_install()
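Before running the examples, it may help to confirm which of the R packages they load are already installed. The following is a minimal sketch in base R; the variable names (`needed`, `is_installed`, `missing_pkgs`) are illustrative, and the check covers only the R side, not the Python spaCy installation that spacyr manages.

```r
# Packages that the examples in this README load or call.
needed <- c("textplex", "spacyr", "furrr", "psych", "sotu", "dplyr")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package is not installed.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages used in the examples are installed.")
}
```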
Calculating raw scores for English content.
library(textplex)
library(spacyr)
## Load the English language model
spacy_initialize(model = 'en')
input_text <- c(
  "A spectre is haunting Europe - the spectre of Communism. All the powers of old Europe have entered into a holy alliance to exorcise this spectre; Pope and Czar, Metternich and Guizot, French Radicals and German police-spies. Where is the party in opposition that has not been decried as communistic by its opponents in power? Where the Opposition that has not hurled back the branding reproach of Communism, against the more advanced opposition parties, as well as against its reactionary adversaries?",
  "The greatest improvement in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour. The effects of the division of labour, in the general business of society, will be more easily understood by considering in what manner it operates in some particular manufactures."
)
calculate_textplex(input_text)
But do you speak German?
spacy_finalize()
## spacy_download_langmodel("de")
spacy_initialize(model = 'de')
de_input_text <- c(
  "Entschuldigung. Ich kann mit Ihnen auf Deutsch nicht sprechen, weil mein Deutsch sehr schlecht ist. Man sagt 'deutsche Sprache, schwere Sprache'. Ich glaube, dass ich nur Bahnhof verstehe.",
  "In mir drin ist alles rot, das Gegenteil von tot. Mein Herz es schlägt sich noch ganz gut. In mir drin ist alles rot und du bist ein Idiot, mein Freund. Du verschmähst mein süßes Blut."
)
calculate_textplex(de_input_text)
Fit the two-factor model.
library(furrr)
library(psych)
library(sotu)
library(dplyr)
spacy_finalize()
spacy_initialize(model = 'en')
### This is the preferred way to do parallelization. It doesn't use up all your RAM!
## sotu_rawscore <- furrr::future_map_dfr(sotu_text, calculate_textplex, .progress = TRUE)
data(sotu_rawscore)
fit <- fit_two_factor_model(sotu_rawscore)
sotu_meta$syntactic_complexity <- fit$scores[,1]
sotu_meta$semantic_complexity <- fit$scores[,2]
sotu_meta$text <- sotu_text
sotu_meta %>%
  arrange(desc(syntactic_complexity)) %>%
  select(president, year, syntactic_complexity, text)
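The example above reads one score per text from each column of `fit$scores`. To illustrate that general idea on data that needs no spaCy setup, here is a minimal sketch using base R's `stats::factanal()` on simulated indicators. This is a stand-in for illustration only and is not necessarily how `fit_two_factor_model()` is implemented; `toy_scores`, `toy_fit` and the column names `a1` to `b3` are hypothetical.

```r
set.seed(42)
n <- 200

# Two simulated latent dimensions, each measured by three noisy indicators.
f1 <- rnorm(n)
f2 <- rnorm(n)
toy_scores <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.5),
  a2 = f1 + rnorm(n, sd = 0.5),
  a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5),
  b2 = f2 + rnorm(n, sd = 0.5),
  b3 = f2 + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with two factors (varimax rotation
# by default); scores = "regression" adds one score per row and factor.
toy_fit <- factanal(toy_scores, factors = 2, scores = "regression")

# One row per observation, one column per factor.
dim(toy_fit$scores)
head(toy_fit$scores)
```

As in the State of the Union example, each column of the score matrix can then be attached to a metadata table and used for sorting.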
How to write a syntactically complex SOTU, President Madison?
sotu_meta %>%
  arrange(desc(syntactic_complexity)) %>%
  select(president, year, syntactic_complexity, text) %>%
  slice(1) %>%
  pull(text) %>%
  print()
We use this code as the benchmark on an Early 2015 MacBook Air:
start_time <- Sys.time()
sotu_rawscore <- furrr::future_map_dfr(sotu_text, calculate_textplex, .progress = TRUE)
end_time <- Sys.time()
print(end_time - start_time)
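## check.names = FALSE keeps the column header as "time (min)" instead of "time..min."
knitr::kable(data.frame(version = c("0.0.1"), "time (min)" = c(12.12), check.names = FALSE))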
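Subtracting two `Sys.time()` values yields a `difftime` whose unit (seconds, minutes, hours) is chosen automatically, so the printed number is not always in minutes. A minimal sketch of reporting the elapsed time in minutes regardless of run length is shown below; `Sys.sleep()` is only a placeholder for the real workload, and `elapsed_min` is an illustrative name.

```r
start_time <- Sys.time()
Sys.sleep(0.1)  # placeholder for the actual benchmarked call
end_time <- Sys.time()

# Force the unit to minutes so results are comparable across runs.
elapsed_min <- as.numeric(difftime(end_time, start_time, units = "mins"))
round(elapsed_min, 2)
```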