
texttimetravel

Tools for analysing temporally structured text collections, including tools for reading in large sets of texts (via pdftools) and for time series analysis of qualitative statistics such as word associations and topic models (primarily via quanteda and topicmodels).

Installation

System requirements (for linux):

For other systems, see the respective documentation for pdftools and topicmodels.

devtools::install_github ('mpadge/texttimetravel')

Usage

Load packages and a temporally-structured corpus to work with:

library (texttimetravel)
library (quanteda)
dat <- data_corpus_inaugural
#dat <- corpus_reshape (dat, to = "sentences") # if desired

(data_corpus_inaugural is a sample corpus from quanteda of inaugural speeches of US presidents.) Then use quanteda functions to convert to desired tokenized form:

tok <- tokens (dat,
               remove_numbers = TRUE,
               remove_punct = TRUE,
               remove_separators = TRUE)
tok <- tokens_remove (tok, stopwords("english"))
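
The temporal analyses below depend on each document carrying a date. As a minimal check, assuming (this is not stated above) that the year is read from the Year document variable that quanteda ships with data_corpus_inaugural, the metadata can be inspected on the tokens object:

```r
library (quanteda)

# Same tokenization steps as above, repeated so this sketch runs on its own
tok <- tokens (data_corpus_inaugural,
               remove_numbers = TRUE,
               remove_punct = TRUE,
               remove_separators = TRUE)
tok <- tokens_remove (tok, stopwords ("english"))

# Document variables are carried over from the corpus to the tokens object
yrs <- docvars (tok)$Year
range (yrs)       # first and last years covered by the corpus
head (docvars (tok))
```

If a corpus of your own lacks such a variable, it would need to be added (for example with quanteda's docvars<- replacement function) before any year-based filtering can work.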

keywords

Keyword associations can be extracted with the ttt_keyness function, which relies on quanteda's textstat_keyness function but simplifies the interface so that keyness statistics can be extracted with a single function call.

x <- ttt_keyness (tok, "politic*")
head (x, n = 10) %>% knitr::kable()
x <- ttt_keyness (tok, "school*")
head (x, n = 10) %>% knitr::kable()
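
For comparison, the following is a rough sketch of the kind of multi-step quanteda workflow that a single-call interface saves. It follows the target-word approach from the quanteda tutorials and is not necessarily how ttt_keyness is implemented. The window size of 10 tokens and all variable names are assumptions made for illustration.

```r
library (quanteda)
library (quanteda.textstats) # provides textstat_keyness in quanteda >= 3.0

toks <- tokens (data_corpus_inaugural,
                remove_numbers = TRUE,
                remove_punct = TRUE)
toks <- tokens_remove (toks, stopwords ("english"))

pattern <- "politic*"
window <- 10 # assumed context width, in tokens

# Tokens within the window around each match, with the keyword itself dropped
toks_inside <- tokens_keep (toks, pattern = pattern, window = window)
toks_inside <- tokens_remove (toks_inside, pattern = pattern)
# Everything else serves as the reference set
toks_outside <- tokens_remove (toks, pattern = pattern, window = window)

dfmat_inside <- dfm (toks_inside)
dfmat_outside <- dfm (toks_outside)

# Compare term frequencies near the keyword against the rest of the corpus
key <- textstat_keyness (rbind (dfmat_inside, dfmat_outside),
                         target = seq_len (ndoc (dfmat_inside)))
head (key, n = 10)
```

Terms at the top of the result occur disproportionately often near the keyword.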

topics

The function ttt_fit_topics provides a convenient wrapper around the functions provided by the topicmodels package, and extends functionality via two additional parameters:

  1. years, allowing topic models to be fitted only to those portions of a corpus corresponding to the specified years;
  2. topic, allowing models to be fitted around a specified topic phrase.

# Fit five topics to the whole corpus
x <- ttt_fit_topics (tok, ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
# Fit five topics only to speeches from the years 1789 to 1900
x <- ttt_fit_topics (tok, years = 1789:1900, ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
# Fit five topics around the topic phrase "nation"
x <- ttt_fit_topics (tok, topic = "nation", ntopics = 5)
topicmodels::get_terms(x, 10) %>% knitr::kable()
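
To make the years parameter more concrete, the sketch below approximates a year-restricted fit using quanteda and topicmodels directly. It is an illustration under stated assumptions, not the package's own implementation: it assumes the year is taken from the Year document variable and that a plain LDA model is fitted, and the seed value is arbitrary.

```r
library (quanteda)
library (topicmodels)

toks <- tokens (data_corpus_inaugural,
                remove_numbers = TRUE,
                remove_punct = TRUE)
toks <- tokens_remove (toks, stopwords ("english"))

# Keep only documents from the requested years (assumes a Year docvar)
toks_sub <- tokens_subset (toks, Year %in% 1789:1900)

# Convert to the document-term matrix format expected by topicmodels
dtm <- convert (dfm (toks_sub), to = "topicmodels")

# Fit a 5-topic LDA model; the fixed seed only makes the sketch reproducible
fit <- LDA (dtm, k = 5, control = list (seed = 1))
top_terms <- get_terms (fit, 10)
top_terms
```

Each column of top_terms lists the ten highest-ranked terms for one topic.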


mpadge/texttimetravel documentation built on Nov. 14, 2020, 11:31 a.m.