textcorpus is a collection of text corpus datasets. The package also contains tools to enable easy community contributions to the package. The underlying premise is that the speech-level data is stored with its metadata as a list of two tibble data frames that share a common key column.
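As a rough illustration of that layout, the sketch below builds a toy corpus by hand. The object name `toy_corpus` and every value in it are made up for illustration; only the `corpus`/`meta` element names and the shared `id` key mirror the structure the package's datasets show further down.

```r
library(tibble)

# Hypothetical toy data: names and values are illustrative only
toy_corpus <- list(
  # speech-level rows, one per turn of talk
  corpus = tibble(
    id     = c("a", "a", "b"),
    author = c("Speaker 1", "Speaker 2", "Speaker 1"),
    text   = c("Hello there.", "Good morning.", "Let us begin."),
    order  = 1:3
  ),
  # one metadata row per id
  meta = tibble(
    id       = c("a", "b"),
    date     = as.Date(c("2017-03-08", "2017-03-09")),
    location = c("Room 1", "Room 2")
  )
)

# Every speech-level row should find its metadata through the key column
all(toy_corpus$corpus$id %in% toy_corpus$meta$id)
```

Keeping the metadata in its own frame avoids repeating the date and location on every speech-level row; the key column lets the two be joined when needed.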
To download the development version of textcorpus:

Download the zip ball or tar ball, decompress and run R CMD INSTALL on it, or use the pacman package to install the development version:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/textcorpus")
presidential_debates | debate | political | Tyler Rinker | 2017-03-08
dylan_songs | song | folk, rock | Tyler Rinker | 2017-03-08
nixon_tapes | transcript | political | Tyler Rinker | 2017-03-08
dplyr makes joining the corpus and metadata easy.
pacman::p_load(tidyverse, sentimentr, formality, readability)
pacman::p_load_current_gh('trinker/textcorpus')
nixon_tapes
## $corpus
## # A tibble: 8,817 × 4
## id author
## <chr> <chr>
## 1 00 - connally Nixon
## 2 00 - connally Campbell
## 3 00 - connally Nixon
## 4 00 - connally Campbell
## 5 00 - connally Nixon
## 6 00 - connally Unidentified
## 7 00 - connally Unidentified
## 8 00 - connally Unidentified
## 9 00 - connally Unidentified
## 10 00 - connally Unidentified
## # ... with 8,807 more rows, and 2 more variables: text <chr>, order <int>
##
## $meta
## # A tibble: 31 × 4
## id date location minutes
## <chr> <date> <chr> <dbl>
## 1 00 - connally 1971-03-23 White House Oval Office 30
## 2 01 1972-06-23 White House Oval Office 8
## 3 02 1972-06-23 White House Oval Office 4
## 4 03 1972-06-23 Old Executive Office Building Office 6
## 5 04 1972-09-15 White House Oval Office 34
## 6 05 1973-01-08 Old Executive Office Building Office 7
## 7 10 1973-03-17 White House Oval Office 21
## 8 11 1973-03-20 White House Oval Office 11
## 9 12 1973-03-21 White House Oval Office 83
## 10 13 1973-03-21 Old Executive Office Building Office 36
## # ... with 21 more rows
dat <- nixon_tapes$corpus %>%
dplyr::left_join(nixon_tapes$meta, by = 'id')
dat
## # A tibble: 8,817 × 7
## id author
## <chr> <chr>
## 1 00 - connally Nixon
## 2 00 - connally Campbell
## 3 00 - connally Nixon
## 4 00 - connally Campbell
## 5 00 - connally Nixon
## 6 00 - connally Unidentified
## 7 00 - connally Unidentified
## 8 00 - connally Unidentified
## 9 00 - connally Unidentified
## 10 00 - connally Unidentified
## # ... with 8,807 more rows, and 5 more variables: text <chr>, order <int>,
## # date <date>, location <chr>, minutes <dbl>
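Because left_join keeps every corpus row and fills missing metadata with NA, it can be worth checking for ids that have no match. A minimal sketch, using made-up frames (`corpus`, `meta`, and their values are hypothetical stand-ins, not the package data):

```r
library(dplyr)

# Hypothetical stand-ins for a $corpus and $meta pair
corpus <- tibble(id = c("01", "01", "02", "99"), text = c("a", "b", "c", "d"))
meta   <- tibble(id = c("01", "02"), minutes = c(8, 4))

# All four corpus rows survive; id "99" gets NA for minutes
joined <- left_join(corpus, meta, by = "id")

# anti_join returns the corpus rows whose id has no metadata row
unmatched <- anti_join(corpus, meta, by = "id")
unmatched
```

An empty `unmatched` frame means every speech-level row picked up its metadata.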
Here we calculate formality, sentiment, and readability measures. An additional call to dplyr's left_join with a Reduce makes it easy to merge the various score frames into one frame.
n_formality <- dat %>%
filter(author == "Nixon") %>%
with(formality(text, list(author, id, date)))
n_sentiment <- dat %>%
filter(author == "Nixon") %>%
with(sentiment_by(text, list(author, id, date)))
n_readability <- dat %>%
filter(author == "Nixon") %>%
with(readability(text, list(author, id, date)))
stats_dat <- list(n_formality, n_sentiment, n_readability) %>%
Reduce(function(x, y) left_join(x, y, by=c("author", "id", "date")), .)
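The Reduce pattern used above can be seen in isolation with a few small frames. This is only a sketch: the frame names, the single `id` key, and the values are invented for illustration.

```r
library(dplyr)

# Hypothetical score frames sharing one key column
keys     <- tibble(id = c("01", "02"))
scores_a <- mutate(keys, a = c(1.5, 2.5))
scores_b <- mutate(keys, b = c(10, 20))
scores_c <- mutate(keys, c = c(0.1, 0.2))

# Reduce applies the join pairwise, left to right, equivalent to:
# left_join(left_join(scores_a, scores_b, by = "id"), scores_c, by = "id")
merged <- Reduce(
  function(x, y) left_join(x, y, by = "id"),
  list(scores_a, scores_b, scores_c)
)
merged
```

The same call works for any number of frames in the list, as long as each one carries the key columns named in `by`.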
stats_dat %>%
select(date, F, ave_sentiment, Average_Grade_Level) %>%
rename(Formality = F, Sentiment = ave_sentiment, Readability = Average_Grade_Level) %>%
gather(Measure, Score, -date) %>%
mutate(Date = as.factor(date), Date2 = as.numeric(Date)) %>%
ggplot(aes(x = Date2, y = Score)) +
geom_point() +
geom_smooth(span = 0.4, fill = NA) +
facet_wrap(~Measure, ncol = 1, scales = 'free_y')
## `geom_smooth()` using method = 'loess'
You are welcome to:
- submit suggestions and bug-reports at: https://github.com/trinker/textcorpus/issues
- send a pull request on: https://github.com/trinker/textcorpus/
- compose a friendly e-mail to: tyler.rinker@gmail.com