README.md

aug15 - Data set of Indian Independence Day Speeches

This package includes a data set of full-text English renderings of Indian Independence Day speeches, delivered annually on 15 August since 1947.

Recent speeches are readily available online from the Press Information Bureau. I found older speeches in volumes of collected speeches held in the libraries of Jawaharlal Nehru University and the Nehru Memorial Museum, and digitized them by uploading page images to Google Drive’s native OCR feature.

The data set is missing only the speeches from 1962 and 1995. Please contact me if you’re able to find the speech for either year, or evidence that one did not take place!

Installation

You can access the data set by installing the package from GitHub.

# install.packages("devtools")
devtools::install_github("seanangio/aug15")

The data set is called corpus. To preview it, run something like:

library(dplyr)
library(aug15)
glimpse(corpus)
#> Rows: 76
#> Columns: 8
#> $ year     <dbl> 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2…
#> $ pm       <chr> "Narendra Modi", "Narendra Modi", "Narendra Modi", "Narendra …
#> $ party    <chr> "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP"…
#> $ title    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ footnote <chr> "English rendering of the text of PM’s address from the Red F…
#> $ source   <chr> "Press Information Bureau", "Press Information Bureau", "Pres…
#> $ url      <chr> "https://pib.gov.in/", "https://pib.gov.in/", "https://pib.go…
#> $ text     <chr> "Best wishes to my dear countrymen on the momentous occasion …
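
Since corpus is an ordinary data frame, the usual dplyr verbs apply to it. The following is a minimal sketch, assuming only the pm and text columns shown in the glimpse() output above. The helper speeches_per_pm() is hypothetical and not part of aug15, and the NA filter is a precaution, since it is an assumption that rows without speech text may exist.

```r
library(dplyr)

# Hypothetical helper (not part of aug15): number of speeches per prime minister.
# Rows with missing text are dropped first, in case the data set has any.
speeches_per_pm <- function(data) {
  data %>%
    filter(!is.na(text)) %>%
    count(pm, sort = TRUE, name = "n_speeches")
}

# Run against the packaged data only if aug15 is installed
if (requireNamespace("aug15", quietly = TRUE)) {
  print(speeches_per_pm(aug15::corpus))
}
```

The function takes any data frame with pm and text columns, so it works equally well on a filtered subset of corpus.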

Alternatively, you can directly download the CSV file or browse any of the speeches in this folder.

Investigation

For a brief exploration of the data set (through 2021), this package includes a Shiny app that makes basic visualizations, including:

Plot of speech word count

Plot of TF-IDF for recent years

Plot of most frequent positive and negative words

Plot of net sentiment

Plot of frequency of the term Kashmir
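
Two of these quantities, word count and term frequency, are simple to approximate directly from corpus. The sketch below is not the Shiny app's implementation. It assumes the year, pm, and text columns shown earlier, uses the stringr package, and introduces a hypothetical helper speech_stats() that counts whitespace-separated tokens as words.

```r
library(dplyr)
library(stringr)

# Hypothetical helper (not part of aug15): per-speech word count and
# case-insensitive count of a term. The term is treated as a regular
# expression, so "Kashmir" also matches inside "Kashmiri".
speech_stats <- function(data, term = "Kashmir") {
  term_pattern <- regex(term, ignore_case = TRUE)
  data %>%
    filter(!is.na(text)) %>%
    mutate(
      word_count = str_count(text, "\\S+"),      # runs of non-whitespace
      term_count = str_count(text, term_pattern)
    ) %>%
    select(year, pm, word_count, term_count) %>%
    arrange(year)
}

# Run against the packaged data only if aug15 is installed
if (requireNamespace("aug15", quietly = TRUE)) {
  print(head(speech_stats(aug15::corpus)))
}
```

Because OCR output can contain stray characters, the token-based word count should be read as an approximation for the older speeches.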



seanangio/aug15 documentation built on Aug. 27, 2023, 1:37 p.m.