bryanwhiting/generalconference: General Conference text corpus and web scrapers


generalconference

General Conference is a semi-annual event where members of The Church of Jesus Christ of Latter-day Saints gather to listen to church prophets, apostles, and other leaders.

This package does two things: it scrapes General Conference talks from the web, and it bundles all talks as a ready-to-use dataset for analysis in R.

Install the package

# install.packages('devtools')
devtools::install_github("bryanwhiting/generalconference")

Load the package:

library(generalconference)

Load the General Conference corpus, which is a tibble with nested data for each conference, session, talk, and paragraph.

data("genconf")
head(genconf)

Getting Started

Unnest the `sessions` and `talks` list-columns of `genconf` to get one row per talk. Unnesting `paragraphs` as well gives one row per paragraph.

library(dplyr)
library(tidyr)
genconf %>%
  tidyr::unnest(sessions) %>%
  tidyr::unnest(talks) %>%
  head()
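The unnesting pattern can be tried without the package data. The sketch below builds a tiny made-up tibble that mimics the nesting described above (conference, then sessions, then talks, then paragraphs). The columns `date`, `title1`, `author1`, and `paragraph` follow the names used in this README; the `session` column and all values are hypothetical, and the real `genconf` may contain more or different columns. Each `unnest()` call removes one level of nesting.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy corpus mimicking the nesting of genconf:
# one row per conference, with sessions -> talks -> paragraphs as list-columns.
# `session` is an illustrative column name, not necessarily one in genconf.
toy <- tibble(
  date = as.Date("2020-04-04"),
  sessions = list(tibble(
    session = c("Saturday Morning", "Saturday Afternoon"),
    talks = list(
      tibble(
        title1 = "Talk A",
        author1 = "Speaker A",
        paragraphs = list(tibble(paragraph = c("First paragraph.", "Second paragraph.")))
      ),
      tibble(
        title1 = "Talk B",
        author1 = "Speaker B",
        paragraphs = list(tibble(paragraph = "Only paragraph."))
      )
    )
  ))
)

# One row per talk: conference and session fields repeat for each talk
talk_level <- toy %>%
  unnest(sessions) %>%
  unnest(talks)

# One row per paragraph: date, session, title1, and author1 repeat on each row
paragraph_level <- talk_level %>%
  unnest(paragraphs)

print(paragraph_level)
```

The two sessions hold one talk each, so `talk_level` has 2 rows; the talks hold 2 and 1 paragraphs, so `paragraph_level` has 3 rows.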

Analyze individual paragraphs that contain the word "faith":

library(gt)
genconf %>%
  # unnest the list-columns: conference -> session -> talk -> paragraph
  tidyr::unnest(sessions) %>%
  tidyr::unnest(talks) %>%
  tidyr::unnest(paragraphs) %>%
  # keep just the date, title, author, and paragraph;
  # date, title, and author repeat for every paragraph of a talk
  select(date, title1, author1, paragraph) %>%
  # keep paragraphs containing the lowercase string "faith"
  # (case-sensitive substring match, so "faithful" also matches)
  filter(stringr::str_detect(paragraph, "faith")) %>%
  # keep the first 5 matching paragraphs
  head(5) %>%
  # convert into a gt() table with row groups for date/title/author
  # (use row groups since these data are replicated by paragraph)
  group_by(date, title1, author1) %>%
  gt() %>%
  tab_options(
    row_group.background.color = 'lightgray'
  ) %>%
  tab_header(
    title='Paragraphs on Faith',
    subtitle='Grouped by talk'
  )
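`stringr::str_detect(paragraph, "faith")` matches the lowercase substring, so it keeps paragraphs that only say "faithful" and skips paragraphs where the word appears only as "Faith" at the start of a sentence. If a case-insensitive, whole-word match is wanted instead, a regular expression with word boundaries is one option. The sketch below compares the two on a few made-up sentences (not taken from the corpus) using `stringr::regex(..., ignore_case = TRUE)`.

```r
library(stringr)

# Made-up example paragraphs, for illustration only
paragraphs <- c(
  "Faith is a principle of action.",
  "She was faithful in her duties.",
  "We exercise faith daily.",
  "Hope and charity."
)

# Pattern used above: case-sensitive substring match
substring_match <- str_detect(paragraphs, "faith")

# Alternative: the whole word "faith" in any capitalization
word_match <- str_detect(paragraphs, regex("\\bfaith\\b", ignore_case = TRUE))

data.frame(paragraph = paragraphs, substring_match, word_match)
```

To use the alternative in the pipeline above, the filter step would become `filter(stringr::str_detect(paragraph, stringr::regex("\\bfaith\\b", ignore_case = TRUE)))`. Which behavior is preferable depends on whether related words such as "faithful" should count.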