knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

spRingsteen

R-CMD-check CRAN status

The spRingsteen package provides a number of dataframes describing the songs, albums, tours, and setlists of Bruce Springsteen's career. The data (collected from Brucebase) is provided in a tidy form which is easily analyzed in R. The scripts which are used to scrape the data in their entirety, alongside a SQLite representation of the data may be viewed at a second repository springsteen_db.

Installation

You can install the released version of spRingsteen from CRAN with:

install.packages("spRingsteen")

Alternatively, you can install the development version of spRingsteen from GitHub like so:

remotes::install_github("obrienjoey/spRingsteen")

Data refresh

While the spRingsteen CRAN version is updated every few months, the Github (Dev) version is updated on a daily basis. The update_data function enables to overcome this gap and keep the installed version with the most recent data available on the Github version:

library(spRingsteen)
update_data()

Note: must restart the R session to have the updates available

Usage

Concerts

The package includes datasets around the career of Bruce Springsteen. For example, the touring history of him and his numerous bands is stored in concerts:

library(spRingsteen)
library(dplyr)

concerts

# how many concerts have occurred in each country?

concerts %>% 
  count(country, sort = TRUE)

Setlists

It also has information of the setlists performed in these shows which are stored in setlists.

setlists

# what song has been played most by Springsteen?

setlists %>%
  count(song, sort = TRUE)

# which song has most frequently opened a show?

setlists %>%
  filter(song_number == 1) %>%
  count(song, sort = TRUE) %>%
  slice(1)

Songs

Further details of the songs themselves are available in songs, including the album of appearance and also the full lyrics in some cases. This allows for some text mining or sentiment analysis using a package like tidytext.

library(tidytext)

# what word appears most frequently in the **Born in the U.S.A** album?

songs %>% 
  filter(album == "Born In The U.S.A.") %>% 
  select(title, lyrics) %>% 
  unnest_tokens(word, lyrics) %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words, by = 'word')

Tours

Lastly, the tour table contains the tours associated with each concert.

tours %>% 
  count(tour, sort = TRUE)

Of course the real advantage of this package is in combining the different dataframes in order to infer useful information:

# what was the most played song on each tour?
setlists %>% 
  left_join(tours, by = 'gig_key') %>%
  count(song, tour) %>%
  group_by(tour) %>%
  filter(n == max(n)) %>%
  arrange(desc(tour))


obrienjoey/spRingsteen documentation built on Nov. 11, 2023, 11:41 p.m.