    library(fivethirtyeight)
    library(ggplot2)
    library(dplyr)
    library(readr)
    library(knitr)
    library(tibble)

    # Pull all dataset names
    all_datasets <- datasets_master %>%
      pull(`Data Frame Name`) %>%
      unique()

    # Pull all fivethirtyeightdata dataset names
    all_fivethirtyeightdata_datasets <- datasets_master %>%
      filter(`In fivethirtyeightdata?` == "Y") %>%
      pull(`Data Frame Name`) %>%
      unique() %>%
      sort()

    if (FALSE) {
      # Get data set names as listed in pkg
      pkg_data_list <- data(package = "fivethirtyeightdata")[["results"]] %>%
        as_tibble() %>%
        pull(Item) %>%
        sort()
      # This should yield TRUE
      identical(all_fivethirtyeightdata_datasets, pkg_data_list)
    }

    # Pull all fivethirtyeight dataset names
    all_fivethirtyeight_datasets <- datasets_master %>%
      filter(is.na(`In fivethirtyeightdata?`)) %>%
      pull(`Data Frame Name`) %>%
      unique() %>%
      sort()

    if (FALSE) {
      # Get data set names as listed in pkg
      pkg_data_list <- data(package = "fivethirtyeight")[["results"]] %>%
        as_tibble() %>%
        filter(Item != "datasets_master") %>%
        pull(Item) %>%
        sort()
      # This should yield TRUE
      identical(all_fivethirtyeight_datasets, pkg_data_list)
    }
We are aware of this tweet by Mona Chalabi. Although we have not yet decided the future of the `fivethirtyeight` package (and subsequently, the `fivethirtyeightdata` package), we reiterate that this package is not officially published by 538.
There are `r all_fivethirtyeight_datasets %>% length()` datasets included in the `fivethirtyeight` package. However, there are also `r all_fivethirtyeightdata_datasets %>% length()` datasets that could not be included in `fivethirtyeight` due to CRAN package size restrictions:
all_fivethirtyeightdata_datasets
These `r all_fivethirtyeightdata_datasets %>% length()` datasets are included in the `fivethirtyeightdata` add-on package^[The `fivethirtyeightdata` package is hosted via a `drat` repository], which you can install by running:
install.packages('fivethirtyeightdata', repos = 'https://fivethirtyeightdata.github.io/drat/', type = 'source')
For example, to load the `senators` dataset, run:
    library(fivethirtyeight)
    library(fivethirtyeightdata)
    senators
All `r all_fivethirtyeight_datasets %>% length()` + `r all_fivethirtyeightdata_datasets %>% length()` = `r all_datasets %>% length()` datasets across the `fivethirtyeight` and `fivethirtyeightdata` packages are listed here.
    datasets_master %>%
      mutate(
        `Data Frame Name` = paste("`", `Data Frame Name`, "`", sep = ""),
        `In fivethirtyeightdata?` = ifelse(is.na(`In fivethirtyeightdata?`), "", "Yes")
      ) %>%
      kable()
The motivation for creating this package is articulated in The fivethirtyeight R Package: "Tame Data" Principles for Introductory Statistics and Data Science Courses by Kim, Ismay, and Chunn (2018) published in Volume 11, Issue 1 of the journal "Technology Innovations in Statistics Education". Here is an executive summary.
We are involved in statistics and data science education, in particular at the introductory undergraduate level. As such, we are always looking for data sets that balance being:
It has been our experience that many data sets that exist in R packages, such as the `nycflights13`, `babynames`, and `gapminder` packages, are of great pedagogical value as they:
It is along these lines that we present `fivethirtyeight`: an R package of data and code behind the stories and interactives at FiveThirtyEight.com, a data-driven journalism website founded by Nate Silver and owned by ESPN. FiveThirtyEight has been very forward-thinking in making the data used in many of their articles open and accessible on GitHub, a web-based repository for collaboration on code and data.
With consultation from Andrew Flowers and Andrei Scheinkman of FiveThirtyEight, we go one step further by:
In order to make the data easily accessible to R novices, we pre-process the original data sets as they exist in the 538 GitHub repository to adhere to the following "tame" data guidelines:
- [...] `snake_case` and is an alternative to `camelCase`, where successive words are delineated with upper case characters.
- [...] `year` variable exists, then it should be represented as a numerical variable.
- [...] `year` and `month` variables, then convert them to `Date` objects as `year-month-01`. In other words, associate all observations from the same month to have a `day` of `01` so that a correct `Date` object can be assigned.
- [...] `year`, `month`, and `day` variables, then convert them to `Date` objects as `year-month-day`.
- [...] `ordered` factors.
- [...] `factor`s.
- [...] `character`s.
- [...] `TRUE/FALSE` logical variables.

Note: The code used to pre-process the data can be found on the GitHub repository for the package in the `process_data_sets.R` files. These can serve as data manipulation/wrangling examples and exercises for more advanced students.
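As a rough illustration of the naming, date, and variable-type guidelines above, the following base R sketch tames a small made-up data frame. The data frame `raw`, its column names, and its values are hypothetical and do not come from the package. The sketch also assumes that the `ordered` factor guideline applies to ordinal categories and that the `TRUE/FALSE` guideline applies to yes/no values. It is not the package's actual pre-processing code.

```r
# Hypothetical raw data: names and values are invented for illustration only
raw <- data.frame(
  Year = c(2015L, 2015L, 2016L),
  Month = c(1L, 12L, 7L),
  `Approval Rating` = c("low", "high", "medium"),
  Incumbent = c("yes", "no", "yes"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tame <- raw

# snake_case names: lower case, with spaces replaced by underscores
names(tame) <- gsub(" ", "_", tolower(names(tame)))

# Only year and month exist, so build a Date with the day fixed at 01
tame$date <- as.Date(sprintf("%d-%02d-01", tame$year, tame$month))

# Assumption: an ordinal category becomes an ordered factor
tame$approval_rating <- factor(
  tame$approval_rating,
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

# Assumption: a binary yes/no column becomes a TRUE/FALSE logical
tame$incumbent <- tame$incumbent == "yes"

str(tame)
```

The `year` column is left as a number, in line with the guideline for a standalone `year` variable. Whether to keep or drop `year` and `month` once `date` exists is a choice this sketch does not settle.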