In baumer-lab/fec20: Data Package for the 2020 United States Federal Elections

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6, fig.height = 4.5
)
library(fec20)
library(dplyr)
library(ggplot2)
library(scales)
library(stringr)

The fec20 package houses relational datasets from the 2020 U.S. federal elections. Some datasets are included in full. The others are included as samples, and the complete versions can be retrieved with functions built into the package.

This package is the successor to the fec16 package, which covers the 2016 elections. For more details, visit the fec16 vignette.

Who should use this package?

Anyone interested in US politics and elections who wants to use actual data to think critically and make inferences. We made this package particularly with students and instructors in mind, as there is demand for relational data in teaching. Like fec16, fec20 is a one-stop shop for acquiring data of this kind.

Datasets Included

Full Datasets

candidates: candidates registered with the FEC during the 2019-2020 election cycle
committees: committees registered with the FEC during the 2019-2020 election cycle
campaigns: current House and Senate campaigns
pac: Political Action Committee (PAC) and party summary financial information
states: geographical information about the 50 states
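
Because the datasets are relational, tables can be combined on a shared key. The following is a minimal sketch rather than output from the package: it uses two small, invented data frames as stand-ins for candidates and campaigns, and it assumes both tables carry a cand_id column and that campaigns has a ttl_receipts column (names taken from the FEC bulk-data layout, not confirmed in this vignette).

```r
library(dplyr)

# Hypothetical stand-ins for fec20::candidates and fec20::campaigns.
# The rows are invented for illustration; the column names
# (cand_id, ttl_receipts) are assumptions about the real tables.
toy_candidates <- data.frame(
  cand_id = c("H0XX00001", "H0XX00002", "S0XX00003"),
  cand_pty_affiliation = c("DEM", "REP", "DEM")
)

toy_campaigns <- data.frame(
  cand_id = c("H0XX00001", "H0XX00002"),
  ttl_receipts = c(1500, 2500)
)

# Keep only candidates that have a matching campaign row,
# then total the receipts by party.
receipts_by_party <- toy_candidates %>%
  inner_join(toy_campaigns, by = "cand_id") %>%
  group_by(cand_pty_affiliation) %>%
  summarize(total_receipts = sum(ttl_receipts))

receipts_by_party
```

If the assumed column names hold, the same pipeline should work with the toy data frames swapped for candidates and campaigns after library(fec20).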

Sample Datasets (with 1000 random rows each)

individuals: individual contributions to candidates/committees during the 2020 general presidential election
contributions: candidates and their contributions from committees during the 2020 general election
expenditures: the operating expenditures
transactions: transactions between committees

Forthcoming Datasets

Three forthcoming datasets will contain results from the House, Senate, and Presidential elections.

Functions Included

The following functions retrieve the complete versions of the sampled datasets listed above. The size of the raw file that each function downloads is given for reference. Every function takes an n_max argument that caps the number of rows loaded. By default, the entire dataset is loaded.

read_all_individuals() ~ 9.33GB
read_all_contributions() ~ 28.9MB
read_all_expenditures() ~ 65.6MB
read_all_transactions() ~ 235MB

For example:

# The entire expenditures dataset can be accessed by:
all_expenditures <- fec20::read_all_expenditures()

# The first 30 entries in this dataset can be accessed by:
expenditures_30 <- fec20::read_all_expenditures(n_max = 30)

More details can be found on the documentation pages, which can be opened with ?function_name

What does the data look like?

The first six rows of the candidates dataset look like:

head(candidates)

Examples

Data Wrangling

fec20 can be used to summarize data in order to see how many candidates are running for election (across all offices) for the two major parties:

library(dplyr)

data <- candidates %>%
  filter(cand_pty_affiliation %in% c("REP", "DEM")) %>%
  group_by(cand_pty_affiliation) %>%
  summarize(size = n())

data
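
The summary above pools all offices together. As a hedged sketch of how the same count could be split by office, the hypothetical helper below (count_by_party_office is not part of fec20) groups on an additional column. It assumes the candidates table has a cand_office column using the FEC codes H, S, and P, and it is demonstrated on a small invented data frame rather than the real data.

```r
library(dplyr)

# Hypothetical helper (not part of fec20): count major-party
# candidates by party and office. Assumes the input has the columns
# cand_pty_affiliation and cand_office.
count_by_party_office <- function(df) {
  df %>%
    filter(cand_pty_affiliation %in% c("REP", "DEM")) %>%
    count(cand_pty_affiliation, cand_office)
}

# Invented rows for illustration only.
toy_candidates <- data.frame(
  cand_pty_affiliation = c("DEM", "DEM", "REP", "REP", "LIB"),
  cand_office = c("H", "S", "H", "H", "P")
)

office_counts <- count_by_party_office(toy_candidates)
office_counts
```

With the package loaded, count_by_party_office(candidates) would apply the same logic to the real table, provided the assumed column exists.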

Data Visualization

We can visualize the above data:

library(ggplot2)

ggplot(data, aes(x = cand_pty_affiliation, y = size, fill = cand_pty_affiliation)) +
  geom_col() +
  labs(
    title = "Number of Candidates Affiliated with the Two Major Parties",
    x = "Party", y = "Count", fill = "Candidate Party Affiliation"
  )