Clean House of Representatives chronology

Input data-raw/chronology-raw.csv

Output data-raw/chronology-clean.csv

library(tidyverse)
library(here)
rep_raw <- read_csv(here("data-raw", "chronology-raw.csv"))

Extract useful session variables

Session year:

rep_clean <- rep_raw %>%
  mutate(
    session_name = session %>% 
      str_extract("[0-9]{4} (Special )?(Regular )?(Session)?( [0-9])?") %>% 
      str_trim(),
    session_year = session_name %>%
      str_extract("[0-9]{4}") %>% 
      parse_number(),
    regular = str_detect(session_name, "Regular") | 
              !(str_detect(session_name, "Special Session"))) %>% 
  select(-session)

Clean up party variable

Set empty party to missing:

rep_clean <- rep_clean %>%
  mutate(
    party = ifelse(party %in% c("", "Unknown"), 
      NA_character_, party)
    )

Use full names for parties:

unique(rep_clean$party)

Some slashes indicate a person has changed affiliation, others are for an actual party:

party_names <- c(
  "D"     = "Democrat",
  "I"     = "Independent",
  "D/P"   = "Progressive Democrat",
  "R/P"   = "Progressive Republican",
  "P"     = "People's",
  "F"     = "Federalist",
  "C"     = "Citizen",
  "U"     = "Union",
  "R"     = "Republican",
  "W"     = "Whig",
  # changed affiliation
  "R/I/D" = "Republican/Independent/Democrat",
  "D/I"   = "Democrat/Independent",
  "R/D"   = "Republican/Democrat",
  "D/R"   = "Democrat/Republican"
)
party_names[unique(rep_clean$party)]
rep_clean <- rep_clean %>% 
  mutate(party = party_names[party])

Output

Save:

write_csv(rep_clean, here("data-raw", "chronology-clean.csv"))


or-house-vis/history documentation built on May 15, 2019, 1:11 p.m.