ccc_std_demographics: Recode CCES variables so that they merge to ACS variables
In kuriwaki/ccesMRPprep: Functions and Data to Prepare CCES data for MRP

ccc_std_demographics

R Documentation

Description

Recode CCES variables so that they merge to ACS variables

Usage

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)

Arguments

`tbl`	The cumulative common content. It can be any subset but must include the variables `age`, `race`, `educ`, `gender`, `st`, `state`, and `cd`. Factor variables must be of class haven_labelled, as in the output of `get_cces_dataverse("cumulative")`. See ccc_samp for an example. Any other file (for example, a year-specific common content) is not compatible with this function.
`only_demog`	Drop variables besides demographics? Defaults to FALSE.
`age_key`	The named vector (key) used to bin age. Can be `deframe(age5_key)` (the default) or `deframe(age10_key)`.
`wh_as_hisp`	Should people who identify as both White and Hispanic be coded as "Hispanic", thereby leaving all remaining "Whites" as Non-Hispanic Whites by definition? Defaults to TRUE. Can be `NULL` if you know the column `hispanic` is not in the data. For more information, see https://bit.ly/3hZ6mz4.
`bh_as_hisp`	Same as `wh_as_hisp` but for people who identify as both Black and Hispanic. Defaults to TRUE.
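
The effect of `wh_as_hisp` and `bh_as_hisp` can be pictured with a small base-R sketch. This is not the package's implementation: the helper `recode_race_sketch()`, the toy vectors, and the assumption that `hispanic` holds "Yes"/"No" strings are all made up for illustration.

```r
# Illustrative only: a hypothetical helper, not the code inside ccc_std_demographics().
# Assumes `race` and `hispanic` are plain character vectors of equal length.
recode_race_sketch <- function(race, hispanic,
                               wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  is_hisp <- !is.na(hispanic) & hispanic == "Yes"
  out <- race
  # %in% never returns NA, so missing race values are simply left untouched
  if (isTRUE(wh_as_hisp)) out[is_hisp & race %in% "White"] <- "Hispanic"
  if (isTRUE(bh_as_hisp)) out[is_hisp & race %in% "Black"] <- "Hispanic"
  out
}

race     <- c("White", "White", "Black", "Black", "Asian")
hispanic <- c("Yes",   "No",    "Yes",   "No",    "No")

both_on   <- recode_race_sketch(race, hispanic)
white_off <- recode_race_sketch(race, hispanic, wh_as_hisp = FALSE)

table(both_on)    # White and Black Hispanics both counted as Hispanic
table(white_off)  # White Hispanics stay in the White category
```

With both flags on, the remaining "White" and "Black" rows are non-Hispanic by construction, which is the property the argument description refers to.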

Value

The output has the same dimensions as the input (unless only_demog = TRUE), with the following exceptions:

age is binned to match the ACS age bins; the recoding is done by a separate function, ccc_bin_age. The unbinned age is kept in age_orig.
educ is coarsened and relabelled into 4 categories to match the ACS. The original version is kept as educ_cces_chr. The recoding is governed by the key-value pairs in educ_key.
educ_3 is coarsened further to 3 categories, grouping a BA and any higher degree into one category. This is necessary for some ACS tables that do not make the distinction. Check the ACS codes beforehand to decide which education variable to use.
race is recoded in the same manner. The recoding is governed by the key-value pairs in race_key.
cd is standardized so that at-large districts are given "01" and single-digit districts are padded with a leading 0, e.g. "WY-01" and "CA-02".
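
Two of the recodings above, binning age with a key and padding district numbers, can be sketched in base R. The bin labels, the helper names `bin_age_sketch()` and `std_cd_sketch()`, and the assumption that an at-large district arrives as district number 0 are hypothetical; the actual keys live in the package (e.g. age5_key) and may differ.

```r
# Illustrative only: hypothetical bins and helpers, not the package's keys or code.
# A named vector maps each bin label to the bin's lower bound.
age_key_sketch <- c(
  "18 to 24 years"    = 18,
  "25 to 34 years"    = 25,
  "35 to 44 years"    = 35,
  "45 to 64 years"    = 45,
  "65 years and over" = 65
)

bin_age_sketch <- function(age, key) {
  # findInterval() gives the index of the last lower bound that is <= age
  idx <- findInterval(age, unname(key))
  idx[which(idx == 0)] <- NA  # ages below the first bin get no label
  names(key)[idx]
}

std_cd_sketch <- function(st, dist) {
  # Assumption: an at-large district is recorded as district number 0
  dist[which(dist == 0)] <- 1
  sprintf("%s-%02d", st, as.integer(dist))
}

age_binned <- bin_age_sketch(c(17, 18, 24, 25, 70), age_key_sketch)
cd_std     <- std_cd_sketch(c("WY", "CA", "CA"), c(0, 2, 12))
```

Keeping the bins in a key-value vector rather than hard-coding them is what lets a single argument such as `age_key` switch between bin widths.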

Input Requirements

This function requires data to have the following columns:

A string column called st that holds the two-letter abbreviation of the state, or a labelled variable coercible to a string.
A string column called cd that holds the congressional district in the form "WY-01", OR a numeric column called dist that holds the numeric district number. cd_up can also be used for the district in the upcoming election.
<numeric+labelled> columns called educ for education, race for race, age for age, and gender for gender, with values following the cumulative common content.
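
Before calling the function, it can help to check these requirements by hand. The sketch below is a hypothetical pre-flight check written in base R; `check_inputs_sketch()` is not part of ccesMRPprep, and the regular expression is only one reading of the "WY-01" format described above.

```r
# Illustrative only: a hypothetical check, not a function from ccesMRPprep.
check_inputs_sketch <- function(tbl) {
  needed <- c("st", "educ", "race", "age", "gender")
  missing_cols <- setdiff(needed, names(tbl))
  # Either cd or dist is expected to identify the district
  has_district <- any(c("cd", "dist") %in% names(tbl))
  bad_cd <- character(0)
  if ("cd" %in% names(tbl)) {
    cd_chr <- as.character(tbl$cd)
    # Two capital letters, a hyphen, then exactly two digits
    ok <- grepl("^[A-Z]{2}-[0-9]{2}$", cd_chr)
    bad_cd <- unique(cd_chr[!is.na(cd_chr) & !ok])
  }
  list(missing_cols = missing_cols, has_district = has_district, bad_cd = bad_cd)
}

# Toy data with one badly formatted district ("HI-1" instead of "HI-01")
toy <- data.frame(
  st = c("WY", "HI"), cd = c("WY-01", "HI-1"),
  educ = c(2, 5), race = c(1, 2), age = c(34, 61), gender = c(1, 2)
)
res <- check_inputs_sketch(toy)
res$bad_cd
```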

Examples


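library(dplyr)
library(ccesMRPprep)  # provides ccc_std_demographics() and ccc_samp

# Defaults: White Hispanics and Black Hispanics are both coded as "Hispanic"
ccc_std_demographics(ccc_samp)

# Do not recode White Hispanics as "Hispanic"
ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)

# Do not recode either White Hispanics or Black Hispanics as "Hispanic"
ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)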

## Not run: 
 # For full data (takes a while)
 library(dataverse)
 cumulative_rds <- get_cces_dataverse("cumulative")
 cumulative_std <- ccc_std_demographics(cumulative_rds)
 
## End(Not run)

## Not run: 
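 library(stringr)  # provides str_replace_all()

 # Deliberately break the district format: "HI-01" becomes "HI-1"
 wrong_cd_fmt <- mutate(ccc_samp, cd = str_replace_all(cd, "01", "1"))
 wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)

 # Throws an error because cd is no longer in the expected "HI-01" format
 ccc_std_demographics(wrong_cd_fmt)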

## End(Not run)

kuriwaki/ccesMRPprep documentation built on June 10, 2025, 7:27 p.m.