ccc_std_demographics: Recode CCES variables so that they merge to ACS variables

View source: R/cces_std-for-acs.R

ccc_std_demographics    R Documentation

Recode CCES variables so that they merge to ACS variables

Description

Recode CCES variables so that they merge to ACS variables

Usage

ccc_std_demographics(
  tbl,
  only_demog = FALSE,
  age_key = deframe(ccesMRPprep::age5_key),
  wh_as_hisp = TRUE,
  bh_as_hisp = TRUE
)

Arguments

tbl

The cumulative common content. It can be any subset but must include the variables age, race, educ, gender, st, state, and cd. Factor variables must be haven_labelled class variables, as in the output of get_cces_dataverse("cumulative"). See ccc_samp for an example. Any other file (for example, a year-specific common content) is not compatible with this function.

only_demog

Drop all variables other than demographics? Defaults to FALSE.

age_key

The named vector used as the key to bin age. Can be deframe(age5_key) or deframe(age10_key).

wh_as_hisp

Should people who identify as both White and Hispanic be coded as "Hispanic", thereby leaving all remaining "Whites" as Non-Hispanic Whites by definition? Can be NULL if you know the column hispanic is not in the data. Defaults to TRUE. For more information, see https://bit.ly/3hZ6mz4.

bh_as_hisp

Same as wh_as_hisp but for Black Hispanics. Defaults to TRUE.
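
The effect of wh_as_hisp and bh_as_hisp can be sketched on toy data as follows. This is only an illustration in base R, not the package's implementation: the helper recode_hisp_sketch, the plain-string race labels, and the "Yes"/"No" coding of hispanic are all assumptions made for the example.

```r
# Toy data; the labels and the coding of `hispanic` are illustrative assumptions.
toy <- data.frame(
  race = c("White", "White", "Black", "Black"),
  hispanic = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ccesMRPprep): fold Hispanic identifiers
# into a single "Hispanic" race category, depending on the two switches.
recode_hisp_sketch <- function(race, hispanic,
                               wh_as_hisp = TRUE, bh_as_hisp = TRUE) {
  is_hisp <- hispanic == "Yes"
  out <- race
  # isTRUE() treats NULL as "do not recode"
  if (isTRUE(wh_as_hisp)) out[race == "White" & is_hisp] <- "Hispanic"
  if (isTRUE(bh_as_hisp)) out[race == "Black" & is_hisp] <- "Hispanic"
  out
}

table(recode_hisp_sketch(toy$race, toy$hispanic))
table(recode_hisp_sketch(toy$race, toy$hispanic, wh_as_hisp = FALSE))
```

With both switches on, every remaining "White" or "Black" respondent in the toy data is non-Hispanic by construction.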

Value

The output is of the same dimensions as the input (unless only_demog = TRUE) but with the following exceptions:

  • age is coded to match up with the ACS bins. The recoding occurs in a separate function, ccc_bin_age. The unbinned age is retained as age_orig.

  • educ is coarsened and relabelled with 4 categories to match up with the ACS (the original version is retained as educ_cces_chr). Recoding is governed by the key-value pairs in educ_key.

  • educ_3 is further coarsened to 3 categories, grouping together a BA and a higher degree into one category. This is necessary for some ACS tables that do not make the distinction. Check the ACS codes and decide which education variable to use before merging.

  • race is recoded in the same way to match up with the ACS. These recodings are governed by the key-value pairs in race_key.

  • cd is standardized so that at-large districts are given "01" and single-digit districts are padded with a leading zero, e.g. "WY-01" and "CA-02".
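
The cd format described in the last item can be sketched in base R. This is a minimal illustration rather than the package's code: the helper std_cd_sketch is hypothetical, and the assumption that an at-large district arrives coded as 0 is made only for the example.

```r
# Hypothetical helper (not part of ccesMRPprep): build a "ST-DD" district code
# from a state abbreviation and a numeric district number.
std_cd_sketch <- function(st, dist) {
  # Assumption: at-large districts may be coded 0 and are mapped to 1
  dist <- ifelse(dist == 0, 1, dist)
  # Pad the district number to two digits with a leading zero
  sprintf("%s-%02d", st, as.integer(dist))
}

std_cd_sketch(c("WY", "CA", "TX"), c(0, 2, 13))
```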

Input Requirements

This function requires data to have the following columns:

  • A string column called st that is a two-letter abbreviation of the state, or a labelled variable coercible to a string.

  • A string column called cd that holds the congressional district in the form "WY-01", OR a numeric column called dist that holds the district number. cd_up can also be used for the district in the upcoming election.

  • A <numeric+labelled> column called educ for education, race for race, age for age, and gender for gender, with values following the cumulative content.
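
Before calling the function, the requirements above can be checked with a few lines of base R. The helper check_inputs_sketch below is hypothetical and is not provided by the package; the regular expression for the cd format is an assumption inferred from the "WY-01" example.

```r
# Hypothetical pre-flight check (not part of ccesMRPprep).
check_inputs_sketch <- function(tbl) {
  needed <- c("st", "educ", "race", "age", "gender")
  missing_cols <- setdiff(needed, names(tbl))
  # Either cd or dist must be present to identify the district
  has_district <- any(c("cd", "dist") %in% names(tbl))
  bad_cd <- character(0)
  if ("cd" %in% names(tbl)) {
    cd_chr <- as.character(tbl$cd)
    # Assumed format: two capital letters, a hyphen, two digits
    ok <- is.na(cd_chr) | grepl("^[A-Z]{2}-[0-9]{2}$", cd_chr)
    bad_cd <- unique(cd_chr[!ok])
  }
  list(missing_cols = missing_cols,
       has_district = has_district,
       bad_cd = bad_cd)
}

toy <- data.frame(
  st = c("HI", "HI"), cd = c("HI-01", "HI-1"),
  educ = c(2, 5), race = c(1, 2), age = c(34, 61),
  stringsAsFactors = FALSE
)
str(check_inputs_sketch(toy))
```

A non-empty bad_cd flags values such as "HI-1", which are not in the two-digit form described above.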

Examples


library(dplyr)

 ccc_std_demographics(ccc_samp)
 ccc_std_demographics(ccc_samp, wh_as_hisp = FALSE) %>% count(race)
 ccc_std_demographics(ccc_samp, bh_as_hisp = FALSE, wh_as_hisp = FALSE) %>% count(race)

## Not run: 
 # For full data (takes a while)
 library(dataverse)
 cumulative_rds <- get_cces_dataverse("cumulative")
 cumulative_std <- ccc_std_demographics(cumulative_rds)
 
## End(Not run)

## Not run: 
 # stringr is not attached above, so call str_replace_all() with its namespace
 wrong_cd_fmt <- mutate(ccc_samp, cd = stringr::str_replace_all(cd, "01", "1"))
 wrong_cd_fmt %>% filter(st == "HI") %>% count(cd)

 # throws error because CD is formatted the wrong way
 ccc_std_demographics(wrong_cd_fmt)

## End(Not run)



kuriwaki/ccesMRPprep documentation built on Oct. 26, 2024, 10:22 p.m.