knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
This package provides functions for imputing an individual's race/ethnicity given their first name, last name, geolocation, political party, age, gender, and address characteristics. The method is based on a Naive Bayes classification algorithm which incorporates known ethnorace distributions over the observed characteristics to generate posterior probabilities for each individual.
You can install the development version with:
# install.packages("devtools") devtools::install_github("bwilden/bper")
The data necessary for the imputation algorithm come from the US Census API. You will need a Census API key in order to use the package (https://api.census.gov/data/key_signup.html).
library(bper)
example_persons
The bper
package works by matching Census data to column names in your input data frame. To ensure all possible information is used in the imputation algorithm, reformat your variables in the following way:
first_name
. A character variable. Remove all punctuation and white spaces.last_name
. A character variable. Remove all punctuation and white spaces.age
. A numeric variable.sex
. A numeric variable with 1
corresponding to female and 0
corresponding to male.party
. A character variable. Available categories are DEM
for Democratic, REP
for Republican, and UNA
for independent/other.multi_unit
. A numeric variable with 1
corresponding to residence in multi-unit housing (typically an address containing "Apt", "Unit", "#") and 0
corresponding to residence in single-unit housing.state
. A character variable with the 2 letter state abbreviation.county
. A character variable with the 3 digit FIPS county code.tract
. A character variable with the 6 digit Census tract code.block
. A character variable with the 4 digit Census block code.place
. A character variable with the 6 digit Census place code.zip
. A character variable with the 5 digit ZIP code.Not all of the above variables are required (only the most detailed geographic variable is used), but more information leads to better predictions.
The function load_bper_data
allows you to load the Census data necessary for the imputation algorithm ahead of time. This can be a big time-saver particularly when using detailed geographic inputs (tracts or blocks).
my_bper_data <- load_bper_data( input_data = example_persons, year = 2010, census_key = "your_census_key" ) save(my_bper_data, file = "my_bper_data.rda")
Use impute_ethnorace
to implement the race/ethnicity imputation algorithm.
# Method 1, loading Census data and imputing ethnorace imputed_data <- impute_ethnorace( input_data = example_persons, year = 2010, census_key = "your_census_key" ) # Method 2, using pre-loaded Census data imputed_data <- impute_ethnorace( input_data = example_persons, bper_data = my_bper_data, year = 2010, census_key = "your_census_key" )
as.data.frame( impute_ethnorace( input_data = subset( example_persons, select = c("first_name", "last_name", "age", "sex", "state", "county") ), year = 2010, census_key = "5e4c2b8438222753a7f4753fa78855eca73b9950" ) )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.