knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

bper: Bayesian Prediction for Ethnicity and Race

This package provides functions for imputing an individual's race/ethnicity given their first name, last name, geolocation, political party, age, gender, and address characteristics. The method is based on a Naive Bayes classification algorithm which incorporates known ethnorace distributions over the observed characteristics to generate posterior probabilities for each individual.

Installation

You can install the development version with:

# install.packages("devtools")
devtools::install_github("bwilden/bper")

Usage

The data necessary for the imputation algorithm come from the US Census API. You will need a Census API key in order to use the package (https://api.census.gov/data/key_signup.html).

library(bper)

example_persons

Step 1: Prepare input data

The bper package works by matching Census data to column names in your input data frame. To ensure all possible information is used in the imputation algorithm, reformat your variables in the following way:

Not all of the above variables are required (only the most detailed geographic variable is used), but more information leads to better predictions.

Step 2: Pre-load Census data (optional)

The function load_bper_data allows you to load the Census data necessary for the imputation algorithm ahead of time. This can be a big time-saver particularly when using detailed geographic inputs (tracts or blocks).

my_bper_data <- load_bper_data(
  input_data = example_persons,
  year = 2010,
  census_key = "your_census_key"
)

save(my_bper_data, file = "my_bper_data.rda")

Step 3: Impute race/ethnicity

Use impute_ethnorace to implement the race/ethnicity imputation algorithm.

# Method 1, loading Census data and imputing ethnorace
imputed_data <- impute_ethnorace(
  input_data = example_persons,
  year = 2010,
  census_key = "your_census_key"
)

# Method 2, using pre-loaded Census data
imputed_data <- impute_ethnorace(
  input_data = example_persons,
  bper_data = my_bper_data,
  year = 2010,
  census_key = "your_census_key"
)

Final result:

as.data.frame(
  impute_ethnorace(
    input_data = subset(
      example_persons,
      select = c("first_name",
                 "last_name",
                 "age",
                 "sex",
                 "state",
                 "county")
    ),
    year = 2010,
    census_key = "5e4c2b8438222753a7f4753fa78855eca73b9950"
  )
)


bwilden/bper documentation built on March 25, 2023, 3:39 p.m.