name.clean: Name cleaning and matching function.


View source: R/name.clean.R

Description

name.clean cleans surnames in a user-input dataset and merges in racial distributions from the Census Surname List and the Census Spanish Surname List.

Usage

name.clean(voters)

Arguments

voters

An object of class data.frame. Must contain a field named 'surname'.

Details

This function matches surnames in the user's dataset against the U.S. Census 2000 Surname List to obtain Pr(Race | Surname) for each of the five major racial groups. Matching proceeds in the following steps, each applied only to surnames not matched in a previous step:

1) match raw surnames with Census data;
2) remove any spaces and search again;
3) split double-barreled surnames into two names and match on the first;
4) split double-barreled surnames into two names and match on the second;
5) for any remaining names, impute probabilities from the overall U.S. population.

Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
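The matching cascade described above can be illustrated with a minimal sketch. This is not the package's implementation: the helper match_surname, the toy census table, the us_overall fallback vector, and all probability values are hypothetical. Upper-casing the input, splitting double-barreled names on hyphens or spaces, and returning NA in surname.match for unmatched names are also assumptions. The Spanish Surname List rule is left out for brevity.

```r
# Toy stand-in for the Census surname table (all values are made up).
census <- data.frame(
  surname = c("SMITH", "GARCIA", "DELACRUZ", "NGUYEN"),
  p_whi = c(0.73, 0.05, 0.04, 0.01),
  p_bla = c(0.22, 0.01, 0.01, 0.00),
  p_his = c(0.02, 0.91, 0.90, 0.01),
  p_asi = c(0.01, 0.02, 0.04, 0.97),
  p_oth = c(0.02, 0.01, 0.01, 0.01),
  stringsAsFactors = FALSE
)

# Hypothetical overall-population shares used as the step 5 fallback.
us_overall <- c(p_whi = 0.62, p_bla = 0.12, p_his = 0.17, p_asi = 0.05, p_oth = 0.04)

match_surname <- function(name, census, fallback) {
  raw <- toupper(trimws(name))             # assumption: matching is case-insensitive
  parts <- strsplit(raw, "[- ]+")[[1]]     # assumption: split on hyphens or spaces
  candidates <- c(
    raw,                                   # step 1: raw surname
    gsub(" ", "", raw),                    # step 2: spaces removed
    parts[1],                              # step 3: first part of a double-barreled name
    parts[length(parts)]                   # step 4: last part of a double-barreled name
  )
  for (cand in candidates) {
    idx <- match(cand, census$surname)     # NA when the candidate is not in the table
    if (!is.na(idx)) {
      return(data.frame(surname.match = cand,
                        census[idx, names(fallback)],
                        row.names = NULL, stringsAsFactors = FALSE))
    }
  }
  # step 5: nothing matched, so fall back to the overall population shares
  data.frame(surname.match = NA_character_, as.list(fallback),
             stringsAsFactors = FALSE)
}

voters <- data.frame(
  surname = c("Smith", "De La Cruz", "Garcia-Lopez", "Lopez-Nguyen", "Zzyzx"),
  stringsAsFactors = FALSE
)

matched <- do.call(rbind, lapply(voters$surname, match_surname,
                                 census = census, fallback = us_overall))
result <- cbind(voters, matched)
print(result)
```

In this toy run each of the five input names exercises a different step of the cascade, so the surname.match column shows which form of the name was found.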

Value

Returns an object of class data.frame. It consists of the original user-input data with additional columns that specify the part of the name matched with Census data (surname.match) and the probabilities Pr(Race | Surname) for each racial group (p_whi for Whites, p_bla for Blacks, p_his for Hispanics/Latinos, p_asi for Asians, and p_oth for Others).

Examples

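A hedged usage sketch, not the package's original example: the voters data frame below is hypothetical, and the call is guarded so that it only runs when name.clean is available in the session (for instance after loading the package that provides it).

```r
# Hypothetical input: a data.frame with the required 'surname' field.
voters <- data.frame(
  id = 1:4,
  surname = c("Khanna", "Imai", "Velasco", "Morse-Smith"),
  stringsAsFactors = FALSE
)

# Run name.clean only if the function exists in the current session.
if (exists("name.clean", mode = "function")) {
  cleaned <- name.clean(voters)
  # Show the matched name part and the five Pr(Race | Surname) columns.
  print(cleaned[, c("surname", "surname.match",
                    "p_whi", "p_bla", "p_his", "p_asi", "p_oth")])
}
```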

HJ08003/HJwru documentation built on May 6, 2019, 9:47 p.m.