View source: R/merge_surnames.R
| merge_surnames | R Documentation | 
merge_surnames merges surnames in a user-supplied dataset with the corresponding
race/ethnicity probabilities from the U.S. Census Surname List and the Spanish Surname List.
merge_surnames(
  voter.file,
  surname.year = 2020,
  name.data,
  clean.surname = TRUE,
  impute.missing = TRUE
)
voter.file | 
 An object of class   | 
surname.year | 
 An object of class   | 
name.data | 
 An object of class   | 
clean.surname | 
 A   | 
impute.missing | 
 A   | 
This function allows users to match surnames in their dataset with the U.S. Census Surname List (from 2000 or 2010) and Spanish Surname List to obtain Pr(Race | Surname) for each of the five major racial groups.
By default, the function matches surnames to the Census list as follows:
1. Search raw surnames in the Census surname list;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., Jr) and search again;
5. Split double-barreled surnames into two parts and search the first part of the name;
6. Split double-barreled surnames into two parts and search the second part of the name;
7. For any remaining names, impute probabilities using the distribution for all names not appearing on the Census list.
Each step applies only to surnames not matched in a previous step.
Steps 2 through 7 are not applied if clean.surname is FALSE.
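The matching cascade described above can be sketched in base R. This is an illustration only, not the function's actual implementation: known_surnames, clean_forms, and match_surname are hypothetical names, the lookup list is a toy stand-in for the Census list, and the suffix pattern is a crude assumption covering only JR, SR, and III.

```r
# Toy stand-in for the Census surname list (hypothetical names only).
known_surnames <- c("SMITH", "GARCIA", "OBRIEN", "LOPEZ")

# Strip punctuation, then spaces, then a trailing suffix (steps 2 to 4).
# The suffix pattern is a crude assumption covering only JR, SR, and III.
clean_forms <- function(x) {
  no_punct  <- gsub("[[:punct:]]", "", x)
  no_space  <- gsub("[[:space:]]", "", no_punct)
  no_suffix <- sub("(JR|SR|III)$", "", no_space)
  c(no_punct, no_space, no_suffix)
}

# Try each candidate form in order and return the first one that is found.
# Returns NA when nothing matches, which is where step 7 would impute.
match_surname <- function(name, known = known_surnames, clean = TRUE) {
  raw <- toupper(name)
  candidates <- raw                                   # step 1
  if (clean) {
    parts <- strsplit(raw, "-", fixed = TRUE)[[1]]
    candidates <- c(candidates, clean_forms(raw))     # steps 2 to 4
    if (length(parts) == 2) {
      candidates <- c(candidates,
                      clean_forms(parts[1])[3],       # step 5
                      clean_forms(parts[2])[3])       # step 6
    }
  }
  hit <- candidates[candidates %in% known]
  if (length(hit) > 0) hit[1] else NA_character_
}

matches <- vapply(
  c("Smith", "O'Brien", "Smith Jr.", "Lopez-Nguyen", "Zzyzx"),
  match_surname, character(1), USE.NAMES = FALSE
)
print(matches)
```

In this sketch, passing clean = FALSE restricts the search to the raw surname, mirroring the effect of clean.surname described above.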
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
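As a small illustration of the rule in the note above, the probability row for a name found only on the Spanish Surname List would look like the following. The vector is built by hand here and the variable names are hypothetical. Only the p_* labels come from the function's documented output.

```r
# Output column labels as documented for this function.
race_cols <- c("p_whi", "p_bla", "p_his", "p_asi", "p_oth")

# Hand-built probability row for a Spanish-list-only surname:
# 1 for Hispanic/Latino, 0 for every other group.
probs <- setNames(rep(0, length(race_cols)), race_cols)
probs["p_his"] <- 1
print(probs)
```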
Output will be an object of class data.frame. It will
consist of the original user-input data with additional columns that
specify the part of the name matched with Census data (surname.match),
and the probabilities Pr(Race | Surname) for each racial group
(p_whi for White, p_bla for Black,
p_his for Hispanic/Latino,
p_asi for Asian and Pacific Islander, and
p_oth for Other/Mixed).
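The sketch below shows one way the returned columns might be used, assuming a data frame shaped like the documented output. The merged object is built by hand with made-up probabilities (not Census values) so that the example runs without the package or its data.

```r
# Made-up rows shaped like the documented output; not real Census values.
merged <- data.frame(
  surname       = c("SMITH", "GARCIA"),
  surname.match = c("SMITH", "GARCIA"),
  p_whi = c(0.70, 0.05),
  p_bla = c(0.23, 0.01),
  p_his = c(0.02, 0.92),
  p_asi = c(0.01, 0.01),
  p_oth = c(0.04, 0.01),
  stringsAsFactors = FALSE
)

race_cols <- c("p_whi", "p_bla", "p_his", "p_asi", "p_oth")

# Each row of Pr(Race | Surname) should sum to 1 (up to rounding).
row_totals <- rowSums(merged[, race_cols])

# Column holding the largest probability in each row.
most_likely <- race_cols[max.col(as.matrix(merged[, race_cols]),
                                 ties.method = "first")]
print(most_likely)
```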
data(voters)
## Not run: try(merge_surnames(voters))