View source: R/merge_surnames.R
merge_surnames | R Documentation |
merge_surnames merges surnames in a user-input dataset with the corresponding
race/ethnicity probabilities from the U.S. Census Surname List and the Spanish Surname List.
merge_surnames(
voter.file,
surname.year = 2020,
name.data,
clean.surname = TRUE,
impute.missing = TRUE
)
voter.file | An object of class |
surname.year | An object of class |
name.data | An object of class |
clean.surname | A |
impute.missing | A |
This function allows users to match surnames in their dataset with the U.S. Census Surname List (from 2000 or 2010) and Spanish Surname List to obtain Pr(Race | Surname) for each of the five major racial groups.
By default, the function matches surnames to the Census list as follows:
1. Search raw surnames in Census surname list;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., Jr) and search again;
5. Split double-barreled surnames into two parts and search first part of name;
6. Split double-barreled surnames into two parts and search second part of name;
7. For any remaining names, impute probabilities using distribution for all names not appearing on Census list.
Each step applies only to surnames not matched in a previous step.
Steps 2 through 7 are not applied if clean.surname is FALSE.
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
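The matching cascade described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper match_one, the toy census_names vector, the suffix pattern, and the choice to split on hyphens or spaces are all hypothetical. Step 7 (imputation) is not computed here; names that return NA are the ones that would fall through to it.

```r
# Toy stand-in for the Census surname list; entries are made up.
census_names <- c("SMITH", "OBRIEN", "DELACRUZ", "GARCIA", "LOPEZ")

# Hypothetical helper (not part of the package): returns the part of the
# surname that matched, or NA when no step finds a match.
match_one <- function(surname, clean.surname = TRUE) {
  raw <- toupper(trimws(surname))
  if (raw %in% census_names) return(raw)          # step 1: raw surname
  if (!clean.surname) return(NA_character_)       # cleaning steps skipped

  nm <- gsub("[[:punct:]]", "", raw)              # step 2: drop punctuation
  if (nm %in% census_names) return(nm)

  nm <- gsub(" ", "", nm, fixed = TRUE)           # step 3: drop spaces
  if (nm %in% census_names) return(nm)

  nm <- sub("(JR|SR|III|II|IV)$", "", nm)         # step 4: drop a suffix (assumed list)
  if (nm %in% census_names) return(nm)

  parts <- strsplit(raw, "[- ]")[[1]]             # steps 5-6: double-barreled names
  if (length(parts) >= 2) {
    if (parts[1] %in% census_names) return(parts[1])
    if (parts[2] %in% census_names) return(parts[2])
  }
  NA_character_                                   # left for step 7 (imputation)
}

surnames <- c("Smith Jr.", "O'Brien", "De La Cruz", "Nguyen-Lopez", "Zzyzx")
matched <- vapply(surnames, match_one, character(1), USE.NAMES = FALSE)
```

Because each step returns as soon as it finds a match, later steps only ever see surnames that earlier steps failed to match, which mirrors the rule stated above.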
Output will be an object of class data.frame. It will
consist of the original user-input data with additional columns that
specify the part of the name matched with Census data (surname.match),
and the probabilities Pr(Race | Surname) for each racial group
(p_whi for White, p_bla for Black, p_his for Hispanic/Latino,
p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
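As a rough sketch of how the returned columns might be used, the snippet below builds a small data frame with the column names listed above and inspects it. The rows and probability values are invented purely for illustration and do not come from the Census lists; the check that each row sums to about 1 assumes the five groups are exhaustive.

```r
# Hypothetical output rows; the probabilities are invented for illustration.
out <- data.frame(
  surname       = c("Smith", "Garcia"),
  surname.match = c("SMITH", "GARCIA"),
  p_whi = c(0.70, 0.05),
  p_bla = c(0.20, 0.01),
  p_his = c(0.03, 0.90),
  p_asi = c(0.02, 0.02),
  p_oth = c(0.05, 0.02)
)

prob_cols <- c("p_whi", "p_bla", "p_his", "p_asi", "p_oth")

# Pr(Race | Surname) across the five groups should total roughly 1 per row.
row_totals <- rowSums(out[, prob_cols])

# Column name of the most probable group for each surname.
out$top_group <- prob_cols[max.col(as.matrix(out[, prob_cols]), ties.method = "first")]
```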
data(voters)
## Not run: 
try(merge_surnames(voters))
## End(Not run)