merge_names | R Documentation |
merge_names
merge_names merges names in a user-input dataset with the corresponding
race/ethnicity probabilities, which are derived from the U.S. Census Surname List,
the Spanish Surname List, and voter files from states in the Southern U.S.
merge_names(
  voter.file,
  namesToUse,
  census.surname,
  table.surnames = NULL,
  table.first = NULL,
  table.middle = NULL,
  clean.names = TRUE,
  impute.missing = FALSE,
  model = "BISG"
)
voter.file |
An object of class |
namesToUse |
A character vector identifying which names to use for the prediction.
The default value is |
census.surname |
A |
table.surnames |
An object of class |
table.first |
See |
table.middle |
See |
clean.names |
A |
impute.missing |
See |
model |
See |
This function matches names in the user's dataset against database entries that estimate P(name | ethnicity) for each of the five major racial groups. The database probabilities are derived from the U.S. Census Surname List, the Spanish Surname List, and voter files from states in the Southern U.S.
By default, the function matches names as follows:
1. Search raw surnames in the database;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., "Jr") and search again (last names only);
5. Split double-barreled names into two parts and search the first part of the name;
6. Split double-barreled names into two parts and search the second part of the name.
Each step only applies to names not matched in a previous step.
Steps 2 through 6 are not applied if clean.names is FALSE.
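The matching order described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `match_cascade`, the toy `known` vector, the suffix list, and the exact regular expressions are all hypothetical.

```r
# Toy stand-in for the name database (hypothetical entries).
known <- c("SMITH", "OBRIEN", "DELACRUZ", "GARCIA", "LOPEZ")

# Hypothetical helper: returns the database key each name matched, or NA.
match_cascade <- function(surnames, known, clean = TRUE) {
  raw <- toupper(trimws(surnames))

  # Step 1: search the raw (upper-cased) name.
  matched <- ifelse(raw %in% known, raw, NA_character_)
  if (!clean) return(matched)  # steps 2-6 are skipped when cleaning is off

  # Apply a candidate spelling only to names not matched in an earlier step.
  try_step <- function(matched, candidate) {
    hit <- is.na(matched) & candidate %in% known
    matched[hit] <- candidate[hit]
    matched
  }

  # Step 2: remove punctuation. Step 3: also remove spaces.
  no_punct <- gsub("[[:punct:]]", "", raw)
  no_space <- gsub("[[:space:]]", "", no_punct)

  # Step 4: drop a trailing suffix (assumed list), then remove spaces.
  no_suffix <- gsub("[[:space:]]", "",
                    sub("[[:space:]]+(JR|SR|III|II|IV)$", "", no_punct))

  # Steps 5 and 6: split on spaces or hyphens, try first and second parts.
  parts  <- strsplit(raw, "[ -]+")
  first  <- vapply(parts, function(p) p[1], character(1))
  second <- vapply(parts, function(p) {
    if (length(p) >= 2) p[2] else NA_character_
  }, character(1))

  for (candidate in list(no_punct, no_space, no_suffix, first, second)) {
    matched <- try_step(matched, candidate)
  }
  matched
}

surnames <- c("Smith", "O'Brien", "De La Cruz", "Smith Jr.",
              "Garcia-Lopez", "Nguyen-Lopez", "Zzyzx")
result <- match_cascade(surnames, known)
```

Because each step only fills entries that are still `NA`, a name such as "Garcia-Lopez" keeps its first-part match and never reaches the second-part search.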
Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.
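As a small illustration of this rule, the sketch below builds the probability rows for names found only on a Spanish surname list. The two name lists are hypothetical toy data, and the column names simply follow the `p_*` naming used on this page.

```r
# Hypothetical toy lists; the real lists are far larger.
census_list  <- c("SMITH", "GARCIA")
spanish_list <- c("GARCIA", "ECHEVARRIETA")

# Names that appear on the Spanish list but not on the Census list.
spanish_only <- setdiff(spanish_list, census_list)

# Probability 1 for Hispanic/Latino, 0 for every other group.
spanish_only_probs <- data.frame(
  surname = spanish_only,
  p_whi = 0, p_bla = 0, p_his = 1, p_asi = 0, p_oth = 0
)
```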
Output will be an object of class data.frame. It will consist of the original user-input data with additional columns that specify the part of the name matched with Census data (surname.match), and the probabilities Pr(Race | Surname) for each racial group (p_whi for White, p_bla for Black, p_his for Hispanic/Latino, p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
data(voters)
## Not run: try(merge_names(voters, namesToUse = "surname", census.surname = TRUE))