merge_names: Surname probability merging function.
In wru: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and Geolocation

merge_names

R Documentation

Surname probability merging function.

Description

merge_names merges names in a user-input dataset with corresponding race/ethnicity probabilities derived from both the U.S. Census Surname List and Spanish Surname List and voter files from states in the Southern U.S.

Usage

merge_names(
  voter.file,
  namesToUse,
  census.surname,
  table.surnames = NULL,
  table.first = NULL,
  table.middle = NULL,
  clean.names = TRUE,
  impute.missing = FALSE,
  model = "BISG"
)

Arguments

`voter.file`	An object of class `data.frame`. Must contain a row for each individual being predicted, as well as a field named `last` containing each individual's surname. If first name is also being used for prediction, the file must also contain a field named `first`. If middle name is also being used for prediction, the file must also contain a field named `middle`.
`namesToUse`	A character vector identifying which names to use for the prediction. The default value is `"last"`, indicating that only the last name will be used. Other options are `"last, first"`, indicating that both last and first names will be used, and `"last, first, middle"`, indicating that last, first, and middle names will all be used.
`census.surname`	A `TRUE`/`FALSE` object. If `TRUE`, function will call `merge_surnames` to merge in Pr(Race | Surname) from U.S. Census Surname List (2000, 2010, or 2020) and Spanish Surname List. If `FALSE`, user must provide `table.surnames` (see below). Default is `TRUE`.
`table.surnames`	An object of class `data.frame` provided by the user as an alternative surname dictionary. It will consist of a list of U.S. surnames, along with the associated probabilities P(name | ethnicity) for ethnicities: White, Black, Hispanic, Asian, and other. The expected columns are `last_name` for U.S. surnames, `p_whi_last` for White, `p_bla_last` for Black, `p_his_last` for Hispanic, `p_asi_last` for Asian, and `p_oth_last` for other. Default is `NULL`.
`table.first`	See `table.surnames`.
`table.middle`	See `table.surnames`.
`clean.names`	A `TRUE`/`FALSE` object. If `TRUE`, any surnames in `voter.file` that cannot initially be matched to the database will be cleaned, according to U.S. Census specifications, in order to increase the chance of finding a match. Default is `TRUE`.
`impute.missing`	See `predict_race`.
`model`	See `predict_race`.
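
As an illustration of the `table.surnames` layout described above, the following is a minimal sketch of a custom dictionary built in base R. The surnames and probability values are made up for illustration only, and the upper-case spelling of the names is an assumption rather than a documented requirement.

```r
# Hypothetical custom surname dictionary. The values are illustrative
# placeholders, not real estimates.
custom_surnames <- data.frame(
  last_name  = c("SMITH", "GARCIA", "NGUYEN"),
  p_whi_last = c(0.0100, 0.0010, 0.0001),
  p_bla_last = c(0.0120, 0.0008, 0.0001),
  p_his_last = c(0.0010, 0.0200, 0.0001),
  p_asi_last = c(0.0005, 0.0010, 0.0300),
  p_oth_last = c(0.0080, 0.0012, 0.0010),
  stringsAsFactors = FALSE
)

# Check the column names against the ones listed for `table.surnames`.
expected_cols <- c("last_name", "p_whi_last", "p_bla_last",
                   "p_his_last", "p_asi_last", "p_oth_last")
stopifnot(identical(names(custom_surnames), expected_cols))

# Each entry is P(name | ethnicity), so every value must lie in [0, 1].
prob_cols <- setdiff(expected_cols, "last_name")
stopifnot(all(custom_surnames[prob_cols] >= 0 & custom_surnames[prob_cols] <= 1))
```

A table like this would presumably be supplied as `table.surnames = custom_surnames` together with `census.surname = FALSE`.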

Details

This function allows users to match names in their dataset with database entries estimating P(name | ethnicity) for each of the five major racial groups for each name. The database probabilities are derived from both the U.S. Census Surname List and Spanish Surname List and voter files from states in the Southern U.S.

By default, the function matches names as follows:

1. Search raw surnames in the database;
2. Remove any punctuation and search again;
3. Remove any spaces and search again;
4. Remove suffixes (e.g., "Jr") and search again (last names only);
5. Split double-barreled names into two parts and search first part of name;
6. Split double-barreled names into two parts and search second part of name.

Each step only applies to names not matched in a previous step. Steps 2 through 6 are not applied if clean.names is FALSE.
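
The matching cascade described above can be sketched in base R as follows. This is not the code used by wru: `match_surname`, the suffix list, the upper-casing of names, and the toy dictionary are all assumptions made for illustration.

```r
# Hypothetical helper, NOT wru's implementation: a sketch of the six-step
# cascade. Returns the dictionary entry that matched, or NA if none did.
match_surname <- function(name, dictionary, clean = TRUE) {
  # Assumed suffix list; the text only gives "Jr" as an example.
  suffix_pattern <- "\\s+(JR|SR|II|III|IV)$"
  # Upper-casing is an assumption about how the dictionary is keyed.
  name <- toupper(trimws(name))

  # Step 1: search the raw surname.
  if (name %in% dictionary) return(name)
  # With cleaning switched off, steps 2 through 6 are skipped.
  if (!clean) return(NA_character_)

  # Keep the hyphen-separated parts before punctuation is stripped.
  parts <- strsplit(name, "-", fixed = TRUE)[[1]]

  # Step 2: remove punctuation and search again.
  no_punct <- gsub("[[:punct:]]", "", name)
  if (no_punct %in% dictionary) return(no_punct)

  # Step 3: remove spaces and search again.
  no_space <- gsub("\\s+", "", no_punct)
  if (no_space %in% dictionary) return(no_space)

  # Step 4: remove a trailing suffix, then spaces, and search again.
  no_suffix <- gsub("\\s+", "", sub(suffix_pattern, "", no_punct))
  if (no_suffix %in% dictionary) return(no_suffix)

  # Steps 5 and 6: for double-barreled names, try the first part, then the second.
  if (length(parts) == 2) {
    parts <- gsub("[[:punct:]]|\\s+", "", parts)
    for (part in parts) {
      if (part %in% dictionary) return(part)
    }
  }
  NA_character_
}

# Toy dictionary and inputs, purely illustrative.
dictionary <- c("SMITH", "OBRIEN", "DELACRUZ", "GARCIA", "LOPEZ")
raw_names  <- c("Smith", "O'Brien", "De La Cruz", "Smith Jr.",
                "Garcia-Lopez", "Zzz-Lopez", "Unknown")
matched <- vapply(raw_names, match_surname, character(1),
                  dictionary = dictionary, USE.NAMES = FALSE)
```

Because each step returns as soon as it finds a match, later steps only ever see names that earlier steps failed to match, which mirrors the rule stated above.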

Note: Any name appearing only on the Spanish Surname List is assigned a probability of 1 for Hispanics/Latinos and 0 for all other racial groups.

Value

Output will be an object of class data.frame. It will consist of the original user-input data with additional columns that specify the part of the name matched with Census data (surname.match), and the probabilities Pr(Race | Surname) for each racial group (p_whi for White, p_bla for Black, p_his for Hispanic/Latino, p_asi for Asian and Pacific Islander, and p_oth for Other/Mixed).
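
To show how the added columns might be used, here is a minimal sketch that works on a hand-made data frame standing in for the function's output. The `voter_id` column and all values are made up for illustration, and the column names simply follow the description above.

```r
# Hand-made stand-in for the returned data.frame; values are illustrative only.
merged <- data.frame(
  voter_id      = c(1, 2),               # hypothetical identifier column
  surname.match = c("SMITH", "GARCIA"),
  p_whi = c(0.70, 0.05),
  p_bla = c(0.23, 0.01),
  p_his = c(0.02, 0.92),
  p_asi = c(0.01, 0.01),
  p_oth = c(0.04, 0.01),
  stringsAsFactors = FALSE
)

prob_cols <- c("p_whi", "p_bla", "p_his", "p_asi", "p_oth")

# If the columns hold Pr(Race | Surname), each row should sum to 1.
row_totals <- rowSums(merged[prob_cols])
stopifnot(all(abs(row_totals - 1) < 1e-8))

# Name of the highest-probability column for each row.
top_index <- max.col(as.matrix(merged[prob_cols]), ties.method = "first")
merged$top_group <- prob_cols[top_index]
```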

Examples

data(voters)
## Not run: try(merge_names(voters, namesToUse = "surname", census.surname = TRUE))

wru documentation built on May 29, 2024, 9:46 a.m.
