surnames: Ethnorace distribution over surnames


Description

A data set containing two families of columns: 1) the probability of last name given ethnorace, and 2) the probability of ethnorace given last name. Last name is stored as an uppercase character string. Laplace (add-one) smoothing has been applied, meaning that 1 has been added to each ethnorace category for every name. As a result, every cell has non-zero probability. The original data come from the 2010 decennial US Census.
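To show how add-one smoothing yields both column families, here is a minimal sketch on toy data. It assumes that each (name, ethnorace) count is incremented by 1 and that both conditional probabilities are then computed from the same smoothed counts. The category list, the counts, and the function names are hypothetical and are not taken from the Census file or from the package's build code.

```python
# Hypothetical toy counts: rows are last names, columns are a hypothetical
# subset of ethnorace categories. These are NOT real Census counts.
CATEGORIES = ["api", "black", "hispanic", "white"]

raw_counts = {
    "GARCIA": {"api": 0, "black": 2, "hispanic": 90, "white": 8},
    "NGUYEN": {"api": 95, "black": 0, "hispanic": 1, "white": 4},
    "SMITH": {"api": 1, "black": 25, "hispanic": 2, "white": 72},
}


def laplace_smooth(counts, categories, alpha=1):
    """Add `alpha` to every (name, category) cell so that no cell is zero."""
    return {
        name: {c: row.get(c, 0) + alpha for c in categories}
        for name, row in counts.items()
    }


def pr_race_given_name(smoothed, categories):
    """P(ethnorace | last name): normalize each name's row to sum to 1."""
    out = {}
    for name, row in smoothed.items():
        total = sum(row[c] for c in categories)
        out[name] = {c: row[c] / total for c in categories}
    return out


def pr_name_given_race(smoothed, categories):
    """P(last name | ethnorace): normalize each category's column to sum to 1."""
    col_totals = {c: sum(row[c] for row in smoothed.values()) for c in categories}
    return {
        name: {c: row[c] / col_totals[c] for c in categories}
        for name, row in smoothed.items()
    }


smoothed = laplace_smooth(raw_counts, CATEGORIES)
pr_r_s = pr_race_given_name(smoothed, CATEGORIES)  # analogous to pr_api_s, ...
pr_s_r = pr_name_given_race(smoothed, CATEGORIES)  # analogous to pr_s_api, ...

# GARCIA has a raw "api" count of 0, but after smoothing it gets 1/104.
print(round(pr_r_s["GARCIA"]["api"], 4))  # 0.0096
```

Within this toy table, each name's P(ethnorace | last name) values sum to 1 across categories, and each category's P(last name | ethnorace) values sum to 1 across names.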

Usage

surnames

Format

A data frame with 167409 rows and 13 variables:

last_name

character

pr_api_s

Probability of Asian/Pacific Islander given last name

...

pr_s_api

Probability of last name given Asian/Pacific Islander

...
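Because `last_name` is an uppercase key, input names presumably need to be normalized before lookup. Below is a minimal sketch of that lookup. It uses invented placeholder rows that follow the column layout above, and the helper `lookup_surname` is hypothetical. This page does not document how the package itself handles names missing from the table.

```python
# Placeholder rows mimicking the layout of `surnames`. The probabilities are
# invented for illustration and are NOT real Census figures.
SURNAMES = [
    {"last_name": "NGUYEN", "pr_api_s": 0.90, "pr_s_api": 0.05},
    {"last_name": "SMITH", "pr_api_s": 0.01, "pr_s_api": 0.001},
]

# Index rows by the uppercase last name, which is the table's key format.
BY_NAME = {row["last_name"]: row for row in SURNAMES}


def lookup_surname(name):
    """Return the row for `name`, or None if the surname is not listed."""
    key = name.strip().upper()  # match the uppercase convention of last_name
    return BY_NAME.get(key)


row = lookup_surname("  Nguyen ")
print(row["pr_api_s"])  # 0.9
```

In this sketch, names absent from the table return `None`. A caller must decide how to handle them.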

Source

Frequently Occurring Surnames from the 2010 Census https://www.census.gov/topics/population/genealogy/data/2010_surnames.html


bwilden/bperdata documentation built on Jan. 28, 2021, 1:41 p.m.