R/data.R

#' Ethnorace distribution over multi-unit housing occupancy.
#'
#' A data set containing columns for 1) the probability of apartment occupancy given
#' ethnorace, and 2) the probability of ethnorace given apartment occupancy.
#' Apartment occupancy is a binary (1/0) value. Original data from the 2000
#' decennial US Census.
#'
#' @format A data frame with 2 rows and 13 variables: \describe{
#'   \item{apartment}{numeric} \item{pr_white_a}{Probability White given
#'   apartment occupancy} ... \item{pr_a_white}{Probability apartment occupancy
#'   given White} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
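#'
#' @examples
#' # A minimal sketch with made-up counts (not Census figures) showing how the
#' # two kinds of columns relate. The object `counts` is hypothetical and is
#' # not part of the data set.
#' counts <- rbind("1" = c(white = 30, black = 70),
#'                 "0" = c(white = 10, black = 90))
#'
#' # Probability of ethnorace given apartment occupancy (each row sums to 1),
#' # analogous to columns such as pr_white_a.
#' prop.table(counts, margin = 1)
#'
#' # Probability of apartment occupancy given ethnorace (each column sums to
#' # 1), analogous to columns such as pr_a_white.
#' prop.table(counts, margin = 2)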
"apartments"

#' Ethnorace distribution over birth years.
#'
#' A data set containing columns for 1) the probability of birth year given ethnorace, and
#' 2) the probability of ethnorace given birth year. Birth year is a numeric
#' value from 1911 to 2010. Original data from 2010 decennial US Census.
#'
#' @format A data frame with 100 rows and 13 variables: \describe{
#'   \item{birth_year}{numeric} \item{pr_black_y}{Probability Black given birth
#'   year} ... \item{pr_y_black}{Probability birth year given Black} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
"birth_years"

#' Ethnorace distribution over US Census blocks.
#'
#' A data set containing columns for 1) the probability of geolocation (block) given
#' ethnorace, and 2) the probability of ethnorace given geolocation (block).
#' Block is a 15-character string composed of state code (2) + county code (3)
#' + tract code (6) + block code (4). Laplace smoothing has been applied to this
#' data set, meaning that 1 has been added to each ethnorace category per block.
#' This gives non-zero probability to all cells. Original data from 2010
#' decennial US Census.
#'
#' @format A data frame with 11155486 rows and 13 variables: \describe{
#'   \item{block}{character} \item{pr_aian_g}{Probability American Indian/Alaska
#'   Native given block} ... \item{pr_g_aian}{Probability block given American
#'   Indian/Alaska Native} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
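#'
#' @examples
#' # A minimal sketch, not the package's build code. The block ID and the
#' # counts below are made up for illustration.
#' block_id <- "010010201001000"
#' c(state  = substr(block_id, 1, 2),
#'   county = substr(block_id, 3, 5),
#'   tract  = substr(block_id, 6, 11),
#'   block  = substr(block_id, 12, 15))
#'
#' # Add-one (Laplace) smoothing: a category with zero observed residents
#' # still receives a small non-zero probability.
#' counts <- c(aian = 0, api = 2, black = 5, hispanic = 3, white = 40)
#' (counts + 1) / sum(counts + 1)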
"blocks"

#' Ethnorace distribution over Counties.
#'
#' A data set containing columns for 1) the probability of geolocation (county) given
#' ethnorace, and 2) the probability of ethnorace given geolocation (county).
#' County is a 5-digit code also known as a FIPS code. It is made up of a state
#' code (2) + county code (3). Laplace smoothing has been applied to this data
#' set, meaning that 1 has been added to each ethnorace category per county.
#' This gives non-zero probability to all cells.
#'
#' @format A data frame with 3221 rows and 13 variables: \describe{
#'   \item{county}{character} \item{pr_aian_g}{Probability American Indian/Alaska
#'   Native given county} ... \item{pr_g_aian}{Probability county given American
#'   Indian/Alaska Native} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
"counties"

#' Ethnorace distribution over first names.
#'
#' A data set containing columns for 1) the probability of first name given ethnorace, and
#' 2) the probability of ethnorace given first name. First name is an uppercase
#' character string. Laplace smoothing has been applied to this data set,
#' meaning that 1 has been added to each ethnorace category per name. This gives
#' non-zero probability to all cells.
#'
#' @format A data frame with 4251 rows and 13 variables: \describe{
#'   \item{first_name}{character} \item{pr_hispanic_f}{Probability Hispanic given
#'   first name} ... \item{pr_f_hispanic}{Probability first name given Hispanic}
#'   ... }
#'
#' @source Tzioumis, Konstantinos (2018) Demographic aspects of first names,
#'   Scientific Data, 5:180025 dx.doi.org/10.1038/sdata.2018.25.
#'   \url{https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TYJKEZ}
#'
"firstnames"

#' Ethnorace distribution over genders.
#'
#' A data set containing columns for 1) the probability of gender given ethnorace, and 2)
#' the probability of ethnorace given gender. Gender is a binary numeric value
#' indicating female or not female. Original data from 2010 decennial US Census.
#'
#' @format A data frame with 2 rows and 13 variables: \describe{
#'   \item{female}{numeric} \item{pr_api_fem}{Probability Asian/Pacific Islander
#'   given gender} ... \item{pr_fem_api}{Probability gender given Asian/Pacific
#'   Islander} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
"genders"

#' Ethnorace distribution over political parties.
#'
#' A data set containing columns for 1) the probability of party given ethnorace, and
#' 2) the probability of ethnorace given party. Party is a character
#' string. Original data from 2012 Gallup polling.
#'
#' @format A data frame with 3 rows and 13 variables: \describe{
#'   \item{party}{character} \item{pr_black_p}{Probability Black given party}
#'   ... \item{pr_p_black}{Probability party given Black} ... }
#'
#' @source Gallup
#'   \url{https://news.gallup.com/poll/160373/democrats-racially-diverse-republicans-mostly-white.aspx}
#'
"parties"

#' Ethnorace distribution over States.
#'
#' A data set containing columns for 1) the probability of geolocation (State) given
#' ethnorace, and 2) the probability of ethnorace given geolocation (State).
#' States are identified by two-letter abbreviation.
#'
#' @format A data frame with 52 rows and 13 variables: \describe{
#'   \item{state}{character} \item{pr_aian_g}{Probability American Indian/Alaska
#'   Native given State} ... \item{pr_g_aian}{Probability State given American
#'   Indian/Alaska Native} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
"states"

#' Ethnorace distribution over surnames.
#'
#' A data set containing columns for 1) the probability of last name given ethnorace, and 2)
#' the probability of ethnorace given last name. Last name is an uppercase
#' character string. Laplace smoothing has been applied to this data set,
#' meaning that 1 has been added to each ethnorace category per name. This gives
#' non-zero probability to all cells. Original data from 2010 decennial US
#' Census.
#'
#' @format A data frame with 167409 rows and 13 variables: \describe{
#'   \item{last_name}{character} \item{pr_api_s}{Probability Asian/Pacific
#'   Islander given last name} ... \item{pr_s_api}{Probability last name given
#'   Asian/Pacific Islander} ... }
#'
#' @source Frequently Occurring Surnames from the 2010 Census
#'   \url{https://www.census.gov/topics/population/genealogy/data/2010_surnames.html}
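#'
#' @examples
#' # A minimal sketch: because last_name is stored in uppercase, convert a
#' # name to uppercase before matching. "Garcia" is an arbitrary input, and
#' # this assumes the data set is available (e.g. the package is loaded).
#' surnames[surnames$last_name == toupper("Garcia"), ]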
#'
"surnames"

#' Ethnorace distribution over ZIP Codes.
#'
#' A data set containing columns for 1) the probability of geolocation (ZIP Code) given
#' ethnorace, and 2) the probability of ethnorace given geolocation (ZIP Code).
#' Zip is a 5-character string. Laplace smoothing has been applied to this data
#' set, meaning that 1 has been added to each ethnorace category per ZIP Code.
#' This gives non-zero probability to all cells. Original data from 2010
#' decennial US Census.
#'
#' @format A data frame with 11155486 rows and 13 variables: \describe{
#'   \item{zip}{character} \item{pr_aian_g}{Probability American Indian/Alaska
#'   Native given ZIP Code} ... \item{pr_g_aian}{Probability ZIP Code given
#'   American Indian/Alaska Native} ... }
#'
#' @source IPUMS NHGIS, University of Minnesota \url{https://www.nhgis.org}
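#'
#' @examples
#' # A minimal sketch: ZIP Codes read in as numbers lose their leading zeros,
#' # so pad them back to 5 characters before matching against zip. The input
#' # values are arbitrary.
#' formatC(c(2138L, 90210L), width = 5, flag = "0")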
"zips"
bwilden/bperdata documentation built on Jan. 28, 2021, 1:41 p.m.