suppressMessages(library(tidyverse))
library(readxl)

Load and process US 1910 Census Data

The Final Report (Volume 1: Population: General Report and Analysis) from the 1910 US Census was downloaded from: https://www.census.gov/prod/www/decennial.html

The Grab Text... feature in SnagIt 2020 was used to run character recognition on Table 29: Distribution by Single Years of Age of the Population of the United States: 1910. The data were copied into Excel, and five-year subtotals were calculated from the recognized values and checked against the subtotals printed in the document.

Sheets in the workbook include the raw data produced by SnagIt and the Ages data, which has the age added as an integer and the subtotals removed.
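The subtotal check described above was done in Excel. A minimal sketch of the same idea in R follows, assuming a table with an `Age` column and a count column. The numbers and the `published` data frame are made up for illustration and are not census values.

```r
# Toy single-year counts (made-up numbers, NOT census values)
ages <- data.frame(
  Age  = 0:9,
  Male = c(10, 12, 11, 9, 8, 7, 7, 6, 5, 5)
)

# Subtotals as they might be transcribed from the printed table (hypothetical)
published <- data.frame(
  group = c(0, 5),
  Male  = c(50, 30)
)

# Label each age with the first age of its five-year group: 0-4 -> 0, 5-9 -> 5
ages$group <- 5 * (ages$Age %/% 5)

# Sum the single-year counts within each group
observed <- aggregate(Male ~ group, data = ages, FUN = sum)

# Put observed and published subtotals side by side;
# a nonzero diff would point at a character recognition error in that group
check <- merge(observed, published, by = "group",
               suffixes = c("_observed", "_published"))
check$diff <- check$Male_observed - check$Male_published
check
```

Any row with a nonzero `diff` narrows the search for a misread digit to five source lines.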

# Read the cleaned single-year-of-age counts for 1910
source <- readxl::read_excel("ageBySex1910.xlsx", sheet = "Ages")
# Convert the counts to within-sex proportions to use as sampling weights
analysis <- source %>% 
  mutate(probMale = Male / sum(Male), probFemale = Female / sum(Female))

# Fix the random seed so the simulated dataset is reproducible
set.seed(1)
malesFemales1910 <- rbind(
  data.frame(sex = rep(1, 10000), # male = 1 (not my idea...)
             age = sample(analysis$Age, 10000,
                          replace = TRUE, 
                          prob = analysis$probMale)),
  data.frame(sex = rep(2, 10000), # female = 2 (also not my idea...)
             age = sample(analysis$Age, 10000, 
                          replace = TRUE, 
                          prob = analysis$probFemale)))

usethis::use_data(malesFemales1910, overwrite = TRUE)
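As a sanity check on this kind of weighted sampling, the sketch below compares the age shares in a simulated sample against the source proportions. It uses a made-up `toy` table rather than the workbook, so the numbers are illustrative only.

```r
# Toy age table (made-up counts, NOT census values)
toy <- data.frame(Age = 0:4, Male = c(500, 400, 300, 200, 100))
toy$probMale <- toy$Male / sum(toy$Male)

set.seed(1)
draws <- sample(toy$Age, 10000, replace = TRUE, prob = toy$probMale)

# Share of each age in the sample; factor() keeps ages that were never drawn
observed <- as.numeric(table(factor(draws, levels = toy$Age))) / length(draws)

# Largest absolute gap between the sample share and the source share
max(abs(observed - toy$probMale))
```

With 10,000 draws the gap should be small. Note that `sample()` rescales the `prob` weights itself, so dividing by the column sum mainly makes the columns readable as proportions.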

Load and process US 2010 Census Data

I found a reference to the single-year-of-age files here: https://www.census.gov/data/tables/time-series/demo/popest/2010s-national-detail.html

From http://factfinder.census.gov/bkmk/table/1.0/en/PEP/2018/PEPSYASEXN?# I selected the female and male datasets and copied them into Excel, then copied the July 1, 2010 column for each sex into the Ages sheet.

source2010 <- readxl::read_excel("ageBySex2010.xlsx", sheet = "Ages")
analysis2010 <- source2010 %>% 
  mutate(probMale = Male / sum(Male), probFemale = Female / sum(Female))

# Fix the random seed so the simulated dataset is reproducible
set.seed(2)
malesFemales2010 <- rbind(
  data.frame(sex = rep(1, 10000), # male = 1
             age = sample(analysis2010$Age, 10000,
                          replace = TRUE, 
                          prob = analysis2010$probMale)),
  data.frame(sex = rep(2, 10000), # female = 2
             age = sample(analysis2010$Age, 10000, 
                          replace = TRUE, 
                          prob = analysis2010$probFemale)))

usethis::use_data(malesFemales2010, overwrite = TRUE)
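The 1910 and 2010 blocks differ only in the input table and the seed, so the shared steps could be wrapped in one function. The sketch below is one possible way to do that. `simulateAges` is a hypothetical helper, not part of the package, and it assumes a table with `Age`, `Male`, and `Female` columns.

```r
# Hypothetical helper: draw n ages per sex from a table of counts by age
simulateAges <- function(counts, n = 10000, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)

  # Draw n ages for one sex code, weighting each age by its share of the counts
  drawSex <- function(code, weights) {
    data.frame(
      sex = rep(code, n),
      age = sample(counts$Age, n, replace = TRUE, prob = weights / sum(weights))
    )
  }

  males   <- drawSex(1, counts$Male)   # male = 1, drawn first as in the script
  females <- drawSex(2, counts$Female) # female = 2
  rbind(males, females)
}

# Toy input (made-up counts, NOT census values)
toy <- data.frame(Age = 0:4,
                  Male = c(5, 4, 3, 2, 1),
                  Female = c(4, 4, 3, 3, 2))
simulated <- simulateAges(toy, n = 100, seed = 1)
table(simulated$sex)
```

Under this sketch each year would reduce to a single call, for example `simulateAges(source2010, seed = 2)`, with the output then passed to `usethis::use_data()` as before.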


RaymondBalise/butterfly documentation built on Dec. 27, 2019, 2:16 a.m.