[^updated]: Last updated: `r format(Sys.time(), '%d %B, %Y')`
| Page | Variable | Label |
|----:|:------------------------|:---------------------------------------------|
| \hyperlink{page.2}{2} | \hyperlink{page.2}{HISTID} | Historical Unique Identifier |
| \hyperlink{page.3}{3} | \hyperlink{page.3}{byear} | Year of Birth |
| \hyperlink{page.4}{4} | \hyperlink{page.4}{bmonth} | Month of Birth |
| \hyperlink{page.5}{5} | \hyperlink{page.5}{dyear} | Year of Death |
| \hyperlink{page.6}{6} | \hyperlink{page.6}{dmonth} | Month of Death |
| \hyperlink{page.7}{7} | \hyperlink{page.7}{death_age} | Age at Death (Years) |
| \hyperlink{page.8}{8} | \hyperlink{page.8}{weight} | CenSoc Sample Weight |
| \hyperlink{page.9}{9} | \hyperlink{page.9}{Additional IPUMS variables} | Additional 1940 Census variables, including: pernum, perwt, age, sex, bpld, mbpl, fbpl, educd, educ_yrs, empstatd, hispan, incwage, incnonwg, marst, nativity, occ, occscore, ownershp, race, rent, serial, statefip, and urban. |
\vspace{210pt}
Summary: The CenSoc-DMF Version 3.0 Demo dataset (N = 42,421) links the IPUMS 1940 1% census sample to the Death Master File (DMF) dataset, a collection of death records reported to the Social Security Administration. Records were linked using a conservative variant of the ABE method developed by Abramitzky, Boustan, and Eriksson (\textcolor{blue}{2012}, \textcolor{blue}{2014}, \textcolor{blue}{2017}).
We note that this demo dataset is not suited to high-resolution mortality research; we recommend using this file for exploratory and demonstrative purposes. To best conduct research with CenSoc data, researchers may download the full CenSoc-DMF from the CenSoc website, obtain an extract of the full-count 1940 Census from IPUMS-USA, and merge the two files on the unique individual-level identifier HISTID. Please adhere to CenSoc and IPUMS citation guidelines when using this file.
\newpage
\huge HISTID \normalsize \vspace{12pt}
Label: Historical Unique Identifier
Description: HISTID is a unique individual-level identifier. It can be used to merge the CenSoc-DMF file with the 1940 Full-Count Census from IPUMS.
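A minimal sketch of such a merge is shown here. It uses toy data frames in place of the real files. The object names, the HISTID values, and the IPUMS columns shown are hypothetical and serve only to illustrate the join.

```r
library(dplyr)

# Toy stand-ins for the two files; all values below are made up.
censoc_dmf <- tibble(
  HISTID = c("A001", "A002", "A003"),
  byear  = c(1910, 1915, 1920),
  dyear  = c(1985, 1990, 2001)
)
ipums_1940 <- tibble(
  HISTID = c("A001", "A003", "A004"),
  sex    = c(1, 2, 1),
  educd  = c(30, 60, 23)
)

# Keep only persons who appear in both files, matched on HISTID.
merged <- inner_join(censoc_dmf, ipums_1940, by = "HISTID")
```

An inner join is used so that the result contains only linked records; a left join from the census extract would instead keep unlinked persons with missing mortality fields.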
\newpage
\huge byear \normalsize \vspace{12pt}
Label: Birth Year
Description: byear reports a person's year of birth, as recorded in the Social Security Death Master File.
```{r}
## Library packages
library(tidyverse)
library(data.table)

## Read in the CenSoc-DMF demo (version 3) data file
censoc_dmf_v3 <- read_csv("/data/censoc/censoc_data_releases/censoc_dmf_demo/censoc_dmf_demo_v3/censoc_dmf_demo_v3.csv")
```

```{r}
## Count records by year of birth and plot the counts
byear_plot <- censoc_dmf_v3 %>%
  group_by(byear) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = byear, y = n)) +
  geom_line() +
  geom_point() +
  theme_minimal(base_size = 15) +
  ggtitle("Year of Birth") +
  theme(legend.position = "bottom") +
  labs(x = "Year", y = "Count") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))
```

\vspace{75pt}

```{r}
byear_plot
```
\newpage
\huge bmonth \normalsize \vspace{12pt}
Label: Birth Month
Description: bmonth reports a person's month of birth, as recorded in the Social Security Death Master File.
```{r}
## Run in the console and copy and paste into documentation
## Tabulate records by month of birth, excluding records with bmonth coded 0
bmonth_tabulated <- knitr::kable(
  censoc_dmf_v3 %>%
    filter(bmonth != 0) %>%
    group_by(bmonth) %>%
    tally() %>%
    mutate(freq = signif(n * 100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June",
                     "July", "August", "September", "October", "November", "December")) %>%
    select(bmonth, label, n, `freq %` = freq)
)
```

\vspace{30pt}

```{r}
bmonth_tabulated
```
\newpage
\huge dyear \normalsize \vspace{12pt}
Label: Death Year
Description: dyear reports a person's year of death, as recorded in the Social Security Death Master File.
```{r}
## Count records by year of death and plot the counts
dyear_plot <- censoc_dmf_v3 %>%
  group_by(dyear) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = dyear, y = n)) +
  geom_line() +
  geom_point() +
  theme_minimal(base_size = 15) +
  ggtitle("Year of Death") +
  theme(legend.position = "bottom") +
  labs(x = "Year", y = "Count") +
  scale_y_continuous(labels = scales::comma, limits = c(0, 2000)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))
```

\vspace{75pt}

```{r}
dyear_plot
```
\newpage
\huge dmonth \normalsize \vspace{12pt}
Label: Death Month
Description: dmonth reports a person's month of death, as recorded in the Social Security Death Master File.
```{r}
## Tabulate records by month of death
dmonth_tabulated <- knitr::kable(
  censoc_dmf_v3 %>%
    group_by(dmonth) %>%
    tally() %>%
    mutate(freq = signif(n * 100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June",
                     "July", "August", "September", "October", "November", "December")) %>%
    select(dmonth, label, n, `freq %` = freq)
)
```

\vspace{30pt}

```{r}
dmonth_tabulated
```
\newpage
\huge death_age \normalsize \vspace{12pt}
Label: Age at Death (Years)
Description: death_age reports a person's age at death in years, calculated using the birth and death information recorded in the Social Security Death Master File.
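As an illustration only, age at death in completed years could be derived from the year and month fields as sketched here. The helper `age_at_death` is hypothetical, and the rule it applies (subtract one year when the death month precedes the birth month) is an assumption, not the documented CenSoc procedure. Without day-level information, deaths in the birth month are treated as occurring after the birthday.

```r
# Hypothetical helper: completed years of age at death from year/month fields.
# This is an assumed rule for illustration, not the documented CenSoc method.
age_at_death <- function(byear, bmonth, dyear, dmonth) {
  # Subtract one year when death occurs before the birth month is reached.
  dyear - byear - as.integer(dmonth < bmonth)
}

# Born June 1910: a death in March 1990 and a death in September 1990.
early <- age_at_death(byear = 1910, bmonth = 6, dyear = 1990, dmonth = 3)
late  <- age_at_death(byear = 1910, bmonth = 6, dyear = 1990, dmonth = 9)
```

The function is vectorized, so it can also be applied to whole columns such as `byear`, `bmonth`, `dyear`, and `dmonth`.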
```{r}
## Count records by age at death and plot the counts
death_age_plot <- censoc_dmf_v3 %>%
  group_by(death_age) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = death_age, y = n)) +
  geom_line() +
  geom_point() +
  theme_minimal(base_size = 15) +
  ggtitle("Age at Death") +
  theme(legend.position = "bottom") +
  labs(x = "Age at Death", y = "Count") +
  scale_y_continuous(labels = scales::comma) +
  scale_x_continuous(breaks = scales::pretty_breaks(n = 5))
```

\vspace{75pt}

```{r}
death_age_plot
```
\newpage
\huge weight \normalsize
\vspace{12pt}
Label: CenSoc Sample Weight[^1]
Description: weight is a post-stratification person weight that adjusts the sample to National Center for Health Statistics (NCHS) totals for persons (1) dying between 1975 and 2005 and (2) dying between ages 65 and 100. Weights are based on age at death, year of death, sex, race, and place of birth. Please see the \textcolor{blue}{technical documentation on weights} for more information.
[^1]: The IPUMS-USA 1940 1% sample also includes a weight (`perwt`) to account for the 1940 sampling procedure (the 100% complete-count 1940 census therefore has no such weight). For analysis, we recommend using both sets of weights. A final weight can be constructed by multiplying the two weights together.
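The combined weight described in the footnote can be sketched as follows. The data frame `demo`, its values, and the column name `final_weight` are hypothetical; only `weight` and `perwt` come from the variable list in this codebook.

```r
library(dplyr)

# Toy rows; the weight values are made up for illustration.
demo <- tibble(
  HISTID = c("A001", "A002", "A003"),
  weight = c(12.5, NA, 20),    # CenSoc weight (NA when no weight is assigned)
  perwt  = c(100, 100, 100)    # IPUMS person weight
)

# Final weight is the product of the two weights; it stays NA
# for records without a CenSoc weight.
demo <- demo %>%
  mutate(final_weight = weight * perwt)
```

Records with a missing `final_weight` would need to be dropped, for example with `filter(!is.na(final_weight))`, before computing weighted statistics.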
```{r}
## Minimum and maximum CenSoc weight among records with a weight assigned
weights_tabulated <- censoc_dmf_v3 %>%
  filter(!is.na(weight)) %>%
  summarize("Min Weight" = round(min(weight), 2),
            "Max Weight" = round(max(weight), 2)) %>%
  mutate(id = 1:n()) %>%
  pivot_longer(-id, names_to = "Label", values_to = "Value") %>%
  select(Value, Label) %>%
  add_row(Label = "No Weight Assigned", Value = NA) %>%
  knitr::kable()

# weight.plot <- censoc_dmf_v3 %>%
#   filter(!is.na(weight)) %>%
#   ggplot() +
#   geom_boxplot(aes(x = weight), fill = 'grey92') +
#   ylim(-1, 1) +
#   theme_minimal(15) +
#   theme(axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank()) +
#   ggtitle("Distribution of Weights") +
#   xlab('Weight')

## Mean weight for each combination of age at death and year of death
weights_dmf <- censoc_dmf_v3 %>%
  filter(!is.na(weight)) %>%
  group_by(death_age, dyear) %>%
  summarize(weight = mean(weight), .groups = "drop")

weights_lexis_dmf <- weights_dmf %>%
  ggplot() +
  geom_raster(aes(x = dyear, y = death_age, fill = weight)) +
  ## Lexis grid
  geom_hline(yintercept = seq(65, 100, 10), alpha = 0.6, lty = "dotted") +
  geom_vline(xintercept = seq(1985, 2005, 10), alpha = 0.6, lty = "dotted") +
  geom_abline(intercept = seq(-100, 100, 10) - 1910, alpha = 0.6, lty = "dotted") +
  scale_fill_viridis_c(option = "magma") +
  scale_x_continuous("Year", expand = c(0.02, 0), breaks = seq(1975, 2005, 5)) +
  scale_y_continuous("Age", expand = c(0, 0), breaks = seq(65, 100, 10)) +
  guides(fill = guide_legend(reverse = TRUE)) +
  # coord
  coord_equal() +
  # theme
  theme_void() +
  theme(
    axis.text = element_text(colour = "black"),
    axis.text.y = element_text(size = 10),
    axis.text.x = element_text(size = 10, angle = 45, hjust = 0.5),
    plot.title = element_text(size = 10, vjust = 2),
    legend.text = element_text(size = 10),
    axis.title = element_text(size = 10, face = "bold")
  ) +
  # Aesthetic names in labs() must be lowercase
  labs(x = "Year", y = "Age", title = "Average weight by age at death and year of death")
```

\vspace{30pt}

```{r}
weights_tabulated
```

\vspace{30pt}

```{r}
weights_lexis_dmf
```
\newpage
\huge IPUMS 1940 Census Variables \normalsize
\vspace{12pt}
The variables below are from the IPUMS-USA 1940 1% census sample. We recommend looking at the terrific documentation on the IPUMS-USA website: \textcolor{blue}{https://usa.ipums.org/usa/index.shtml}
| Variable | Label |
|:-------------|:---------------------------------------------|
| pernum | Person number in household |
| perwt | IPUMS person weight[^2] |
| age | Age on April 1st, 1940 |
| sex | Sex |
| bpld | Place of birth (detailed) |
| mbpl | Mother's place of birth[^3] |
| fbpl | Father's place of birth[^4] |
| educd | Educational attainment (detailed IPUMS codes) |
| educ_yrs | Educational attainment in years (constructed)[^5] |
| empstatd | Employment status (detailed) |
| hispan | Hispanic/Spanish/Latino origin (imputed)[^6] |
| incwage | Wage and salary income in 1939 |
| incnonwg | Had non-wage/salary income over $50 in 1939 |
| marst | Marital status |
| nativity | Foreign birthplace or parentage |
| occ | Occupation |
| occscore | Occupational income score |
| ownershp | Ownership of dwelling (tenure) |
| race | Race |
| rent | Monthly contract rent |
| serial | Household serial number |
| statefip | State of residence 1940 (FIPS code) |
| urban | Urban/rural status |
[^2]: The IPUMS `perwt` weight accounts for the 1940 sampling procedure used to construct the 1% sample, and thus is only available in the 1940 1% sample. For analysis, we recommend using both the IPUMS `perwt` and the CenSoc `weight`. A final weight can be constructed by multiplying the two weights together.
[^3]: This variable is only available for sample-line persons (a one-in-twenty sample asked additional questions in the 1940 Census) or those living with their mother.
[^4]: This variable is only available for sample-line persons (a one-in-twenty sample asked additional questions in the 1940 Census) or those living with their father.
[^5]: `educ_yrs` is constructed from the IPUMS `educd` variable and is not directly available from IPUMS.
[^6]: The 1940 Census did not directly inquire about Hispanic ethnicity or origin. This variable is determined by IPUMS using information such as one's birthplace or a parent's birthplace.