[^updated]: Last updated: r format(Sys.time(), '%d %B, %Y')

| Page| Variable | Label | |----:|:-------------|:---------------------------------------------| | \hyperlink{page.2}{2} | \hyperlink{page.2}{HISTID} |Historical unique identifier | | \hyperlink{page.3}{3} | \hyperlink{page.3}{byear_dmf} |Year of birth | | \hyperlink{page.4}{4} | \hyperlink{page.4}{bmonth_dmf} |Month of birth | | \hyperlink{page.5}{5} |\hyperlink{page.5}{dyear_dmf} |Year of death | | \hyperlink{page.6}{6} | \hyperlink{page.6}{dmonth_dmf} |Month of death | | \hyperlink{page.7}{7} |\hyperlink{page.7}{death_age_dmf} |Age at death (years) | | \hyperlink{page.8}{8} | \hyperlink{page.8}{sex_dmf} |Sex | | \hyperlink{page.9}{9} |\hyperlink{page.9}{residence_state_enlistment} |State of residence at enlistment | | \hyperlink{page.10}{10} |\hyperlink{page.10}{residence_county_enlistment} |County of residence at enlistment| | \hyperlink{page.11}{11} |\hyperlink{page.11}{place_of_enlistment_enlistment} |Place of enlistment| | \hyperlink{page.12}{12} |\hyperlink{page.12}{date_of_enlistment_enlistment} |Date of enlistment | | \hyperlink{page.13}{13} |\hyperlink{page.13}{grade_code_enlistment} |Army grade (rank)| | \hyperlink{page.14}{14} |\hyperlink{page.14}{branch_code_enlistment} |Army branch | | \hyperlink{page.15}{15} |\hyperlink{page.15}{term_of_enlistment_enlistment} |Term of enlistment | | \hyperlink{page.16}{16} |\hyperlink{page.16}{source_enlistment} |Source of army personnel | | \hyperlink{page.17}{17} |\hyperlink{page.17}{race_and_citizenship_enlistment} |Race and citizenship | | \hyperlink{page.18}{18} |\hyperlink{page.18}{education_enlistment} |Education | | \hyperlink{page.19}{19} |\hyperlink{page.19}{civ_occupation_enlistment} |Civilian occupation | | \hyperlink{page.20}{20} |\hyperlink{page.20}{marital_status_enlistment} |Marital status at enlistment | | \hyperlink{page.21}{21} |\hyperlink{page.21}{height_enlistment} |Height at enlistment (inches) | | \hyperlink{page.22}{22} |\hyperlink{page.22}{weight_or_AGCT_enlistment} |Weight (pounds) at enlistment or AGCT score| | \hyperlink{page.23}{23} |\hyperlink{page.23}{component_enlistment} |Army component |

Summary: The CenSoc-DMF-WW2-Army-Enlistment File (N = 1,541,899) links the full-count 1940 Census to the Social Security Death Master File (DMF) and the National Archives’ public release of the United States World War II Army Enlistment Records, circa 1938-1946. Records were linked using the standard variant of the ABE method developed by Abramitzky, Boustan, and Eriksson (2012, 2014, 2017). Variables ending in '_dmf' are from the DMF, and those ending in '_enlistment' are from the Army Enlistment file. To merge 1940 Census data with this data set, researchers can obtain a copy of the 1940 Census from IPUMS-USA and link on the individual-level, unique identifier HISTID variable.

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
options(kableExtra.auto_format = FALSE, scipen=999)
## Library Packages
library(tidyverse)
library(data.table)
library(stringi)
library(kableExtra)

data_path <- '/data/josh/CenSoc/data_release_military_records'
resource_file_path <- '/hdir/0/mariaosborne/censoc/army enlistment files/'

## read in data file
#setwd(data_path)
#enlistment_dmf <- fread('censoc_dmf_ww2_army_enlistment_v1.csv')


setwd(resource_file_path)
test_file_path <- '/hdir/0/mariaosborne/censoc/army enlistment files/'
enlistment_dmf <- fread('censoc_dmf_enlistment_v1_copy.csv')

\newpage

\huge HISTID \normalsize \vspace{12pt}

Label: Historical unique identifier

Description: HISTID is a unique individual-level identifier. It can be used to merge the CenSoc-DMF-WW2-Army-Enlistment file with the 1940 Full-Count Census from IPUMS.

\newpage

\huge byear_dmf \normalsize \vspace{12pt}

Label: Birth year

Description: byear_dmf reports a person's year of birth, as recorded in the Social Security Death Master File.

byear_plot <- enlistment_dmf %>%
    group_by(byear_dmf) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = byear_dmf, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) + #replace with a different theme (theme_bw()) if the bbplot package isn't downloaded 
  ggtitle("Year of Birth") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Year", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma, limits = c(0, 110000)) +
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

byear_plot

\newpage \huge bmonth_dmf \normalsize \vspace{12pt}

Label: Birth month

Description: bmonth_dmf reports a person's month of birth, as recorded in the Social Security Death Master File.

## run in the console and copy and paste into documentation
bmonth_tabulated <- knitr::kable(enlistment_dmf %>%
    group_by(bmonth_dmf) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(bmonth_dmf, label, n, `freq %` = freq))

\vspace{30pt}

bmonth_tabulated

\newpage

\huge dyear_dmf \normalsize \vspace{12pt}

Label: Death year

Description: dyear_dmf reports a person's year of death, as recorded in the Social Security Death Master File.

dyear_plot <- enlistment_dmf %>%
    group_by(dyear_dmf) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = dyear_dmf, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) +  
  ggtitle("Year of Death") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Year", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) +
  ylim(0, 62000)

\vspace{75pt}

dyear_plot

\newpage

\huge dmonth_dmf \normalsize \vspace{12pt}

Label: Death month

Description: dmonth_dmf reports a person's month of death, as recorded in the Social Security Death Master File.

dmonth_tabulated <- knitr::kable(enlistment_dmf %>%
    group_by(dmonth_dmf) %>%
    tally() %>%
    mutate(freq = signif(n*100 / sum(n), 2)) %>%
    mutate(label = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")) %>%
    select(dmonth_dmf, label, n, `freq %` = freq))

\vspace{30pt}

dmonth_tabulated

\newpage \huge death_age_dmf \normalsize \vspace{12pt}

Label: Age at death (years)

Description: death_age_dmf reports a person's age at death in years, calculated using the birth and death information recorded in the Social Security Death Master File.

death_age_plot <- enlistment_dmf %>%
    group_by(death_age_dmf) %>%
    summarise(n = n()) %>%
    ggplot(aes(x = death_age_dmf, y = n)) + 
    geom_line() + 
    geom_point() +
    theme_minimal(base_size = 15) + #replace with a different theme (theme_bw()) if the bbplot package isn't downloaded 
  ggtitle("Age at Death") + 
  theme(legend.position="bottom") +
  xlab("title") + 
  labs(x = "Age at Death", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(breaks = scales::pretty_breaks(n=5)) 

\vspace{75pt}

death_age_plot

\newpage

\huge sex_dmf \normalsize \vspace{12pt}

Label: Sex

Description: sex_dmf reports a person's sex, as recorded in the Social Security Death Master File.

sex_tabulated <- knitr::kable(enlistment_dmf %>%
    group_by(sex_dmf) %>%
     tally() %>%
     mutate(freq = signif(n*100 / sum(n), 3)) %>%
     mutate(label = c("Men")) %>%
     select(sex_dmf, label, n, `freq %` = freq))

\vspace{30pt}

sex_tabulated

\newpage

\huge residence_state_enlistment \normalsize \vspace{12pt}

Label: State of residence

Description: residence_state_enlistment is an string variable that reports a person's state of residence at time of enlistment, as reported in the World War II Army Enlistment Records. "State" can refer to a US state or a foreign nation. This variable may encode additional information about a person's enlistment, including conscientious objector or limited service status.

For a complete list of codes, please see: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2046&dt=893&c_id=24996

\newpage

\huge residence_county_enlistment \normalsize \vspace{12pt}

Label: County of residence

Description: residence_county_enlistment is a numeric variable that reports a person's county of residence at time of enlistment, as reported in the World War II Army Enlistment Records. This variable is only available for enlistees residing in the US.

For a complete list of codes, please see: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2057&dt=893&c_id=24998

\newpage

\huge place_of_enlistment \normalsize \vspace{12pt}

Label: Place of enlistment

Description: place_of_enlistment_enlistment is a numeric variable that reports where an person enlisted, as reported in the World War II Army Enlistment Records.

For a complete list of codes, please see: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=4123&dt=893&c_id=24997

\newpage

\huge date_of_enlistment_enlistment \normalsize \vspace{12pt}

Label: Date of enlistment, yyyy-mm-dd format

Description: date_of_enlistment_enlistment reports a person's date of enlistment in yyyy-mm-dd format, as reported in the World War II Army Enlistment Records.

enlist_date_plot <- enlistment_dmf %>%
  mutate(date_short = format(as.Date(date_of_enlistment_enlistment), "%Y-%m")) %>% 
    group_by(date_short) %>% 
    tally() %>% 
    filter(!is.na(date_short)) %>%
  ggplot(aes(x=date_short, y=n, group=1)) +
  geom_line() +
  geom_point() +
  scale_x_discrete(breaks = paste0(seq(1938, 1947, 1), "-01")) +
  theme_minimal(base_size = 15) +
  ggtitle("Enlistment Date (Month and Year Only)") +
  labs(x = "Date (yyyy-mm)", 
       y = "Count") + 
  scale_y_continuous(labels = scales::comma) +
  theme(axis.text.x = element_text(angle = 45))

\vspace{75pt}

enlist_date_plot

\newpage

\huge grade_code_enlistment \normalsize \vspace{12pt}

Label: Army grade (alphanumeric code)

Description: grade_code_enlistment is a string variable that records a person's grade (rank), as reported in the World War II Army Enlistment Records.

grades <- fread('nara_grade_code_codes.csv')
grade_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(grade_code_enlistment) %>% 
                          tally() %>% 
                          left_join(grades, by=c('grade_code_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(grade_code_enlistment, 'label' = Meaning, n, `freq %` = freq))

\vspace{30pt}

grade_tabulated

Code list also is available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2072&dt=893&c_id=24999

\newpage

\huge branch_code_enlistment \normalsize \vspace{12pt}

Label: Army branch (alphanumeric code)

Description: branch_code_enlistment is a numeric variable that records the army branch in which the enlistee served, as reported in the World War II Army Enlistment Records.

branches <- fread('nara_branch_code_codes.csv')
branch_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(branch_code_enlistment) %>% 
                          tally() %>% 
                          left_join(branches, by=c('branch_code_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select('branch_code'= branch_code_enlistment, 'label' = Meaning, n, `freq %` = freq),
                          format='latex', booktabs=T) %>% 
  kable_styling(position = "center") %>% 
  column_spec(column=2, width = '9cm')

\vspace{10 pt}

# check type of codes vs. enlistment data -- may be mismatched
branch_tabulated

\newpage

\huge term_of_enlistment_enlistment \normalsize \vspace{12pt}

Label: Term of enlistment

Description: term_of_enlistment_enlistment is a numeric variable that reports the length of a person's enlistment or their department of enlistment, as reported in the World War II Army Enlistment Records.

term <- fread('nara_term_of_enlistment.csv')
toe_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(term_of_enlistment_enlistment) %>% 
                          tally() %>% 
                          left_join(term, by=c('term_of_enlistment_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(term_of_enlistment_enlistment, 'label' = Meaning, n, `freq %` = freq),
                          format='latex', booktabs=T) %>% 
  kable_styling(position = "center") %>% 
  column_spec(column=2, width = '7cm')

\vspace{30 pt}

toe_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2076&dt=893&c_id=24979

\newpage

\huge source_enlistment \normalsize \vspace{12pt}

Label: Source of army personnel

Description: source_enlistment is a string variable that reports the source of the enlisted person, as reported in the World War II Army Enlistment Records.

source_codes <- fread('nara_source_codes.csv')
source_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(source_enlistment) %>% 
                          tally() %>% 
                          left_join(source_codes, by=c('source_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(source_enlistment, 'label' = Meaning, n, `freq %` = freq),
                          format='latex', booktabs=T) %>% 
  kable_styling(position = "center") %>% 
  column_spec(column=2, width = '7cm')

\vspace{30 pt}

source_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2077&dt=893&c_id=24981

\newpage

\huge race_and_citizenship_enlistment \normalsize \vspace{12pt}

Label: Race and Citizenship

Description: race_and_citizenship_enlistment is a string variable that reports an individual's race and citizenship, as reported in the World War II Army Enlistment Records.

races <- fread('nara_race_codes.csv')
race_cit_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(race_and_citizenship_enlistment) %>% 
                          tally() %>% 
                          left_join(races, by=c('race_and_citizenship_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>%
                            select(race_and_citizenship_enlistment, 'label' = Meaning, n, `freq %` = freq))

\vspace{30 pt}

race_cit_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2059&dt=893&c_id=24984

\newpage

\huge education_enlistment \normalsize \vspace{12pt}

Label: education

Description: education_enlistment is a numeric variable that reports an individual's educational attainment at time of enlistment, as reported in the World War II Army Enlistment Records.

educ_codes <- fread('nara_education_codes.csv')
educ_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(education_enlistment) %>% 
                          tally() %>% 
                          left_join(educ_codes, by=c('education_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(education_enlistment, 'label' = Meaning, n, `freq %` = freq))

\vspace{30 pt}

educ_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2058&dt=893&c_id=24985

\newpage

\huge civ_occupation_enlistment \normalsize \vspace{12pt}

Label: civilian occupation

Description: civ_occupation_enlistment is a string variable that reports a person's occupation at time of enlistment, as reported in the World War II Army Enlistment Records.

For complete codes, please see: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=3323&dt=893&c_id=24986

\newpage

\huge marital_status_enlistment \normalsize \vspace{12pt}

Label: marital status

Description: marital_status_enlistment is a string variable that reports a person's marital status at time of enlistment, as reported in the World War II Army Enlistment Records.

ms_codes <- fread('nara_marital_status_codes.csv')
marital_status_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(marital_status_enlistment) %>% 
                          tally() %>% 
                          left_join(ms_codes, by=c('marital_status_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(marital_status_enlistment, 'label' = Meaning, n, `freq %` = freq))

\vspace{30pt}

marital_status_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2081&dt=893&c_id=25001

\newpage \huge height_enlistment \normalsize \vspace{12pt}

Label: Height

Description: height_enlistment is a numeric variable that reports a person's height in inches at time of enlistment, as reported in the World War II Army Enlistment Records.

Note: Please be aware that instructions on how to use this field on enlistment punch cards varied over time. This field may encode military occupation or other information. Height may have been recorded through about 1943, but height data recorded after February 1943 are likely unreliable. See the \textcolor{blue}{National Archives' technical documentation on the Amy Serial Number Electronic File} for further information.

height_plot <- enlistment_dmf %>%
  ggplot(aes(x = height_enlistment)) +
  geom_histogram(breaks = seq(0,100,1)) +
  theme_minimal(base_size = 15) +
  ggtitle("Height") +
  scale_y_continuous(labels = scales::comma)

\vspace{75pt}

height_plot

\newpage \huge weight_or_AGCT_enlistment \normalsize \vspace{12pt}

Label: Weight (pounds) at enlistment or AGCT score

Description: weight_or_AGCT_enlistment is a numeric variable that reports a person's weight in pounds at time of enlistment or their Army General Classification Test (AGCT) score, as reported in the World War II Army Enlistment Records.

Note: Please be aware that instructions on how to use this field on enlistment punch cards varied over time. This field was explicitly used for both weight and AGCT scores, but may also encode military occupation or other information. Weight may have been recorded through about 1943 at some enlistment sites, but the recording of AGCT score instead of weight at some enlistment sites likely began in March 1943. Refer to the \textcolor{blue}{National Archives' technical documentation on the Amy Serial Number Electronic File} for further information.

weight_plot <- enlistment_dmf %>%
  ggplot(aes(x = weight_or_AGCT_enlistment)) +
  geom_histogram(breaks = seq(0,1000,10)) +
  theme_minimal(base_size = 15) +
  ggtitle("Weight or AGCT Score") +
  scale_y_continuous(labels = scales::comma)

\vspace{75pt}

weight_plot

\newpage \huge component_enlistment \normalsize \vspace{12pt}

Label: Army Component

Description: component_enlistment is a numeric variable that records the component of the army in which an individual served, as recorded in the World War II Army Enlistment Records.

component_codes <- fread('nara_component_codes.csv')
component_tabulated <- knitr::kable(enlistment_dmf %>%
                          group_by(component_enlistment) %>% 
                          tally() %>% 
                          left_join(component_codes, by=c('component_enlistment' = 'Code')) %>%
                          mutate(freq = signif(n*100 / sum(n), 3)) %>% 
                          select(component_enlistment, 'label' = Meaning, n, `freq %` = freq),
                          format='latex', booktabs=T) %>% 
  kable_styling(position = "center") %>% 
  column_spec(column=2, width = '7cm')

\vspace{30 pt}

component_tabulated

Codes also available at: https://aad.archives.gov/aad/popup-codelist.jsp?cl_id=2078&dt=929&c_id=25014



caseybreen/wcensoc documentation built on Nov. 21, 2024, 5:15 a.m.