In caseybreen/wcensoc:

Summary: In this notebook, we create the BUNMD file by unifying the cleaned and "condensed" Numident (deaths, applications, and claims) into a single file.

The function unify_numident combines the information from the deaths, applications, and claims files into a file with one record per person: the Berkeley Unified Numident Mortality Database (BUNMD).

The functions "create_weights" create post-stratification weights to the HMD for the BUNMD Sample 1 and Sample 2.

library(here)
library(ipumsr)

source(here("R/create_bunmd.R"))
source(here("R/create_weights_bunmd.R"))
source(here("R/create_weights_bunmd_complete.R"))

## read in cleaned and "condensed" numident files
claims <- fread("/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/3_numident_files_cleaned_and_condensed/claims_condensed.csv")
deaths <- fread("/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/3_numident_files_cleaned_and_condensed/deaths_cleaned.csv")
applications <- fread("/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/3_numident_files_cleaned_and_condensed/apps_condensed.csv")

## combine records into one file
bunmd <- create_bunmd(claims = claims, applications = applications, deaths = deaths)

## construct weights for BUNMD 
bunmd <- create_weights_bunmd(bunmd)
bunmd <- create_weights_bunmd_complete(bunmd)

## create string variables 
ddi_extract <- read_ipums_ddi("/global/scratch/p2p3/pl1_demography/censoc/miscellaneous/ipums_1940_extract")

## extract geo codes
geo_codes <- ipums_val_labels(ddi_extract, BPLD) 

## join geocodes for birthplace 
bunmd <- bunmd %>% 
  left_join(geo_codes %>% 
               select(bpl = val, bpl_string = lbl)) %>% 
  left_join(geo_codes %>% 
               select(socstate = val, socstate_string = lbl))

## write out BUNMD file
fwrite(bunmd, "/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/4_berkeley_unified_mortality_database/bunmd.csv")

Recap of the decision rules we used to create the BUNMD:

sex: When available, select the person’s sex from their death record. If unavailable in their death record, select sex from their most recent application record. If unavailable in their death record and their application records, select sex from their most recent claim record.
race_first: Select the person’s race_first from their most first application record.
race_last: Select the person’s race_last from their last recent application record.
bpl: Select the person’s race from their most recent application record. If unavailable in their application records, select from their most recent claim record.
father_fname: Select the person’s father’s first name that is the maximum number of characters across applications.
father_mname: Select the person’s father’s middle name that is the maximum number of characters across applications.
father_lname: Select the person’s father’s last name that is the maximum number of characters across applications.
mother_fname: Select the person’s mother’s first name that is the maximum number of characters across applications.
mother_mname: Select the person’s mother’s middle name that is the maximum number of characters across applications.
mother_lname: Select the person’s mother’s last name that is the maximum number of characters across applications.

caseybreen/wcensoc documentation built on June 11, 2025, 4:14 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

caseybreen/wcensoc

In caseybreen/wcensoc:

R Package Documentation

Browse R Packages

We want your feedback!