In caseybreen/wcensoc:

In this notebook, we take cleaned application files and claims files and "condense" them to select the best record per person.

Background: One person may have multiple applications or claims entries. As the BUNMD file will ultimately contain one record per person, this code selects the "best" value for sex, race, etc. across entries using a set of decision rules.
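
As a rough illustration of what selecting a "best" value per person can look like, here is a minimal sketch on toy data. The column names (`person_id`, `sex`, `entry_date`) and the rule itself (most frequent non-missing value, ties broken by the most recent entry) are assumptions made for this example. The actual decision rules live in `condense_numapp()` and `condense_claims()` and may differ.

```r
library(data.table)

# Toy data with hypothetical columns; this is not the real Numident layout
toy <- data.table(
  person_id  = c(1, 1, 1, 2, 2),
  sex        = c("M", NA, "M", "F", "M"),
  entry_date = as.Date(c("1950-01-01", "1960-01-01", "1970-01-01",
                         "1955-01-01", "1965-01-01"))
)

# Hypothetical rule: take the most frequent non-missing value;
# if several values tie, take the one from the most recent entry
select_best <- function(value, date) {
  keep <- !is.na(value)
  if (!any(keep)) return(NA_character_)
  value <- value[keep]
  date  <- date[keep]
  counts <- table(value)
  top <- names(counts)[counts == max(counts)]
  candidates <- value %in% top
  value[candidates][which.max(as.numeric(date[candidates]))]
}

# One row per person, with a single selected value for sex
toy_condensed <- toy[, .(sex = select_best(sex, entry_date)), by = person_id]
```

Person 1 has two agreeing non-missing entries, while person 2 has two conflicting entries, so the tie-break on `entry_date` decides the result for that person.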

We only need to "condense" the application and claims files, because each person has only one death record.

These functions take a long time to run. Each one calls many helper functions, each designed to select the "best" value among many for a given variable. Sometimes it is easiest to run them on only a subset of records.
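
One way to work with only certain records is to subset by person before condensing, keeping all entries for each selected person together. The sketch below uses a toy table; `person_id` is a hypothetical key column, and the real files may use a different identifier.

```r
library(data.table)

# Toy stand-in for a cleaned application file; `person_id` is a hypothetical key
toy_apps <- data.table(
  person_id = c(101, 101, 102, 103, 103, 103, 104),
  sex       = c("M", "M", "F", NA, "F", "F", "M")
)

# Keep every entry for the first two people, so that each selected
# person's full set of records stays intact
ids_to_keep <- unique(toy_apps$person_id)[1:2]
toy_subset  <- toy_apps[person_id %in% ids_to_keep]

# The subset could then be passed to the condensing function, for example:
# condense_numapp(numapp = toy_subset)  # not run: needs the real column layout
```

Subsetting on the person identifier rather than on row numbers avoids splitting one person's entries across the kept and dropped sets.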

library(here)
library(data.table)  # provides fread() and fwrite()
source(here("R/condense_numapp.R"))
source(here("R/condense_claims.R"))

## read in cleaned app file
apps <- fread("/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/2_numident_files_cleaned/apps_cleaned.csv", na.strings = "")

apps_condensed <- condense_numapp(numapp = apps)

fwrite(apps_condensed, "/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/3_numident_files_cleaned_and_condensed/apps_condensed.csv")

## read in cleaned claims file
claims <- fread("/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/2_numident_files_cleaned/claims_cleaned.csv", na.strings = "")

claims_condensed <- wcensoc::condense_claims(claims = claims)

fwrite(claims_condensed, "/global/scratch/p2p3/pl1_demography/censoc_internal/data/numident/3_numident_files_cleaned_and_condensed/claims_condensed.csv")

caseybreen/wcensoc documentation built on June 11, 2025, 4:14 p.m.
