In jfangmeier/claimsdb: Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

claimsdb

claimsdb provides easy access to a sample of health insurance enrollment and claims data from the CMS Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF), as a set of relational tables or as an in-memory database using DuckDB. This package is inspired by and based on the starwarsdb package.

The data are structured as actual Medicare claims data but are fully "synthetic," after a process of alterations meant to reduce the risk of re-identification of real Medicare beneficiaries. The synthetic process that CMS adopted changes the co-variation across variables, so analysts should be cautious about drawing inferences about the actual Medicare population.

The data included in claimsdb comes from 500 randomly selected 2008 Medicare beneficiaries from Sample 2 of the DE-SynPUF, and it includes all the associated claims for these members for 2008-2009. CMS provides resources, including a codebook, FAQs, and other documents with more information about this data.

Formats: Metadata, Local Tables, Remote Database Tables

Installation

You can install the development version of claimsdb from GitHub with:

# install.packages("devtools")
devtools::install_github("jfangmeier/claimsdb")

Health Insurance Claims Data

library(tibble)

All of the tables are available when you load claimsdb

library(dplyr)
library(claimsdb)

The tables are also available by loading just the data from claimsdb

data(package = "claimsdb")

This includes a schema table that describes each of the tables and the included variables from the CMS DE-SynPUF.

schema

schema %>% 
  filter(TABLE == "inpatient") %>% 
  pull(PROPERTIES)

Local Tables

Using the sample data in the tables, you can ask questions such as, what were the average and median prescription drug costs for males and females in 2008 and 2009?

rx_costs <- pde %>% 
  mutate(BENE_YEAR = lubridate::year(SRVC_DT)) %>%  
  group_by(BENE_YEAR, DESYNPUF_ID) %>% 
  summarize(TOTAL_RX_COST = sum(TOT_RX_CST_AMT, na.rm = T), .groups = "drop")

bene %>% 
  transmute(
    BENE_YEAR,
    DESYNPUF_ID,
    BENE_SEX_IDENT = case_when(
      BENE_SEX_IDENT_CD == "1" ~ "Male",
      TRUE ~ "Female")
  ) %>% 
  left_join(
    rx_costs, by = c("BENE_YEAR", "DESYNPUF_ID")
  ) %>% 
  mutate(TOTAL_RX_COST = ifelse(is.na(TOTAL_RX_COST), 0, TOTAL_RX_COST)) %>% 
  group_by(BENE_YEAR, BENE_SEX_IDENT) %>% 
  summarize(
    MEAN_RX_COST = mean(TOTAL_RX_COST, na.rm = T),
    MEDIAN_RX_COST = median(TOTAL_RX_COST, na.rm = T))

Remote Database Tables

Many organizations store claims data in a remote database, so claimsdb also includes all of the tables as an in-memory DuckDB database. This can be a great way to practice working with this type of data, including building queries with dplyr code using dbplyr.

con <- claims_connect()
bene_rmt <- tbl(con, "bene")
pde_rmt <- tbl(con, "pde")
pde_rmt

rx_costs_rmt <- pde_rmt %>% 
  # note below that lubridate functions do not currently work on remote databases,
  # so you need to use date/time functions appropriate for the database.
  mutate(BENE_YEAR = date_part('year', SRVC_DT)) %>%  
  group_by(BENE_YEAR, DESYNPUF_ID) %>% 
  summarize(TOTAL_RX_COST = sum(TOT_RX_CST_AMT, na.rm = T), .groups = "drop") %>% 
  ungroup()
rx_costs_rmt

bene_rmt %>% 
  transmute(
    BENE_YEAR,
    DESYNPUF_ID,
    BENE_SEX_IDENT = case_when(
      BENE_SEX_IDENT_CD == "1" ~ "Male",
      TRUE ~ "Female")
  ) %>% 
  left_join(
    rx_costs_rmt, by = c("BENE_YEAR", "DESYNPUF_ID")
  ) %>% 
  mutate(TOTAL_RX_COST = ifelse(is.na(TOTAL_RX_COST), 0, TOTAL_RX_COST)) %>% 
  group_by(BENE_YEAR, BENE_SEX_IDENT) %>% 
  summarize(
    MEAN_RX_COST = mean(TOTAL_RX_COST, na.rm = T),
    MEDIAN_RX_COST = median(TOTAL_RX_COST, na.rm = T))

jfangmeier/claimsdb documentation built on Feb. 14, 2022, 6:01 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jfangmeier/claimsdb
Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching

In jfangmeier/claimsdb: Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching

claimsdb

Installation

Health Insurance Claims Data

Local Tables

Remote Database Tables

R Package Documentation

Browse R Packages

We want your feedback!

jfangmeier/claimsdb Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching

In jfangmeier/claimsdb: Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching

claimsdb

Installation

Health Insurance Claims Data

Local Tables

Remote Database Tables

R Package Documentation

Browse R Packages

We want your feedback!

jfangmeier/claimsdb
Relational Data from the CMS Data Entrepreneurs Synthetic PUF for Learning and Teaching