In mhairi/medicalclaims: New Hampshire Health Insurance Claims

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

medicalclaims

A data package with a sample of 100,000 anonymised medical claims New Hampshire’s Comprehensive Health Information System (https://nhchis.com/).

Installation

You can install though GitHub with:

# install.packages("devtools")
devtools::install_github("mhairi/medicalclaims")

Example

Once you've loaded the package, the data is in an object called claims. The data frame has 100,000 rows and 57 variables.

library(medicalclaims)
head(claims)

Here is how you find the procedures with the highest average cost, only counting procedures that have appeared at least 10 times in the data.

library(tidyverse)

claims %>% 
  group_by(cpt_desc) %>%
  summarise(
    avg_cost = mean(total_by_n),
    n = n()
  ) %>% 
  filter(n > 10) %>% 
  arrange(desc(avg_cost)) %>% 
  top_n(10, avg_cost)

If you want to look at how expensive different diagnoses are, then you first need to summarise over imputed_service_key and icd_diag_01_primary. This gives us the total spending for each patient and each diagnosis.

by_individual <- 
claims %>% 
  group_by(new_diag_desc, imputed_service_key) %>% 
  summarise(spending = sum(total))  %>% 
  ungroup

Then we can summarise to find the most expensive diagnoses.

by_individual %>% 
  group_by(new_diag_desc) %>%
  summarise(
    avg_cost = mean(spending),
    n = n()
  ) %>% 
  filter(n > 10) %>% 
  arrange(desc(avg_cost)) %>% 
  top_n(10, avg_cost)