A data package with a sample of 100,000 anonymised medical claims from New Hampshire’s Comprehensive Health Information System (https://nhchis.com/).
You can install it from GitHub with:
# install.packages("devtools")
devtools::install_github("mhairi/medicalclaims")
Once you've loaded the package, the data is in an object called claims. The data frame has 100,000 rows and 57 variables.
library(medicalclaims)
head(claims)
Here is how to find the procedures with the highest average cost, counting only procedures that appear more than 10 times in the data.
library(tidyverse)

claims %>%
  group_by(cpt_desc) %>%
  summarise(
    avg_cost = mean(total_by_n),
    n = n()
  ) %>%
  filter(n > 10) %>%
  arrange(desc(avg_cost)) %>%
  top_n(10, avg_cost)
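As a side note, recent versions of dplyr mark top_n() as superseded in favour of slice_max(). The following is a minimal sketch of the same pattern on a made-up data frame: the object names toy and toy_top, the procedure labels, and the costs are all invented for illustration, and the count threshold is lowered so the tiny example still returns a row. Only the column names cpt_desc and total_by_n are taken from the pipeline above.

```r
library(dplyr)

# Hypothetical toy data; procedure names and costs are invented.
toy <- tibble(
  cpt_desc   = c("X", "X", "Y", "Y", "Y", "Z"),
  total_by_n = c(10, 20, 5, 5, 8, 100)
)

toy_top <- toy %>%
  group_by(cpt_desc) %>%
  summarise(
    avg_cost = mean(total_by_n),
    n = n()
  ) %>%
  filter(n >= 2) %>%           # threshold lowered to suit the toy data
  slice_max(avg_cost, n = 1)   # keeps the row(s) with the largest avg_cost

toy_top
```

Procedure "Z" has the highest cost but appears only once, so the filter drops it and "X" (average cost 15) is returned. This is the reason for filtering on the count before ranking: rare procedures with a single extreme claim would otherwise dominate the list.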
If you want to look at how expensive different diagnoses are, you first need to summarise over imputed_service_key and icd_diag_01_primary. This gives us the total spending for each patient and each diagnosis.
by_individual <- claims %>%
  group_by(new_diag_desc, imputed_service_key) %>%
  summarise(spending = sum(total)) %>%
  ungroup()
Then we can summarise to find the most expensive diagnoses.
by_individual %>%
  group_by(new_diag_desc) %>%
  summarise(
    avg_cost = mean(spending),
    n = n()
  ) %>%
  filter(n > 10) %>%
  arrange(desc(avg_cost)) %>%
  top_n(10, avg_cost)
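To see why the two-step summary matters, here is a minimal sketch on a made-up data frame. The object names (toy_claims, toy_by_individual, toy_by_diag), the diagnosis labels, and the amounts are invented for illustration. Only the column names new_diag_desc, imputed_service_key and total come from the code above, and the sketch assumes, as the text does, that imputed_service_key identifies a patient.

```r
library(dplyr)

# Hypothetical toy data: not taken from the claims data set.
toy_claims <- tibble(
  new_diag_desc       = c("A", "A", "A", "B"),
  imputed_service_key = c(1, 1, 2, 3),
  total               = c(100, 50, 30, 80)
)

# Step 1: total spending per diagnosis and patient.
toy_by_individual <- toy_claims %>%
  group_by(new_diag_desc, imputed_service_key) %>%
  summarise(spending = sum(total), .groups = "drop")

# Step 2: average spending per patient for each diagnosis.
toy_by_diag <- toy_by_individual %>%
  group_by(new_diag_desc) %>%
  summarise(
    avg_cost = mean(spending),
    n = n()
  )

toy_by_diag
```

For diagnosis "A", patient 1 has two claim lines totalling 150 and patient 2 has one line of 30, so the per-patient average is 90. Taking mean(total) directly over the three claim lines would give 60, which is the average cost of a claim line rather than of a patient.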