library(rads) library(data.table)
pretty_kable <- function(dt) { knitr::kable(dt, format = 'markdown') }
Many PHSKC metrics are simple counts or means/proportions/fractions. The rads::calc()
function was created to calculate these metrics from survey and line-level data. However, there are certain metrics and datasets (e.g., death & CHARS) where true rates -- with a population denominator -- are of interest. Most of the time these rates need to be age-adjusted and most of the time DOH and APDE adjust to the US 2000 Standard Population with 11 age groups (see rads.data::population_reference_pop_11_age_groups
). The rads package has a suite of tools that were developed to facilitate these types of analyses. This vignette explains how to use these tools in four steps.
STEP 1: Create a summary table of event counts (deaths, hospitalizations, etc.) by single year age groups and strata of interest. For example, part of your your table might look like this:
| event_name | counts | age | year | region | |:-----------|:-------|:----|:-----|:--------| | CVD death | 26 | 47 | 2021 | Seattle | | CVD death | 17 | 47 | 2021 | North | | CVD death | 28 | 47 | 2021 | East | | CVD death | 36 | 47 | 2021 | South | | CVD death | 23 | 48 | 2021 | Seattle | | CVD death | 28 | 48 | 2021 | North | | CVD death | 36 | 48 | 2021 | East | | CVD death | 24 | 48 | 2021 | South |
STEP 2: Create a table of population denominators corresponding to the ages and demographics in the table created in Step 1.
| age | year | region | population | |:----|:-----|:--------|:-----------| | 47 | 2021 | Seattle | 11353 | | 47 | 2021 | North | 347 | | 47 | 2021 | East | 8203 | | 47 | 2021 | South | 6432 | | 48 | 2021 | Seattle | 3543 | | 48 | 2021 | North | 3041 | | 48 | 2021 | East | 425 | | 48 | 2021 | South | 7124 |
STEP 3: Merge the denominators (population) onto the table with the numerators (counts) using age and the strata of interest.
| event_name | counts | age | year | region | population | |:-----------|:-------|:----|:-----|:--------|:-----------| | CVD death | 26 | 47 | 2021 | Seattle | 11353 | | CVD death | 17 | 47 | 2021 | North | 347 | | CVD death | 28 | 47 | 2021 | East | 8203 | | CVD death | 36 | 47 | 2021 | South | 6432 | | CVD death | 23 | 48 | 2021 | Seattle | 3543 | | CVD death | 28 | 48 | 2021 | North | 3041 | | CVD death | 36 | 48 | 2021 | East | 425 | | CVD death | 24 | 48 | 2021 | South | 7124 |
STEP 4: Use rads::age_standardize() to calculate the crude and age-adjusted rates
As a simple example, let's assess the King County death rate from falls in 2016 through 2020, by intent.
We will pull down the relevant death data using rads::get_data_death
and use rads::death_injury_matrix_count
to process the ICD-10 death codes into human readable mechanisms and intents.
Let's use rads
to get our death data and check if it seems reasonable:
deaths <- get_data_death(cols = c('chi_age', 'chi_year', 'nonseattle', 'underlying_cod_code'), year = c(2016:2020), kingco = T) names(deaths) nrow(deaths) # should be ~65,000 based on apriori knowledge
We see that the column names and the number of rows are what we expected. Now, let's identify the deaths from falls using the rads::death_injury_matrix_count
function. To learn about this function, type ?death_injury_matrix_count
in your console or walk through the death_functions wiki. For now, just trust that this is the correct syntax:
deaths <- death_injury_matrix_count(ph.data = deaths, intent = "*", # all five intents mechanism = "fall", icdcol = 'underlying_cod_code', kingco = F, # False because already subset group_by = 'chi_age') head(deaths[deaths >9]) # only display non-suppressed data
pretty_kable(head(deaths[deaths >9]))
Glancing at the output from the death_injury_matrix_count
function, we see the that we have aggregated deaths
by chi_age
and stratified by intent
. This is all we really need. Let's tidy our data set and rename the two essential columns at the same time:
deaths <- deaths[, .(intent, age = chi_age, count = deaths)] head(deaths[count >9]) # only display non-suppressed data
pretty_kable(head(deaths[count >9]))
The table above confirms shows that we created a summary table of event (fall deaths) counts by age and intent. Congratulations, you've completed step 1.
We can easily get the corresponding population data using rads::get_population()
population <- rads::get_population(years = c(2016:2020), geo_type = 'kc', group_by = c('ages')) head(population)
pretty_kable(head(population))
The essential columns of interest are pop
and age
, so let's drop the other columns.
population <- population[, .(age, pop)] head(population)
pretty_kable(head(population))
Since we are interested in strata (i.e., intent
), we will need a complete set of population data for each strata.
population <- rbindlist(lapply(X = 1:length(unique(deaths$intent)), FUN = function(X){ temp <- copy(population)[, intent := unique(deaths$intent)[X]] })) population[, .N, intent] # confirm that have 101 rows per intent
pretty_kable(population[, .N, intent])
The title says it all:
deaths <- merge(deaths, population, by = c('age', 'intent'), all.x = F, # drop if death strata do not match a population all.y = T) # keep population data if do not have deaths deaths[is.na(count), count := 0] # formally set rows with zero counts to zero head(deaths[count > 9]) # only display non-suppressed data
pretty_kable(head(deaths[count > 9]))
The documentation tells us that rads::age_standardize
needs a data set with either an age
or agecat
column. Since we have the former, we are good to go as long as we choose the correct reference population. Typing list_ref_pop
will show you all the available reference populations, however we already know that that DOH uses the 2000 US standard population with 11 age groups, so we can plug that into the function.
est <- age_standardize(ph.data = deaths, ref.popname = "2000 U.S. Std Population (11 age groups)", collapse = T, my.count = 'count', my.pop = 'pop', per = 100000, conf.level = 0.95, group_by = 'intent') head(est)
pretty_kable(head(est))
Here we see that the King County crude and adjusted fall death rates can differ substantially (e.g., Unintentional) or slightly (e.g., Undetermined).
In this example we will calculate the adjusted rate of deaths from any cause by King County Regions in 2019. Because Regions are not baked into the death data, we will have to crosswalk from HRAs to Regions.
deaths <- get_data_death(cols = c('chi_age', 'chi_year', 'apde_geo_hra2010_short', 'underlying_cod_code'), year = c(2019), kingco = T)
Now let's crosswalk from King County HRAs to Regions using rads::get_walk()
.
xwalk <- rads::get_xwalk(geo1 = 'region10', geo2 = 'hra10') deaths <- merge(deaths, xwalk, by.x = "apde_geo_hra2010_short", by.y = "hra10", all.x = T, all.y = F) deaths <- deaths[, .(age = chi_age, underlying_cod_code, Region = region10)] nrow(deaths[is.na(Region)]) # rows with missing Region information
head(deaths)
pretty_kable(head(deaths))
As you can see, there are few rows where the death data are missing a Region (due to a missing HRA). For simplicity sake, let's drop these rows.
deaths <- deaths[!is.na(Region)]
Remember that step 1 is supposed to create a summary table. So, let's collapse/aggregate/sum the data to get the count of deaths by age and region:
deaths <- deaths[, .(count = .N), .(age, Region)] head(deaths)
pretty_kable(head(deaths))
Finally, let's take a peak at the number of deaths by region:
deaths[, sum(count), Region]
pretty_kable(deaths[, sum(count), Region])
This table of deaths by region shows that the number of deaths in South King County is five times higher than those in North King County (r deaths[Region == 'South', .(total = sum(count))][['total']]
vs. r deaths[Region == 'North', .(total = sum(count))][['total']]
). Does that mean the death rates in South King County are five times higher than those in North King County? Let's get the corresponding population data and find out.
We again use rads::get_population() to the necessary population data
population <- rads::get_population(years = c(2019), geo_type = 'region', group_by = c('ages', 'geo_id')) head(population)
pretty_kable(head(population))
The key columns of interest are pop
, geo_id
(which has our regions), and age
. The other columns are all constants, so let's just keep what we need.
population <- population[, .(Region = geo_id, age, pop)] head(population)
pretty_kable(head(population))
Note that, unlike in the example above, we already have a complete population data set (ages 0 to 100) for each strata.
population[, .N, Region]
pretty_kable(population[, .N, Region])
Just do it!
deaths <- merge(deaths, population, by = c('Region', 'age'), all.x = F, all.y = T) deaths[is.na(count), count := 0] # set rows with zero counts to zero head(deaths)
pretty_kable(head(deaths))
est <- age_standardize(ph.data = deaths, ref.popname = "2000 U.S. Std Population (11 age groups)", collapse = T, my.count = 'count', my.pop = 'pop', per = 100000, conf.level = 0.95, group_by = 'Region') head(est)
pretty_kable(head(est))
Here we see that the the crude rate is lower in South King County (r est[Region=="South"]$crude.rate
) compared to North King County (r est[Region=="North"]$crude.rate
). This flips the relationship we saw with the counts (r est[Region=="South"]$count
vs r est[Region=="North"]$count
), but does not account for differences in the population structure in the two regions. In contrast, the adjusted rate is higher in South King County (r est[Region=="South"]$adj.rate
) vs. North King County(r est[Region=="North"]$adj.rate
) .
We hope this vignettes has convinced you of two things:
Calculating age-adjusted rates within the PHSKC universe is readily doable if you follow the four basic steps:
Create a summary table by age and strata of interest
Create an appropriate matching population table
Merge the population data onto the summary table
Use rads::age_standardize()
If you've walked through this vignette and more or less understood what's going on, you're in good shape! If you're still confused, please walk through it again and then reach out if you still have questions. Good luck!
-- r paste0('Updated ', format(Sys.time(), '%B %d, %Y'))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.