library(rads)
library(data.table)
pretty_kable <- function(dt) { 
  knitr::kable(dt, format = 'markdown')
}

Introduction

Many PHSKC metrics are simple counts or means/proportions/fractions. The rads::calc() function was created to calculate these metrics from survey and line-level data. However, there are certain metrics and datasets (e.g., death & CHARS) where true rates -- with a population denominator -- are of interest. Most of the time these rates need to be age-adjusted and most of the time DOH and APDE adjust to the US 2000 Standard Population with 11 age groups (see rads.data::population_reference_pop_11_age_groups). The rads package has a suite of tools that were developed to facilitate these types of analyses. This vignette explains how to use these tools in four steps.

The four steps: an overview

STEP 1: Create a summary table of event counts (deaths, hospitalizations, etc.) by single year age groups and strata of interest. For example, part of your your table might look like this:

| event_name | counts | age | year | region | |:-----------|:-------|:----|:-----|:--------| | CVD death | 26 | 47 | 2021 | Seattle | | CVD death | 17 | 47 | 2021 | North | | CVD death | 28 | 47 | 2021 | East | | CVD death | 36 | 47 | 2021 | South | | CVD death | 23 | 48 | 2021 | Seattle | | CVD death | 28 | 48 | 2021 | North | | CVD death | 36 | 48 | 2021 | East | | CVD death | 24 | 48 | 2021 | South |

STEP 2: Create a table of population denominators corresponding to the ages and demographics in the table created in Step 1.

| age | year | region | population | |:----|:-----|:--------|:-----------| | 47 | 2021 | Seattle | 11353 | | 47 | 2021 | North | 347 | | 47 | 2021 | East | 8203 | | 47 | 2021 | South | 6432 | | 48 | 2021 | Seattle | 3543 | | 48 | 2021 | North | 3041 | | 48 | 2021 | East | 425 | | 48 | 2021 | South | 7124 |

STEP 3: Merge the denominators (population) onto the table with the numerators (counts) using age and the strata of interest.

| event_name | counts | age | year | region | population | |:-----------|:-------|:----|:-----|:--------|:-----------| | CVD death | 26 | 47 | 2021 | Seattle | 11353 | | CVD death | 17 | 47 | 2021 | North | 347 | | CVD death | 28 | 47 | 2021 | East | 8203 | | CVD death | 36 | 47 | 2021 | South | 6432 | | CVD death | 23 | 48 | 2021 | Seattle | 3543 | | CVD death | 28 | 48 | 2021 | North | 3041 | | CVD death | 36 | 48 | 2021 | East | 425 | | CVD death | 24 | 48 | 2021 | South | 7124 |

STEP 4: Use rads::age_standardize() to calculate the crude and age-adjusted rates

The four steps: example 1

As a simple example, let's assess the King County death rate from falls in 2016 through 2020, by intent.

STEP 1: Create a summary table of event counts

We will pull down the relevant death data using rads::get_data_death and use rads::death_injury_matrix_count to process the ICD-10 death codes into human readable mechanisms and intents.

Let's use rads to get our death data and check if it seems reasonable:

deaths <- get_data_death(cols = c('chi_age', 'chi_year', 'nonseattle', 
                                  'underlying_cod_code'), 
                               year = c(2016:2020), 
                               kingco = T)
names(deaths) 
nrow(deaths) # should be ~65,000 based on apriori knowledge

We see that the column names and the number of rows are what we expected. Now, let's identify the deaths from falls using the rads::death_injury_matrix_count function. To learn about this function, type ?death_injury_matrix_count in your console or walk through the death_functions wiki. For now, just trust that this is the correct syntax:

deaths <- death_injury_matrix_count(ph.data = deaths,
                               intent = "*", # all five intents
                               mechanism = "fall",
                               icdcol = 'underlying_cod_code',
                               kingco = F, # False because already subset
                               group_by = 'chi_age') 
head(deaths[deaths >9]) # only display non-suppressed data
pretty_kable(head(deaths[deaths >9]))

Glancing at the output from the death_injury_matrix_count function, we see the that we have aggregated deaths by chi_age and stratified by intent. This is all we really need. Let's tidy our data set and rename the two essential columns at the same time:

deaths <- deaths[, .(intent, age = chi_age, count = deaths)]
head(deaths[count >9]) # only display non-suppressed data
pretty_kable(head(deaths[count >9]))

The table above confirms shows that we created a summary table of event (fall deaths) counts by age and intent. Congratulations, you've completed step 1.

STEP 2: Create a table of population denominators

We can easily get the corresponding population data using rads::get_population()

population <- rads::get_population(years = c(2016:2020), 
                                   geo_type = 'kc', 
                                   group_by = c('ages'))
head(population)
pretty_kable(head(population))

The essential columns of interest are pop and age, so let's drop the other columns.

population <- population[, .(age, pop)]
head(population)
pretty_kable(head(population))

Since we are interested in strata (i.e., intent), we will need a complete set of population data for each strata.

population <- rbindlist(lapply(X = 1:length(unique(deaths$intent)),
                               FUN = function(X){
                                 temp <- copy(population)[, intent := unique(deaths$intent)[X]]
                                 }))
population[, .N, intent] # confirm that have 101 rows per intent
pretty_kable(population[, .N, intent])

STEP 3: Merge the denominators onto the numerators

The title says it all:

deaths <- merge(deaths, 
                population, 
                by = c('age', 'intent'), 
                all.x = F, # drop if death strata do not match a population
                all.y = T) # keep population data if do not have deaths
deaths[is.na(count), count := 0] # formally set rows with zero counts to zero
head(deaths[count > 9]) # only display non-suppressed data
pretty_kable(head(deaths[count > 9]))

STEP 4: Use rads::age_standardize()

The documentation tells us that rads::age_standardize needs a data set with either an age or agecat column. Since we have the former, we are good to go as long as we choose the correct reference population. Typing list_ref_pop will show you all the available reference populations, however we already know that that DOH uses the 2000 US standard population with 11 age groups, so we can plug that into the function.

est <- age_standardize(ph.data = deaths,
                       ref.popname = "2000 U.S. Std Population (11 age groups)",
                       collapse = T,
                       my.count = 'count',
                       my.pop = 'pop',
                       per = 100000,
                       conf.level = 0.95, 
                       group_by = 'intent')
head(est)
pretty_kable(head(est))

Here we see that the King County crude and adjusted fall death rates can differ substantially (e.g., Unintentional) or slightly (e.g., Undetermined).

The four steps: example 2

In this example we will calculate the adjusted rate of deaths from any cause by King County Regions in 2019. Because Regions are not baked into the death data, we will have to crosswalk from HRAs to Regions.

STEP 1: Create a summary table of event counts

deaths <- get_data_death(cols = c('chi_age', 
                                  'chi_year',
                                  'apde_geo_hra2010_short', 
                                  'underlying_cod_code'), 
                               year = c(2019), 
                               kingco = T)

Now let's crosswalk from King County HRAs to Regions using rads::get_walk().

xwalk <- rads::get_xwalk(geo1 = 'region10', geo2 = 'hra10')

deaths <- merge(deaths, xwalk, 
                by.x = "apde_geo_hra2010_short", by.y = "hra10", 
                all.x = T, all.y = F)

deaths <- deaths[, .(age = chi_age, underlying_cod_code, Region = region10)]

nrow(deaths[is.na(Region)]) # rows with missing Region information
head(deaths)
pretty_kable(head(deaths))

As you can see, there are few rows where the death data are missing a Region (due to a missing HRA). For simplicity sake, let's drop these rows.

deaths <- deaths[!is.na(Region)]

Remember that step 1 is supposed to create a summary table. So, let's collapse/aggregate/sum the data to get the count of deaths by age and region:

deaths <- deaths[, .(count = .N), .(age, Region)]
head(deaths)
pretty_kable(head(deaths))

Finally, let's take a peak at the number of deaths by region:

deaths[, sum(count), Region]
pretty_kable(deaths[, sum(count), Region])

This table of deaths by region shows that the number of deaths in South King County is five times higher than those in North King County (r deaths[Region == 'South', .(total = sum(count))][['total']] vs. r deaths[Region == 'North', .(total = sum(count))][['total']]). Does that mean the death rates in South King County are five times higher than those in North King County? Let's get the corresponding population data and find out.

STEP 2: Create a table of population denominators

We again use rads::get_population() to the necessary population data

population <- rads::get_population(years = c(2019), 
                                   geo_type = 'region', 
                                   group_by = c('ages', 'geo_id'))
head(population)
pretty_kable(head(population))

The key columns of interest are pop, geo_id (which has our regions), and age. The other columns are all constants, so let's just keep what we need.

population <- population[, .(Region = geo_id, age, pop)]
head(population)
pretty_kable(head(population))

Note that, unlike in the example above, we already have a complete population data set (ages 0 to 100) for each strata.

population[, .N, Region]
pretty_kable(population[, .N, Region])

STEP 3: Merge the denominators onto the numerators

Just do it!

deaths <- merge(deaths, 
                population, 
                by = c('Region', 'age'), 
                all.x = F, 
                all.y = T) 
deaths[is.na(count), count := 0] # set rows with zero counts to zero
head(deaths)
pretty_kable(head(deaths))

STEP 4: Use rads::age_standardize()

est <- age_standardize(ph.data = deaths,
                       ref.popname = "2000 U.S. Std Population (11 age groups)",
                       collapse = T,
                       my.count = 'count',
                       my.pop = 'pop',
                       per = 100000,
                       conf.level = 0.95, 
                       group_by = 'Region')
head(est)
pretty_kable(head(est))

Here we see that the the crude rate is lower in South King County (r est[Region=="South"]$crude.rate) compared to North King County (r est[Region=="North"]$crude.rate). This flips the relationship we saw with the counts (r est[Region=="South"]$count vs r est[Region=="North"]$count), but does not account for differences in the population structure in the two regions. In contrast, the adjusted rate is higher in South King County (r est[Region=="South"]$adj.rate) vs. North King County(r est[Region=="North"]$adj.rate) .

Conclusion

We hope this vignettes has convinced you of two things:

  1. Age-adjusted rates are important because they can (sometimes) tell a different story compared to crude rates and counts!
  2. Calculating age-adjusted rates within the PHSKC universe is readily doable if you follow the four basic steps:

    1. Create a summary table by age and strata of interest

    2. Create an appropriate matching population table

    3. Merge the population data onto the summary table

    4. Use rads::age_standardize()

If you've walked through this vignette and more or less understood what's going on, you're in good shape! If you're still confused, please walk through it again and then reach out if you still have questions. Good luck!

-- r paste0('Updated ', format(Sys.time(), '%B %d, %Y'))



PHSKC-APDE/rads documentation built on April 14, 2025, 10:47 a.m.