In PHSKC-APDE/rads: Assisted computation of King County public health data

library(rads)
library(data.table)

pretty_kable <- function(dt) { 
  knitr::kable(dt, format = 'markdown')
}

Introduction

This vignette provides examples of how to pull population data into R from the Azure cloud.

The population numbers are estimated by the WA Office of Financial Management (OFM) population unit. OFM produces two sets of estimates: (1) April 1 official population estimates for cities and towns and (2) Small Area Estimates (SAE) for smaller geographies. The get_population() function pulls the SAE numbers; when round = TRUE, the results should match those in CHAT.

NOTE!! To get the most out of this vignette, we highly recommend that you actually type each and every bit of code into R. Doing so will almost definitely help you learn the syntax much faster than just reading the vignette or copying and pasting the code.

`get_population` arguments

Arguments are the values that we send to a function when it is called. Generally, typing args(my_function_of_interest) will return the possible arguments including any defaults. For example,

args(get_population)
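As a minimal sketch of what args() and the related base R function formals() report, the example below uses a made-up function, toy_pop(). Its name and arguments are invented for illustration and are not part of rads.

```r
# toy_pop() is a hypothetical stand-in, NOT a rads function
toy_pop <- function(years = 2022, round = FALSE, group_by = NULL) {
  invisible(NULL)
}

# args() displays the function header, including any defaults
args(toy_pop)

# formals() returns the same information as a named list you can inspect
arg_names <- names(formals(toy_pop))
default_year <- formals(toy_pop)$years

print(arg_names)
print(default_year)
```

The same two calls should work on any R function with named arguments, so they can be pointed at get_population once rads is loaded.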

The standard arguments for get_population() are found in its help file (?get_population), and summarized here for your convenience:

1) kingco \<\< Logical vector of length 1. Identifies whether you want population estimates limited to King County. Only impacts results for geo_type in c('blk', 'blkgrp', 'lgd', 'scd', 'tract', 'zip'). Default == TRUE.

2) years \<\< Numeric vector. Identifies which year(s) of data should be pulled. Default == 2022.

3) ages \<\< Numeric vector. Identifies which age(s) should be pulled. Default == c(0:100), with 100 being the top coded value for 100:120.

4) genders \<\< Character vector of length 1 or 2. Identifies which gender(s) should be pulled. The acceptable values are 'f', 'female', 'm', and 'male'. Default == c('f', 'm').

5) races \<\< Character vector of length 1 to 7. Identifies which race(s) or ethnicity should be pulled. The acceptable values are "aian", "asian", "black", "hispanic", "multiple", "nhpi", and "white". Default == all the possible values.

6) race_type \<\< Character vector of length 1. Identifies whether to pull race data with Hispanic as an ethnicity ("race") or Hispanic as a race ("race_eth"). Default == c("race_eth").

7) geo_type \<\< Character vector of length 1. Identifies the geographic level for which you want population estimates. The acceptable values are: 'blk', 'blkgrp', 'county', 'hra', 'kc', 'lgd' (WA State legislative districts), 'region', 'seattle', 'scd' (school districts), 'tract', 'wa', and 'zip'. Default == "kc".

8) group_by \<\< Character vector of length 0 to 7. Identifies how you would like the data 'grouped' (i.e., stratified). Valid options are limited to: "years", "ages", "genders", "race", "race_eth", "fips_co", and "geo_id". Default == NULL, i.e., estimates are only grouped / aggregated by geography (e.g. geo_id is always included).

9) round \<\< Logical vector of length 1. Identifies whether or not population estimates should be returned as whole numbers. Default == FALSE.

10) mykey \<\< Character vector with the name of the keyring:: key that provides access to the Health and Human Services Analytic Workspace (HHSAW). If you have never set your keyring before or do not know what this is referring to, just type keyring::key_set('hhsaw', username = 'ALastname@kingcounty.gov') into your R console (making sure to replace the username). The default is 'hhsaw'. Note that it can also take the name of a live database connection.

11) census_vintage \<\< Either 2010 or 2020. Specifies the anchor census of the desired estimates. Default is 2020.

12) geo_vintage \<\< Either 2010 or 2020. Specifies the anchor census for geographies. For example, 2020 will return geographies based on 2020 blocks. Default is 2020.

13) schema \<\< Unless you are a power user, don't mess with this

14) table_prefix \<\< Unless you are a power user, don't mess with this

15) return_query \<\< logical. Rather than returning results, the query/queries used to fetch the results are provided

You do not need to specify any of the arguments listed above. As the following example shows, the default arguments for get_population return the estimated King County population for the most recent year.

get_population()[]

pretty_kable(get_population()[])

Example analyses

Note 1: The use of head() below is not necessary. It is a convenience function that displays the first 6 rows of data and was used to keep the output in this vignette tidy.

Note 2: The [] after get_population() prints the output to the console. Typically, you would not print the results but would save them as an object. E.g., my.pop.est <- get_population().
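The need for [] is a general data.table behavior rather than something specific to population data: a function that modifies a data.table with := and then returns it may not auto-print at the console. The sketch below illustrates this with make_pop(), a hypothetical stand-in with invented values. Whether get_population behaves this way for the same reason is an assumption.

```r
library(data.table)

# make_pop() is a hypothetical stand-in that builds its result using `:=`
make_pop <- function() {
  dt <- data.table(geo_id = c("East", "North"), pop = c(100.4, 200.6))
  dt[, pop := round(pop)]  # modify the column by reference
  dt
}

make_pop()    # typed at the console, this may print nothing
make_pop()[]  # the trailing [] makes the result print

# the usual workflow: save the result, then work with the object
my.pop.est <- make_pop()
print(my.pop.est)
```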

Geographic estimates

WA State

get_population(geo_type = 'wa', 
               round = TRUE)[]

pretty_kable(get_population(geo_type = 'wa', round = TRUE)[])

King County

get_population(round = TRUE)[]

pretty_kable(get_population(round = T)[])

King County Regions

get_population(geo_type = c("region"),
               group_by = c("geo_id"),
               round = TRUE)[]

pretty_kable(get_population(geo_type = c("region"),
               group_by = c("geo_id"),
               round = TRUE)[])

King County Regions with round=FALSE

Turn off rounding to get the exact (fractional) number of people estimated.

rads::get_population(geo_type = 'region', 
                     round = FALSE)[]

pretty_kable(rads::get_population(geo_type = 'region', 
                     round = FALSE)[])
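One reason to keep fractional estimates is that rounding before aggregating can give a different total than aggregating first and rounding once. The sketch below uses invented values (they are not OFM estimates) to show the discrepancy.

```r
library(data.table)

# Hypothetical fractional estimates; the values are invented for illustration
est <- data.table(region = c("East", "North", "Seattle", "South"),
                  pop = c(10.4, 20.4, 30.4, 40.4))

sum_then_round <- round(sum(est$pop))  # aggregate first, round once
round_then_sum <- sum(round(est$pop))  # round each row, then aggregate

print(c(sum_then_round = sum_then_round, round_then_sum = round_then_sum))
```

If you plan to sum the estimates across rows, pulling unrounded values and rounding only the final figure avoids accumulating rounding error.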

King County HRAs

head(get_population(geo_type = c("hra"), 
                    group_by = c("geo_id"))[])

pretty_kable(head(get_population(geo_type = c("hra"), group_by = c("geo_id"))[]))

King County Zip codes

head(get_population(geo_type = c("zip"), 
                    group_by = c("geo_id"))[])

pretty_kable(head(get_population(geo_type = c("zip"), group_by = c("geo_id"))[]))

King County Census Tracts

head(get_population(geo_type = c("tract"), 
                    group_by = c("geo_id"), 
                    ages = 18, 
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])

pretty_kable(head(get_population(geo_type = c("tract"), group_by = c("geo_id"), 
                                 ages = 18, census_vintage = 2020, 
                                 geo_vintage = 2020)[]))

King County Census Block Groups

head(get_population(geo_type = c("blkgrp"), 
                    group_by = c("geo_id"), 
                    ages = 18,
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])

pretty_kable(head(get_population(geo_type = c("blkgrp"), group_by = c("geo_id"), 
                                 ages = 18,census_vintage = 2020, 
                                 geo_vintage = 2020)[]))

King County Census Blocks

# ages is limited to a single value to make the query run faster
head(get_population(geo_type = c("blk"), 
                    group_by = c("geo_id"), 
                    ages = 18, 
                    census_vintage = 2020, 
                    geo_vintage = 2020)[])

pretty_kable(head(get_population(geo_type = c("blk"), group_by = c("geo_id"), 
                                 ages = 18, census_vintage = 2020, 
                                 geo_vintage = 2020)[]))

Other simple arguments

King County multiple years combined

get_population(years = 2017:2019)[]

pretty_kable(get_population(years = 2017:2019)[])

King County multiple years stratified

get_population(years = 2017:2019, 
               group_by = "years")[]

pretty_kable(get_population(years = 2017:2019, 
               group_by = "years")[])
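Once the estimates are stratified by year, they can be collapsed again with ordinary data.table syntax. The sketch below uses a hypothetical table with invented values, shaped like the stratified output (the column names geo_id, year, and pop are taken from the examples in this vignette). It assumes that combining years means summing across them.

```r
library(data.table)

# Hypothetical stratified output: one row per year (values invented)
by_year <- data.table(geo_id = "King County",
                      year = 2017:2019,
                      pop = c(1000, 1100, 1200))

# collapse the yearly rows into a multi-year total and an annual average
combined <- by_year[, .(years = paste(range(year), collapse = "-"),
                        total_pop = sum(pop),
                        mean_annual_pop = mean(pop)),
                    by = geo_id]
print(combined)
```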

King County multiple ages combined

get_population(ages = 65:70)[]

pretty_kable(get_population(ages = 65:70)[])

King County multiple ages stratified

get_population(ages = 65:70, group_by = "ages")[]

pretty_kable(get_population(ages = 65:70, 
                            group_by = "ages")[])

King County female only

get_population(genders = "F")[]

pretty_kable(get_population(genders = "F")[])

King County gender stratified

get_population(group_by = "genders")[]

pretty_kable(get_population(group_by = "genders")[])

King County AIAN (not Hispanic)

get_population(races = "aian", 
               race_type = "race_eth")[]

pretty_kable(get_population(races = "aian", race_type = "race_eth")[])

King County AIAN (regardless of Hispanic ethnicity)

get_population(races = "aian", 
               race_type = "race", 
               group_by = 'race')[]

pretty_kable(get_population(races = "aian", race_type = "race", group_by = 'race')[])

King County stratified by Hispanic as race

get_population(race_type = "race_eth", 
               group_by = "race_eth")[]

pretty_kable(get_population(race_type = "race_eth", group_by = "race_eth")[])

King County stratified by race (Hispanic as ethnicity)

get_population(race_type = "race", 
               group_by = "race")[]

pretty_kable(get_population(race_type = "race", group_by = "race")[])

Complex arguments

King County regions stratified by year and gender

reg_yr_gen <- get_population(geo_type = "region",
                            years = 2017:2019, 
                            group_by = c("geo_id", "years", "genders"))
reg_yr_gen <- reg_yr_gen[, .(region = geo_id, year, gender, pop)]
print(setorder(reg_yr_gen, region, year, gender)[1:12])

pretty_kable(setorder(reg_yr_gen, region, year, gender)[1:12])
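A table stratified this way is a convenient starting point for derived measures. As a sketch, the code below computes each gender's share of the population within a region and year, then reshapes the result to one row per region-year. The toy table only mimics the column layout of reg_yr_gen; its values and gender labels are invented.

```r
library(data.table)

# Hypothetical rows shaped like reg_yr_gen (values and labels invented)
toy <- data.table(region = rep(c("East", "South"), each = 2),
                  year = 2019,
                  gender = rep(c("Female", "Male"), times = 2),
                  pop = c(60, 40, 45, 55))

# percent of each region-year population that falls in each gender
toy[, pct := 100 * pop / sum(pop), by = .(region, year)]

# reshape to one row per region-year with one column per gender
wide <- dcast(toy, region + year ~ gender, value.var = "pct")
print(wide)
```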

King County regions stratified by year -- Female Hispanic and Asian-NH residents aged 16-25 only -- not rounded

get_population(ages = 16:25, 
               genders = "F", 
               years = 2017:2019, 
               races = c("hispanic", "asian"), 
               geo_type = "region", 
               race_type = "race_eth", 
               group_by = c("geo_id", "years", "race_eth"), 
               round = F)[1:12]

pretty_kable(get_population(ages = 16:25, 
               genders = "F", 
               years = 2017:2019, 
               races = c("hispanic", "asian"), 
               geo_type = "region", 
               race_type = "race_eth", 
               group_by = c("geo_id", "years", "race_eth"), 
               round = F)[1:12])

'hispanic' as a `group_by` value

Sometimes a user might want to access population data by Hispanic ethnicity. To get population values by race x ethnicity, users should include 'hispanic' in the group_by argument. This option only works in conjunction with race_type = 'race_eth'. Several combinations (e.g. adding 'hispanic' to the races argument) will not work and will throw some (hopefully) informative errors. Other options (as demonstrated above) will continue to work.

King County regions stratified by Hispanic/Non-Hispanic

# pull in data stratified by region and Hispanic ethnicity
reg_hisp_nonhisp <- get_population(geo_type = 'region', 
                                   group_by = 'hispanic')

# keep select columns, then sort and print
reg_hisp_nonhisp <- reg_hisp_nonhisp[, .(region = geo_id, hispanic, pop)]
print(setorder(reg_hisp_nonhisp, region, hispanic))

pretty_kable(setorder(reg_hisp_nonhisp, region, hispanic))

Return all race x Hispanic ethnicity combinations

race_x_eth <- get_population(race_type = 'race_eth', 
                             group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))

pretty_kable(setorder(race_x_eth, race_eth, hispanic))

Return population of White residents by Hispanic ethnicity

race_x_eth <- get_population(race_type = 'race_eth', 
                             races = 'white', 
                             group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))

pretty_kable(setorder(race_x_eth, race_eth, hispanic))

Using get_population with a pre-existing connection to hhsaw

Some users may not need/want to rely on get_population's auto-connection to HHSAW via keyring. Users can instead pass an existing database connection through the mykey argument. The example below still uses keyring (since most get_population users are on the PH domain), but it can be replaced by ActiveDirectoryIntegrated type authentications to HHSAW for the KC lucky ducks.

# Via autoconnect
r1 = get_population()

mycon <- DBI::dbConnect(
  odbc::odbc(), 
  driver = getOption("rads.odbc_version"), 
  server = "kcitazrhpasqlprp16.azds.kingcounty.gov", 
  database = "hhs_analytics_workspace",
  uid = keyring::key_list('hhsaw')[["username"]], 
  pwd = keyring::key_get('hhsaw', keyring::key_list('hhsaw')[["username"]]), 
  Encrypt = "yes", 
  TrustServerCertificate = "yes", 
  Authentication = "ActiveDirectoryPassword")

r2 = get_population(mykey = mycon)

print(all.equal(r1,r2))
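Note that all.equal() returns TRUE when the two objects match and a character description of the differences when they do not, so it is usually wrapped in isTRUE() when a strict TRUE/FALSE answer is needed. The sketch below illustrates this with small invented tables standing in for r1 and r2.

```r
library(data.table)

# Hypothetical stand-ins for r1 and r2 (values invented)
a <- data.table(geo_id = "King County", pop = 100)
b <- data.table(geo_id = "King County", pop = 100)
c_diff <- data.table(geo_id = "King County", pop = 101)

same <- isTRUE(all.equal(a, b))            # identical contents
different <- isTRUE(all.equal(a, c_diff))  # pop differs

print(c(same = same, different = different))
```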