library(rads)
library(data.table)
pretty_kable <- function(dt) { knitr::kable(dt, format = 'markdown') }
This vignette will provide some examples of ways to pull population data into R from the Azure cloud.
The population numbers are estimated by the WA Office of Financial Management (OFM) population unit. OFM produces two sets of estimates: (1) April 1 official population estimates for cities and towns and (2) Small Area Estimates (SAE) for smaller geographies. The get_population() function pulls the SAE numbers which, when round = T, should match those in CHAT.
NOTE!! To get the most out of this vignette, we highly recommend that you actually type each and every bit of code into R. Doing so will almost definitely help you learn the syntax much faster than just reading the vignette or copying and pasting the code.
get_population arguments
Arguments are the values that we send to a function when it is called. Generally, typing args(my_function_of_interest) will return the possible arguments including any defaults. For example,
args(get_population)
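If you want the defaults as values rather than printed text, base R's formals() returns them as a named list. Here is a minimal sketch using a made-up function (toy_pop is hypothetical and stands in for any function; it is not part of rads):

```r
# toy_pop() is a hypothetical stand-in, not a rads function
toy_pop <- function(years = 2022, round = FALSE) {
  NULL
}

args(toy_pop)                 # prints the argument list with its defaults
defaults <- formals(toy_pop)  # the same information as a named list
names(defaults)               # the argument names
defaults$round                # the default for a single argument
```

The same pattern should work with any function, for example formals(get_population)$geo_type once rads is loaded.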
The standard arguments for get_population() are found in its help file (?get_population), and summarized here for your convenience:
1) kingco << Logical vector of length 1. Identifies whether you want population estimates limited to King County. Only impacts results for geo_type in c('blk', 'blkgrp', 'lgd', 'scd', 'tract', 'zip'). Default == TRUE.
2) years << Numeric vector. Identifies which year(s) of data should be pulled. Default == 2022.
3) ages << Numeric vector. Identifies which age(s) should be pulled. Default == c(0:100), with 100 being the top coded value for 100:120.
4) genders << Character vector of length 1 or 2. Identifies which gender(s) should be pulled. The acceptable values are 'f', 'female', 'm', and 'male'. Default == c('f', 'm').
5) races << Character vector of length 1 to 7. Identifies which race(s) or ethnicity should be pulled. The acceptable values are "aian", "asian", "black", "hispanic", "multiple", "nhpi", and "white". Default == all the possible values.
6) race_type << Character vector of length 1. Identifies whether to pull race data with Hispanic as an ethnicity ("race") or Hispanic as a race ("race_eth"). Default == c("race_eth").
7) geo_type << Character vector of length 1. Identifies the geographic level for which you want population estimates. The acceptable values are: 'blk', 'blkgrp', 'county', 'hra', 'kc', 'lgd' (WA State legislative districts), 'region', 'seattle', 'scd' (school districts), 'tract', 'wa', and 'zip'. Default == "kc".
8) group_by << Character vector of length 0 to 7. Identifies how you would like the data 'grouped' (i.e., stratified). Valid options are limited to: "years", "ages", "genders", "race", "race_eth", "fips_co", and "geo_id". Default == NULL, i.e., estimates are only grouped / aggregated by geography (e.g. geo_id is always included).
9) round << Logical vector of length 1. Identifies whether or not population estimates should be returned as whole numbers. Default == FALSE.
10) mykey << A character vector with the name of the keyring:: key that provides access to the Health and Human Services Analytic Workspace (HHSAW). If you have never set your keyring before or do not know what this is referring to, just type keyring::key_set('hhsaw', username = 'ALastname@kingcounty.gov') into your R console (making sure to replace the username). The default is 'hhsaw'. Note that it can also take the name of a live database connection.
11) census_vintage << Either 2010 or 2020. Specifies the anchor census of the desired estimates. Default is 2020.
12) geo_vintage << Either 2010 or 2020. Specifies the anchor census for geographies. For example, 2020 will return geographies based on 2020 blocks. Default is 2020.
13) schema << Unless you are a power user, don't mess with this.
14) table_prefix << Unless you are a power user, don't mess with this.
15) return_query << Logical. Rather than returning results, the query/queries used to fetch the results are provided.
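The ages entry above mentions top coding. As a rough illustration in base R (made-up ages, and not a description of how get_population works internally), top coding at 100 means that any older age is reported as 100:

```r
# Made-up ages for illustration only
true_age <- c(5, 64, 99, 100, 107, 120)

# Top code at 100: ages above 100 are collapsed into the 100 category
coded_age <- pmin(true_age, 100)
coded_age
```

In other words, asking for ages = 100 should be read as "100 and older".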
There is no need to specify any or all of the arguments listed above. As the following example shows, the default arguments for get_population provide the overall estimated King County population for the most recent year.
get_population()[]
pretty_kable(get_population()[])
Note 1: The use of head() below is not necessary. It is a convenience function that displays the first 6 rows of data and was used to keep the output in this vignette tidy.
Note 2: The [] after get_population() prints the output to the console. Typically, you would not print the results but would save them as an object. E.g., my.pop.est <- get_population().
WA
get_population(geo_type = 'wa', round = TRUE)[]
pretty_kable(get_population(geo_type = 'wa', round = TRUE)[])
King County
get_population(round = TRUE)[]
pretty_kable(get_population(round = T)[])
King County Regions
get_population(geo_type = c("region"), group_by = c("geo_id"), round = TRUE)[]
pretty_kable(get_population(geo_type = c("region"), group_by = c("geo_id"), round = TRUE)[])
King County Regions with round=FALSE
Turn off rounding to get the exact (fractional) number of people estimated.
rads::get_population(geo_type = 'region', round = FALSE)[]
pretty_kable(rads::get_population(geo_type = 'region', round = FALSE)[])
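One thing to keep in mind when rounding: rounded strata do not always add up to the rounded total. The base R sketch below uses made-up fractional estimates (not OFM numbers) to show the general arithmetic. This vignette does not say at which step get_population rounds, so treat it as a caution rather than a description of the function.

```r
# Made-up fractional estimates for three hypothetical areas
est <- c(a = 10.4, b = 20.4, c = 30.4)

sum_of_rounded <- sum(round(est))  # round each area, then add: 10 + 20 + 30
rounded_sum    <- round(sum(est))  # add first (61.2), then round

c(sum_of_rounded = sum_of_rounded, rounded_sum = rounded_sum)
```

If you plan to aggregate results yourself, pulling with round = FALSE and rounding at the very end avoids this kind of drift.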
King County HRAs
head(get_population(geo_type = c("hra"), group_by = c("geo_id"))[])
pretty_kable(head(get_population(geo_type = c("hra"), group_by = c("geo_id"))[]))
King County Zip codes
head(get_population(geo_type = c("zip"), group_by = c("geo_id"))[])
pretty_kable(head(get_population(geo_type = c("zip"), group_by = c("geo_id"))[]))
King County Census Tracts
head(get_population(geo_type = c("tract"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[])
pretty_kable(head(get_population(geo_type = c("tract"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[]))
King County Census Block Groups
head(get_population(geo_type = c("blkgrp"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[])
pretty_kable(head(get_population(geo_type = c("blkgrp"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[]))
King County Census Blocks
# ages added to make things go faster
head(get_population(geo_type = c("blk"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[])
pretty_kable(head(get_population(geo_type = c("blk"), group_by = c("geo_id"), ages = 18, census_vintage = 2020, geo_vintage = 2020)[]))
King County multiple years combined
get_population(years = 2017:2019)[]
pretty_kable(get_population(years = 2017:2019)[])
King County multiple years stratified
get_population(years = 2017:2019, group_by = "years")[]
pretty_kable(get_population(years = 2017:2019, group_by = "years")[])
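To make the difference between "combined" and "stratified" concrete, here is a base R analogy with made-up numbers (not OFM estimates). It assumes, as the example titles suggest, that the combined call adds the requested years together. It is only meant to show the shape of the two results, not how get_population is implemented.

```r
# Made-up numbers for illustration only (not OFM estimates)
toy <- data.frame(
  year   = rep(2017:2019, each = 2),
  gender = rep(c("Female", "Male"), times = 3),
  pop    = c(50, 50, 55, 55, 60, 60)
)

# "Combined": a single total across all requested years
combined <- sum(toy$pop)

# "Stratified": one row per year, analogous to group_by = "years"
stratified <- aggregate(pop ~ year, data = toy, FUN = sum)

combined
stratified
```

The same logic applies to the ages and genders examples that follow: without group_by the categories are pooled, and with group_by each category gets its own row.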
King County multiple ages combined
get_population(ages = 65:70)[]
pretty_kable(get_population(ages = 65:70)[])
King County multiple ages stratified
get_population(ages = 65:70, group_by = "ages")[]
pretty_kable(get_population(ages = 65:70, group_by = "ages")[])
King County female only
get_population(genders = "F")[]
pretty_kable(get_population(genders = "F")[])
King County gender stratified
get_population(group_by = "genders")[]
pretty_kable(get_population(group_by = "genders")[])
King County AIAN (not Hispanic)
get_population(races = "aian", race_type = "race_eth")[]
pretty_kable(get_population(races = "aian", race_type = "race_eth")[])
King County AIAN (regardless of Hispanic ethnicity)
get_population(races = "aian", race_type = "race", group_by = 'race')[]
pretty_kable(get_population(races = "aian", race_type = "race", group_by = 'race')[])
King County stratified by Hispanic as race
get_population(race_type = "race_eth", group_by = "race_eth")[]
pretty_kable(get_population(race_type = "race_eth", group_by = "race_eth")[])
King County stratified by race (Hispanic as ethnicity)
get_population(race_type = "race", group_by = "race")[]
pretty_kable(get_population(race_type = "race", group_by = "race")[])
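The distinction between Hispanic as an ethnicity ("race") and Hispanic as a race ("race_eth") can be sketched with a few made-up records in base R. The data and column names below are hypothetical and only illustrate the two ways of counting. They do not reflect how the underlying tables are stored.

```r
# Made-up people for illustration only; column names are hypothetical
toy <- data.frame(
  race     = c("White", "White", "Black", "Asian", "AIAN"),
  hispanic = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hispanic as a race: Hispanic residents form their own category, so
# every other category contains non-Hispanic residents only
toy$race_eth <- ifelse(toy$hispanic, "Hispanic", toy$race)

table(toy$race)      # Hispanic as ethnicity: counted within their race
table(toy$race_eth)  # Hispanic as race: counted as "Hispanic"
```

This is why the AIAN count with race_type = "race_eth" excludes Hispanic residents, while the count with race_type = "race" includes them.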
King County regions stratified by year and gender
reg_yr_gen <- get_population(geo_type = "region", years = 2017:2019, group_by = c("geo_id", "years", "genders"))
reg_yr_gen <- reg_yr_gen[, .(region = geo_id, year, gender, pop)]
print(setorder(reg_yr_gen, region, year, gender)[1:12])
pretty_kable(setorder(reg_yr_gen, region, year, gender)[1:12])
King County regions stratified by year -- Female Hispanic and Asian-NH residents aged 16-25 only -- not rounded
get_population(ages = 16:25, genders = "F", years = 2017:2019, races = c("hispanic", "asian"), geo_type = "region", race_type = "race_eth", group_by = c("geo_id", "years", "race_eth"), round = F)[1:12]
pretty_kable(get_population(ages = 16:25, genders = "F", years = 2017:2019, races = c("hispanic", "asian"), geo_type = "region", race_type = "race_eth", group_by = c("geo_id", "years", "race_eth"), round = F)[1:12])
group_by value
Sometimes a user might want to access population data by Hispanic ethnicity. To get population values by race X ethnicity, users should include 'hispanic' in the group_by argument. This option only works in conjunction with race_type = 'race_eth'. Several combinations (e.g. adding 'hispanic' to the races argument) will not work and will throw some (hopefully) informative errors. Other options (as demonstrated above) will continue to work.
King County regions stratified by Hispanic/Non-Hispanic
# pull in data stratified by race/eth and region
reg_hisp_nonhisp <- get_population(geo_type = 'region', group_by = 'hispanic')
# print select columns
reg_hisp_nonhisp <- reg_hisp_nonhisp[, .(region = geo_id, hispanic, pop)]
print(setorder(reg_hisp_nonhisp, region, hispanic))
pretty_kable(setorder(reg_hisp_nonhisp, region, hispanic))
Return all race x Hispanic ethnicity combinations
race_x_eth <- get_population(race_type = 'race_eth', group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))
pretty_kable(setorder(race_x_eth, race_eth, hispanic))
Return population of White residents by Hispanic ethnicity
race_x_eth <- get_population(race_type = 'race_eth', races = 'white', group_by = c('race_eth', 'hispanic'))
race_x_eth <- race_x_eth[, .(year, race_eth, hispanic, pop)]
print(setorder(race_x_eth, race_eth, hispanic))
pretty_kable(setorder(race_x_eth, race_eth, hispanic))
Some users may not need or want to rely on get_population's auto-connection to HHSAW via keyring. Users can instead pass an existing database connection through the mykey argument. The example below still uses keyring (since most get_population users are on the PH domain), but it can be replaced by ActiveDirectoryIntegrated type authentication to HHSAW for the KC lucky ducks.
# Via autoconnect
r1 = get_population()

# Via an existing connection passed through mykey
mycon <- DBI::dbConnect(
  odbc::odbc(),
  driver = getOption("rads.odbc_version"),
  server = "kcitazrhpasqlprp16.azds.kingcounty.gov",
  database = "hhs_analytics_workspace",
  uid = keyring::key_list('hhsaw')[["username"]],
  pwd = keyring::key_get('hhsaw', keyring::key_list('hhsaw')[["username"]]),
  Encrypt = "yes",
  TrustServerCertificate = "yes",
  Authentication = "ActiveDirectoryPassword")
r2 = get_population(mykey = mycon)

# Both approaches should return the same result
print(all.equal(r1, r2))
-- r paste0('Updated ', format(Sys.time(), '%B %d, %Y'))