knitr::opts_chunk$set(collapse = TRUE, comment = "#>") library(magrittr) library(knitr) library(psrccensus)
This vignette explains how to calculate reliability measures for ACS estimates.
The American Community Survey (ACS), the household travel survey, and other population surveys are all samples of the population with usually less 10% of the population answering the survey.
Because these surveys are a sample, they have several sources of potential error such as sampling error, nonresponse error, coverage error, measurement error, and processing error.Here is a document from Census about these concepts as they apply to ACS.
The data team wanted to develop a set of rough guidelines to help team analysts to better understanding when specifically could mean that the sample has quality issues and needs to be examined with additional scrutiny.
A coefficient of variation (CV) measures the relative amount of sampling error that is associated with a sample estimate. The CV is calculated as the ratio of the SE for an estimate to the estimate itself and is usually expressed as a percent.
The internal recommendation for understanding CVs are:
CV <= 15% is good
CV >15% and <=30% is fair
CV >30% and <=50% estimate should be used with caution
CV >50% examine with great caution whether it is usable
Note that since both the ACS and household travel survey are reported using a 90 percent confidence interval, where the Margin of Error (MOE) is reported in place of standard error, you can convert it to standard error by dividing by 1.645.
These breakpoints are based on guidance from the Census Bureau and the [National Center for Transit Research] (https://www.nctr.usf.edu/wp-content/uploads/2010/04/77802.pdf). It is important that analysts consider the nuances of their datasets to understand potential sources of error in their estimates. These internal categories are part of an ongoing conversation and may change.
The psrccensus library contains functionality has been added to provide coefficients of variation, standard errors, and reliability labels to ACS data retrieved directly from the api. Every ACS data table output from psrccensus via the function get_acs_recs() includes fields for standard error, coefficient of variation, and reliability. But often you need to transform the variables that come from the ACS to get at what you need to summarize, and this is not directly built into psrccensus.
To calculate reliability using psrccensus, on user transformed tables, you may want to apply the function reliability_calcs(). This function requires you have a dataframe, a column holding the margins of error, and a column holding the estimate. So you need to have pre-calculated the margin of error transformed, along the way to be able to retrieve the reliability of the estimate.
At each stage of calculation and aggregation, you need to re-calculate the margins of error. The R library tidycensus has built in functions for re-calculating margins of error after data transformation. We recommend that users rely on these functions to derive margins of error when data needs to be aggregated. Another option would be to code the margin of error calculations into a user defined functions.
For more background on the types of calculations performed by tidycensus for margins of error, you can also read this document from Census
Common functions from tidycensus for calculating margins of error, are the following:
moe_product:
Calculate the margin of error for a derived product
moe_prop(): Calculate the margin of error for a derived proportion
moe_ratio() Calculate the margin of error for a derived ratio
moe_sum() Calculate the margin of error for a derived sum (or difference)
In this example, we want to retrieve the percent of households that are cost-burdened by Census tract. Cost-burdened is defined in this case as spending more than 30% of household income on housing costs. This data will be used to create a map by Census Tract to show areas of greater housing cost burden geographically.
To do the summary, first you need to aggregate the data provided by households spending over 30% based on the available fields-Less than 10.0 percent,10.0 to 14.9 percent, 15.0 to 19.9 percent, 20.0 to 24.9 percent, 25.0 to 29.9 percent. These summed items by group provide the numerator of your calculation.
Next you need to redefine the total to remove when the values are not compute- this is denominator. Then you need to calculate the shares.
For each data transformation, you need to recalculate the margin of error.
library(psrccensus) library(tidycensus) library(dplyr) library(tidyr)
# Finding the corresponding ACS table - this works if you know the correct concept label. If not, another option would be to visit https://data.census.gov/table and search for the right subject table and skip to the next step ---- #x <- tidycensus::load_variables(2021,"acs5") %>% # dplyr::filter(grepl("percent", # concept, ignore.case=TRUE) & # (geography=="tract")) # getting data by tract acs_data <- get_acs_recs(geography ='tract', table.names = 'B25070', #subject table code years = c(2021), acs.type = 'acs5')
# data wrangling for grouping >30% and under 30% - census tract and population acs_data <- acs_data %>% rename_at('label', ~'rent_burden') # Create variables acs_data <- acs_data %>% mutate(rent_burden_group=factor( case_when(grepl("Less than 10.0 percent|10.0 to 14.9 percent|15.0 to 19.9 percent| 20.0 to 24.9 percent|25.0 to 29.9 percent", rent_burden) ~ "Less than 30 percent", grepl("30.0 to 34.9 percent|35.0 to 39.9 percent|35.0 to 39.9 percent| 40.0 to 49.9 percent|50.0 percent or more", rent_burden) ~ "Cost Burdened (More than 30 percent)", grepl("Not computed", rent_burden) ~ "Not computed", !is.na(rent_burden) ~ "Total")))
The moe_sum uses tidycensus.
# filter only fields of interest - census tract and estimate # use the moe_sum function from tidycensus acs_data_group <- acs_data %>% dplyr::group_by(GEOID, rent_burden_group)%>% dplyr::summarise(estimate=sum(estimate), moe=tidycensus::moe_sum(estimate=estimate, moe=moe))
# pivot data to calculate percentage by census tract, calculate estimates acs_data_group<- acs_data_group%>% pivot_wider(names_from = rent_burden_group, values_from = c(estimate,moe))%>% rename('estimate_cost_burdened'='estimate_Cost Burdened (More than 30 percent)', 'moe_cost_burdened'= 'moe_Cost Burdened (More than 30 percent)') # weight averages for population & calculating percentage/share of population per tract instead of integer acs_data_group<- acs_data_group%>% mutate(estimate_new_total=(estimate_Total-`estimate_Not computed`))%>% mutate(estimate_perc_cost_burdened=((estimate_cost_burdened/estimate_new_total)))
Because you are applying the function across the columns, as opposed to down the grouped rows, you need to apply the function rowwise. You would not need to do this if the data were not pivoted.
Note the moe_sum is applied although you are taking a difference. Then moe_prop is applied when you are finding the shares.
#calculate moes acs_data_group<- acs_data_group%>% rowwise()%>% mutate(moe_new_total=moe_sum(estimate=c(estimate_Total, `estimate_Not computed`), moe=c(moe_Total,`moe_Not computed`)))%>% mutate(moe_perc_cost_burdened=moe_prop(num=estimate_cost_burdened, denom=estimate_new_total, moe_num=moe_cost_burdened, moe_denom=moe_new_total)) acs_data_final<- acs_data_group %>% dplyr::select(GEOID, estimate_cost_burdened, estimate_new_total, estimate_perc_cost_burdened, moe_cost_burdened, moe_new_total, moe_perc_cost_burdened) %>% dplyr::filter(!is.na(estimate_cost_burdened)) acs_data_final<-reliability_calcs(acs_data_final, estimate='estimate_perc_cost_burdened', moe='moe_perc_cost_burdened') head(acs_data_final%>%select(GEOID, estimate_perc_cost_burdened, reliability))
Now the data is ready to be joined to the tract map, if desired.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.