knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
library(dplyr)
library(ggplot2)
library(PUMSutils)
This is a short introduction to plotting results for the ACS PUMS data with ggplot2 and PUMSutils. The main issue in plotting results drawn from the ACS PUMS data is that it is a weighted sample. Each row represents a different number of cases in the population, indicated by the expansion weight (WGTP for households, PWGTP for persons), and any correct calculation of a statistic has to use the expansion weights. There are basically three ways to proceed:
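1. Use ggplot2 directly: some geoms accept a weight aesthetic that can be set to WGTP or PWGTP.
2. Use the PUMSutils plotting functions, which set the weight aesthetic appropriately.
3. Compute weighted statistics with PUMSutils and plot them with geom_col.

We discuss each of these in turn. The data used in all plots is 1-year 2016 ACS data for Washington State.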
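To make the role of the expansion weights concrete, here is a minimal sketch in base R. The data frame `toy` and its values are invented for illustration (they are not ACS data); only `weighted.mean` from base R is used.

```r
# Toy sample, invented for illustration (not ACS data).
# Each row stands for WGTP households in the population.
toy <- data.frame(
  HINCP = c(30000, 50000, 200000),
  WGTP  = c(150, 100, 10)
)

# Treats every sample row equally, so the rare high-income row dominates.
naive_mean <- mean(toy$HINCP)

# Weights each row by the number of households it represents.
weighted_mean <- weighted.mean(toy$HINCP, toy$WGTP)

print(c(naive = naive_mean, weighted = weighted_mean))
```

The two estimates differ substantially here because the row with the largest income carries the smallest weight.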
The first approach is to use ggplot2 and set the weight aesthetic directly, like this:
wa.house16$Tenure <- own.rent(wa.house16)
ggplot(wa.house16, aes(Tenure)) + geom_bar(aes(weight=WGTP))
Only some plot types accept a weight aesthetic, and used directly like this it is mostly useful for weighted counts. The feature is also somewhat under-documented. For more details, see the ggplot2 documentation.
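One way to check what the weight aesthetic does is to inspect the values ggplot2 computes for the layer. The sketch below uses an invented data frame `toy` (not ACS data) and `layer_data()` from ggplot2 to compare the bar heights with the summed weights per category.

```r
library(ggplot2)

# Toy data, invented for illustration: each row stands for WGTP households.
toy <- data.frame(
  Tenure = c("Own", "Own", "Rent", "Rent", "Rent"),
  WGTP   = c(120, 80, 30, 20, 10)
)

p <- ggplot(toy, aes(Tenure)) + geom_bar(aes(weight = WGTP))

# layer_data() returns the data ggplot2 computed for the first layer;
# for geom_bar the bar height is in the count column.
bar_heights <- layer_data(p)$count

# The same totals computed by hand: sum of weights within each category.
expected <- as.vector(tapply(toy$WGTP, toy$Tenure, sum))

print(bar_heights)
print(expected)
```

With the weight aesthetic set, each bar height is the sum of the weights in that category rather than the number of sample rows.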
PUMSutils has convenience functions for plotting that cover some common cases. The functions take the most common labels as optional arguments. The acs.barchart function can produce barcharts of weighted count data. The plot above would be produced like this:
acs.barchart(wa.house16, 'Tenure', title='HH count by Tenure')
If you want to partition data on 2 variables, you can get a stacked barchart with acs.barchart by setting the stack argument. Below we chart employment status by gender.
wa.pop16$sex <- acs.recode(wa.pop16, 'SEX', data.dict16)
wa.pop16$Employment <- acs.recode(wa.pop16, 'ESR', data.dict16)
acs.barchart(wa.pop16, 'sex', stack='Employment', title = 'Employment Status by Gender', xlab='Gender')
PUMSutils also includes acs.boxplot for producing weighted boxplots of the distribution of a numeric variable within categories. Here we plot the distribution of household income as a function of the number of workers in the household.
# ESR is employment status; 1,2,4,5 are employed
# match.count counts the number of people per household matching a condition
wa.house16$num.work <- match.count(wa.house16, wa.pop16, ESR %in% c(1,2,4,5))
wa.house16$num.work <- clip.column(wa.house16, 'num.work', 4)
wa.house16$inc.clip <- pmin(wa.house16$HINCP, 300000)
acs.boxplot(wa.house16[!is.na(wa.house16$HINCP),], 'num.work', 'inc.clip',
            title='HH Income vs Number of Workers',
            xlab='Number of Workers (max 4)', ylab='HH Income($) (max $300K)')
Income variables have a lot of extreme outliers, so we clip income at 300,000.
If PUMSutils is used to produce weighted statistics, then those can be plotted using geom_col or similar from ggplot. Here we make a barchart of the median income of households as a function of the number of workers in the household.
med.inc <- group.median(wa.house16, 'num.work', 'HINCP', result.name = 'Median Income')
g <- ggplot(med.inc) + geom_col(aes(num.work, Median.Income))
g + ggtitle('Median HH Income vs. Number of Workers')
This is the only method that can plot arbitrary weighted statistics, such as medians, rather than just counts.
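For readers who want to see what a weighted median involves, here is a minimal sketch in base R. The helper `weighted_median` and the data frame `toy` are hypothetical and invented for illustration; this is not the PUMSutils implementation, and group.median may use a different definition (for example, with interpolation).

```r
# Hypothetical helper, not part of PUMSutils: the weighted median is taken
# here as the smallest value at which the cumulative weight reaches half
# of the total weight.
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w)
  x[which(cum_w >= sum(w) / 2)[1]]
}

# Toy data, invented for illustration (not ACS data).
toy <- data.frame(
  num.work = c(0, 0, 0, 1, 1, 1),
  HINCP    = c(10000, 20000, 90000, 40000, 60000, 80000),
  WGTP     = c(50, 30, 10, 10, 20, 60)
)

# One weighted median per group, collected into a data frame
# with one row per value of num.work.
groups <- split(toy, toy$num.work)
med <- data.frame(
  num.work      = as.numeric(names(groups)),
  median.income = sapply(groups, function(g) weighted_median(g$HINCP, g$WGTP))
)

print(med)
```

A summary table shaped like `med` (one row per group, one column per statistic) is what geom_col expects, so it could be plotted with ggplot(med) + geom_col(aes(num.work, median.income)).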