knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ggplot2) library(dplyr) library(sastats)
Package sastats includes a few convenience functions for estimating survey sampling errors. These are simple calculations, and the functions are just thin wrappers for a bit of code. Relevant measures:
error_se_mean()
error_se_prop()
error_me()
Package survey provides a much more comprehensive approach to survey-based calculations (errors, weighting, etc.). I've tended toward the light-weight approach outlined here, but it could be worth looking into if we are doing alot of survey analysis in R.
Caculating errors for multi-estimate metrics requires error propagation (e.g., total days which depends on participation rate & average days) . I haven't implemented these computations here, but existing packages address this need. Package propagate is one I've used, but have found tricky to implement. Package errors is newer and appears more straightforward (although I haven't tested it).
For demonstration, package sastats includes a survey dataset with annual participation metrics for 9 outdoor recreation activities:
library(dplyr) library(sastats) data(svy) activity <- left_join(svy$act, select(svy$person, Vrid, weight), by = "Vrid") glimpse(activity)
Looking at days of participation:
days <- activity %>% group_by(act) %>% summarise( avgdays = weighted.mean(days, weight, na.rm = TRUE), se = error_se_mean(days, na.rm = TRUE) ) days
Looking at participation rate:
rate <- activity %>% group_by(act, part) %>% summarise(n = n(), wtn = sum(weight)) %>% mutate( n = sum(n), rate = wtn / sum(wtn), se = error_se_prop(rate, n) ) %>% filter(part == "Checked") rate
These are useful for reporting confidence intervals. Note that the dataset weighting produces a "design effect" which inflates the margin of error. I know the design effect for this dataset, based on the summary output produced by the sastats::rake_weight()
procedure.
deff <- 1.19 rate <- mutate(rate, me = error_me(se) * deff, lower = rate - me, upper = rate + me) days <- mutate(days, me = error_me(se) * deff, lower = avgdays - me, upper = avgdays + me) library(ggplot2) ggplot(rate, aes(act, rate)) + geom_point() + geom_errorbar(aes(ymin = lower, ymax = upper)) + ggtitle("Participation Rate Estimates (with 95% CIs)") ggplot(days, aes(act, avgdays)) + geom_point() + geom_errorbar(aes(ymin = lower, ymax = upper)) + ggtitle("Average Days (per participant) Estimates (with 95% CIs)")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.