Authors: Heather Krause, Julia Silge
License: MIT
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "README-", message = FALSE ) options(dplyr.width = 180)
This R package is part of the the Veracio survey tool developed by leaders from Orb, Datassist, and Cognite Labs, and supported by a 2016 Knight Foundation News Challenge on Data. This survey tool provides a set of software tools and online services for nonprofits, local governments, social service agencies, journalists, and others to:
This open-source tool weights collected survey results for demographic and other factors to make them more scientifically sound. The Veracio tool is being developed and made available at no cost to anyone wishing to poll the crowd and share reliable, credible results.
You can install the development version of this package from GitHub using devtools:
library(devtools) install_github("heathermkrause/WWC")
This package contains a simulated survey called texassurvey
that contains 1000 respondents that have answered a yes/no question. This example survey is biased, meaning that the population in the survey does not match the true population in Texas. It has different proportions with respect to sex and race/ethnicity compared to the real population in Texas (2010-2014 5-year ACS population estimates). What does this survey look like?
library(WWC)
texassurvey
What result would a person using the survey find if s/he looked at the raw result of the survey, without adjusting for the demographic differences between the survey respondents and the true population in Texas?
library(dplyr) library(ggplot2) resultDF <- texassurvey %>% group_by(response) %>% summarize(n = n()) ggplot(resultDF, aes(x = response, y = n)) + geom_bar(stat = "identity", fill = "midnightblue")
Instead, the Veracio survey tool can be used to statistically weight each survey respondent relative to what proportion of Texas' real population he or she represents.
weighted <- weight_wwc(texassurvey, sex, raceethnicity) weighted
Now what result on the survey question will we find?
resultDF <- weighted %>% group_by(response) %>% summarize(n = sum(weight)) ggplot(resultDF, aes(x = response, y = n)) + geom_bar(stat = "identity", fill = "midnightblue")
To learn more about the survey weighting algorithm, see the main vignette.
This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.