WWC: Weighing the Wisdom of Crowds

Authors: Heather Krause, Julia Silge
License: MIT

Travis-CI Build Status CRAN_Status_Badge Codecov

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  message = FALSE
)
options(dplyr.width = 180)

This R package is part of the the Veracio survey tool developed by leaders from Orb, Datassist, and Cognite Labs, and supported by a 2016 Knight Foundation News Challenge on Data. This survey tool provides a set of software tools and online services for nonprofits, local governments, social service agencies, journalists, and others to:

This open-source tool weights collected survey results for demographic and other factors to make them more scientifically sound. The Veracio tool is being developed and made available at no cost to anyone wishing to poll the crowd and share reliable, credible results.

Installation

You can install the development version of this package from GitHub using devtools:

library(devtools)
install_github("heathermkrause/WWC")

Examples

This package contains a simulated survey called texassurvey that contains 1000 respondents that have answered a yes/no question. This example survey is biased, meaning that the population in the survey does not match the true population in Texas. It has different proportions with respect to sex and race/ethnicity compared to the real population in Texas (2010-2014 5-year ACS population estimates). What does this survey look like?

library(WWC)
texassurvey

What result would a person using the survey find if s/he looked at the raw result of the survey, without adjusting for the demographic differences between the survey respondents and the true population in Texas?

library(dplyr)
library(ggplot2)
resultDF <- texassurvey %>% 
        group_by(response) %>% 
        summarize(n = n())
ggplot(resultDF, aes(x = response, y = n)) +
        geom_bar(stat = "identity", fill = "midnightblue")

Instead, the Veracio survey tool can be used to statistically weight each survey respondent relative to what proportion of Texas' real population he or she represents.

weighted <- weight_wwc(texassurvey, sex, raceethnicity)
weighted

Now what result on the survey question will we find?

resultDF <- weighted %>% 
        group_by(response) %>% 
        summarize(n = sum(weight))
ggplot(resultDF, aes(x = response, y = n)) +
        geom_bar(stat = "identity", fill = "midnightblue")

To learn more about the survey weighting algorithm, see the main vignette.

Code of Conduct

This project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.



heathermkrause/WWC documentation built on May 17, 2019, 3:20 p.m.