WWC: Weighing the Wisdom of Crowds

Authors: Heather Krause, Julia Silge
License: MIT

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  message = FALSE
)
options(dplyr.width = 180)

This R package is part of the Veracio survey tool developed by leaders from Orb, Datassist, and Cognite Labs, and supported by a 2016 Knight Foundation News Challenge on Data. Veracio provides a set of software tools and online services that help nonprofits, local governments, social service agencies, journalists, and others to:

frame proper sampling questions,
help construct, link, and embed them into existing survey and Q&A platforms, and
visualize and then statistically adjust the collected results.

This open-source tool weights collected survey results for demographic and other factors to make them more scientifically sound. The Veracio tool is being developed and made available at no cost to anyone wishing to poll the crowd and share reliable, credible results.

Installation

You can install the development version of this package from GitHub using devtools:

library(devtools)
install_github("heathermkrause/WWC")

Examples

This package contains a simulated survey called texassurvey with 1000 respondents who have answered a yes/no question. This example survey is biased, meaning that the mix of respondents in the survey does not match the true population of Texas. Its proportions with respect to sex and race/ethnicity differ from those of the real population of Texas (2010-2014 5-year ACS population estimates). What does this survey look like?

library(WWC)
texassurvey

What result would someone find by looking at the raw survey responses, without adjusting for the demographic differences between the survey respondents and the true population of Texas?

library(dplyr)
library(ggplot2)
# Count the unweighted number of respondents giving each response
resultDF <- texassurvey %>% 
        group_by(response) %>% 
        summarize(n = n())
# Plot the raw counts
ggplot(resultDF, aes(x = response, y = n)) +
        geom_bar(stat = "identity", fill = "midnightblue")

Instead, the Veracio survey tool can be used to statistically weight each survey respondent according to the proportion of Texas' real population that he or she represents.

weighted <- weight_wwc(texassurvey, sex, raceethnicity)
weighted
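To make the idea of weighting more concrete, here is a minimal sketch of simple post-stratification on a single variable, written in base R. It is an illustration only and is not necessarily how weight_wwc computes its weights internally (the package may use a different technique, such as raking across several variables). The data frame `survey` and the `population_share` values are made up for this example and are not ACS figures.

```r
# Toy survey: 10 respondents, with women over-represented (hypothetical data)
survey <- data.frame(
  sex      = c(rep("Female", 8), rep("Male", 2)),
  response = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Assumed true population shares (illustrative values, not real estimates)
population_share <- c(Female = 0.5, Male = 0.5)

# Share of each group among the survey respondents
sample_share <- prop.table(table(survey$sex))

# Weight for each group = population share / sample share
group_weight <- population_share[names(sample_share)] / as.numeric(sample_share)

# Look up each respondent's weight by their group
survey$weight <- as.numeric(group_weight[survey$sex])

print(group_weight)
```

Respondents from an over-represented group receive a weight below 1 and respondents from an under-represented group receive a weight above 1, so the weights still sum to the number of respondents.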

Now what result on the survey question will we find?

# Sum the weights, rather than counting rows, for each response
resultDF <- weighted %>% 
        group_by(response) %>% 
        summarize(n = sum(weight))
# Plot the weighted totals
ggplot(resultDF, aes(x = response, y = n)) +
        geom_bar(stat = "identity", fill = "midnightblue")
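The two plots can also be compared numerically. The sketch below shows one way to put raw and weighted response shares side by side in base R. It uses a small made-up data frame, `weighted_toy`, with `response` and `weight` columns standing in for the output of weight_wwc; the values are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical weighted survey: one row per respondent
weighted_toy <- data.frame(
  response = c("Yes", "Yes", "Yes", "No", "No"),
  weight   = c(0.5, 0.5, 1.0, 2.0, 1.0),
  stringsAsFactors = FALSE
)

# Raw share: every respondent counts once
raw_share <- prop.table(table(weighted_toy$response))

# Weighted share: every respondent counts in proportion to their weight
weighted_totals <- tapply(weighted_toy$weight, weighted_toy$response, sum)
weighted_share <- weighted_totals / sum(weighted_totals)

# Side-by-side comparison of the two estimates
comparison <- data.frame(
  response = names(weighted_share),
  raw      = as.numeric(raw_share[names(weighted_share)]),
  weighted = as.numeric(weighted_share)
)
print(comparison)
```

In this toy data the raw majority answer is "Yes", but the weighted majority is "No", because the "No" respondents stand in for a larger share of the assumed population.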