
pollster is an R package for making topline and crosstab tables of simple weighted survey data. The package is designed for use with labelled data, such as data imported from Stata or SPSS with the haven package. It follows tidyverse programming conventions, and every output table is a tidy data frame, or tibble.

Only simple weights are currently supported. For complex survey designs, we recommend the excellent survey package.

The core functions are topline(), crosstab(), and crosstab_3way().

Each of these functions also has a twin version, such as moe_topline() or moe_crosstab(), that adds a column for the margin of error. The margin of error is calculated to include the design effect of the weights.

There are also two special functions which calculate the design effect component of the margin of error for each survey wave independently.

Other functions, such as summary_table(), are included to calculate simple weighted summary statistics.

Installation

Install the released version from CRAN.

install.packages("pollster")

Or install the development version from GitHub. This requires the remotes package.

remotes::install_github("jdjohn215/pollster")

Basic usage

pollster includes a dataset of Illinois responses to the Current Population Survey's voter registration supplement.

library(pollster)
head(illinois)

Make a topline table like this. The output is a tibble.

topline(df = illinois, variable = maritalstatus, weight = weight)
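
To make the idea of a weighted topline concrete, here is a minimal base R sketch of the underlying arithmetic: sum the weights within each response category and divide by the total weight. It is an illustration only, not pollster's implementation. The toy data and the column names response, n, and pct are made up for this example.

```r
# Toy survey data (made up for illustration; not the illinois dataset)
toy <- data.frame(
  maritalstatus = c("Married", "Single", "Married", "Single", "Widowed"),
  weight        = c(1.5, 0.5, 1.0, 2.0, 1.0)
)

# Sum the weights within each response category
weighted_n <- tapply(toy$weight, toy$maritalstatus, sum)

# Each category's share of the total weight, as a percentage
weighted_pct <- 100 * weighted_n / sum(weighted_n)

# Assemble a small topline-style table (column names are illustrative)
toy_topline <- data.frame(
  response = names(weighted_n),
  n        = as.vector(weighted_n),
  pct      = as.vector(weighted_pct)
)
toy_topline
```

A respondent with a weight of 2 counts twice as much toward the percentages as a respondent with a weight of 1.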

Make a crosstab like this.

crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight)

If you prefer, you can also get the output in long format.

crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight, format = "long")
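
The difference between the wide and long layouts can be sketched in base R. The example below assumes row percentages (each row of the wide table sums to 100), which is one common convention. It uses made-up data and is not pollster's own code.

```r
# Toy data (made up for illustration)
toy <- data.frame(
  educ   = c("HS", "HS", "College", "College", "HS", "College"),
  status = c("Married", "Single", "Married", "Single", "Married", "Married"),
  weight = c(1, 1, 2, 1, 2, 1)
)

# Weighted counts: rows are educ, columns are status
counts <- xtabs(weight ~ educ + status, data = toy)

# Wide layout: row percentages, so each row sums to 100
wide <- 100 * prop.table(counts, margin = 1)

# Long layout: one row per (educ, status) pair, with the percentage in "pct"
long <- as.data.frame(as.table(wide), responseName = "pct",
                      stringsAsFactors = FALSE)

wide
long
```

The wide layout reads naturally as a table. The long layout has one observation per row, which is the shape that plotting functions generally expect.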

A three-way crosstab is just a normal crosstab with a third control variable. Often, this third variable is time.

crosstab_3way(df = illinois, x = educ6, y = maritalstatus, z = year, weight = weight)
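
One way to picture a three-way crosstab is as a separate two-way crosstab for each level of the control variable. The base R sketch below assumes row percentages within each level of the third variable. The data are made up, and this is not pollster's implementation.

```r
# Toy data (made up for illustration): two survey years
toy <- data.frame(
  sex    = c("F", "F", "M", "M", "F", "F", "M", "M"),
  voted  = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"),
  year   = c(2016, 2016, 2016, 2016, 2018, 2018, 2018, 2018),
  weight = c(3, 1, 1, 1, 1, 1, 1, 3)
)

# Weighted counts in a three-dimensional table: sex x voted x year
counts <- xtabs(weight ~ sex + voted + year, data = toy)

# Percentages within each (sex, year) combination,
# so every row of every yearly slice sums to 100
pct <- 100 * prop.table(counts, margin = c(1, 3))

# The slice for a single year is an ordinary two-way crosstab
pct[, , "2016"]
```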

Making tables and graphs

Wide format is best for displaying table output. Long format is best for making graphs. pollster output works directly with knitr::kable() and ggplot2::ggplot(). These examples show very basic HTML table output, but you can customize the appearance of your tables extensively in either HTML or PDF format using Hao Zhu's excellent kableExtra package.

library(dplyr)  # provides the %>% pipe

# Wide format, rendered as a table
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
  knitr::kable(digits = 0)

# Long format, plotted as dodged bars
library(ggplot2)
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
  ggplot(aes(educ6, pct, fill = sex)) +
  geom_col(position = "dodge")

Three-way crosstabs are ideal for plotting time series graphs and/or faceted plots.

crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight, format = "long") %>%
  ggplot(aes(year, pct, col = sex)) +
  geom_line() +
  facet_wrap(facets = vars(educ6))

Margin of error

Each core pollster function comes with a twin function that adds a margin of error column. For example:

moe_topline(df = illinois, variable = voter, weight = weight)
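
The effect of weights on the margin of error can be illustrated with Kish's approximation of the design effect, deff = n * sum(w^2) / sum(w)^2, which is one standard way to account for unequal weights. The sketch below works under that assumption and is not a statement of pollster's internal formula. kish_deff() and moe_pct() are hypothetical helpers defined here, and the z-value of 1.96 (a 95% confidence level) is an illustrative choice.

```r
# Hypothetical helper: Kish's approximate design effect from a weight vector
kish_deff <- function(w) {
  length(w) * sum(w^2) / sum(w)^2
}

# Hypothetical helper: margin of error, in percentage points,
# for a weighted proportion p, inflated by the design effect
moe_pct <- function(p, w, z = 1.96) {
  n <- length(w)
  100 * z * sqrt(kish_deff(w) * p * (1 - p) / n)
}

w_equal   <- rep(1, 400)                     # everyone weighted equally
w_unequal <- rep(c(0.5, 1.5), times = 200)   # half down-weighted, half up-weighted

kish_deff(w_equal)      # equal weights give a design effect of 1
kish_deff(w_unequal)    # unequal weights give a design effect above 1

moe_pct(0.5, w_equal)
moe_pct(0.5, w_unequal) # larger than with equal weights
```

The more the weights vary, the larger the design effect and the wider the margin of error for the same sample size.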

By default, moe_crosstab output comes in long format, but you can also specify wide format.

moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight, format = "wide")

# Long format (the default) can be piped straight into ggplot2
moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight) %>%
  ggplot(aes(x = pct, y = raceethnic, xmin = (pct - moe), xmax = (pct + moe), color = voter)) +
  geom_pointrange(position = position_dodge(width = 0.2))
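
When a dataset pools several survey waves, the design effect can be computed for each wave independently instead of once for the whole file. The base R sketch below shows the difference using Kish's approximation. The kish_deff() helper and the toy data are assumptions made for illustration, and pollster's own calculation may differ.

```r
# Toy data (made up): two waves with different weight distributions
toy <- data.frame(
  year   = c(2016, 2016, 2016, 2016, 2018, 2018, 2018, 2018),
  weight = c(1, 1, 1, 1, 0.5, 1.5, 0.5, 1.5)
)

# Hypothetical helper: Kish's approximate design effect
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2

# One design effect per wave
deff_by_wave <- tapply(toy$weight, toy$year, kish_deff)

# A single design effect for the pooled data, for comparison
deff_pooled <- kish_deff(toy$weight)

deff_by_wave
deff_pooled
```

In this toy example the 2016 wave has equal weights and the 2018 wave does not, so a pooled design effect would overstate the uncertainty for 2016 and understate it for 2018.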

Summary table

summary_table() creates a simple summary table of a weighted numeric variable.

summary_table(df = illinois, variable = age, weight = weight)

You can choose name_style = "pretty" if you want column headings appropriate for a formatted table.

summary_table(df = illinois, variable = age, 
              weight = weight, name_style = "pretty") %>%
  knitr::kable()
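
For intuition about what a weighted summary statistic means, the base R sketch below computes a weighted mean and one simple definition of a weighted median. The data are made up, the variable names are illustrative, and other weighted median definitions exist, so pollster's results may be calculated differently.

```r
# Toy data (made up for illustration)
age    <- c(20, 30, 40, 50)
weight <- c(1, 1, 2, 4)

# Weighted mean: sum(weight * age) / sum(weight)
age_wtd_mean <- weighted.mean(age, weight)

# Unweighted mean, for comparison
age_mean <- mean(age)

# Weighted median (one simple definition): the smallest value at which
# the cumulative share of the total weight reaches 50%
ord <- order(age)
cum_share <- cumsum(weight[ord]) / sum(weight)
age_wtd_median <- age[ord][which(cum_share >= 0.5)[1]]

c(mean = age_mean, wtd_mean = age_wtd_mean, wtd_median = age_wtd_median)
```

Because the oldest respondent carries the largest weight here, the weighted mean is higher than the unweighted mean.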


jdjohn215/pollster documentation built on May 19, 2023, 4:34 p.m.