knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-" )
pollster is an R package for making topline and crosstab tables of simple weighted survey data. The package is designed for use with labelled data, such as data imported from Stata or SPSS with the haven package. It follows tidyverse programming conventions, and output tables are returned as tidy data frames (tibbles).
Only simple weights are currently supported. For complex survey designs, we recommend the excellent survey package.
The core functions are:
topline()
crosstab()
crosstab_3way()
Each of these functions has a twin version that adds a margin of error column. The margin of error is calculated to include the design effect of the weights.
moe_topline()
moe_crosstab()
moe_crosstab_3way()
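To make "a margin of error that includes the design effect" concrete, here is a minimal base R sketch. It assumes Kish's approximation for the design effect and a 95% normal-approximation margin of error. The helper names kish_deff() and moe_with_deff() are hypothetical, and the formula is not necessarily the exact one the package uses internally.

```r
# Hypothetical helpers for illustration; not part of pollster's API.

# Kish's approximate design effect: n * sum(w^2) / sum(w)^2.
# Equals 1 when all weights are equal and grows as weights vary more.
kish_deff <- function(w) {
  length(w) * sum(w^2) / sum(w)^2
}

# 95% margin of error (in percentage points) for a proportion p,
# inflated by the square root of the design effect.
moe_with_deff <- function(p, n, deff, z = 1.96) {
  z * sqrt(deff) * sqrt(p * (1 - p) / n) * 100
}

w_equal  <- rep(1, 400)
w_uneven <- rep(c(0.5, 1.5), times = 200)

deff_equal  <- kish_deff(w_equal)   # equal weights add no penalty
deff_uneven <- kish_deff(w_uneven)  # uneven weights inflate the variance

moe_equal  <- moe_with_deff(p = 0.5, n = 400, deff = deff_equal)
moe_uneven <- moe_with_deff(p = 0.5, n = 400, deff = deff_uneven)
```

With the same sample size, the uneven weights produce a wider margin of error than the equal weights.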
There are also two special functions which calculate the design effect component of the margin of error for each survey wave independently.
moe_wave_crosstab()
moe_wave_crosstab_3way()
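The difference between a pooled design effect and one calculated for each wave can be sketched in base R. This example again assumes Kish's approximation. The data frame toy, its column names, and the helper kish_deff() are hypothetical, chosen only for illustration.

```r
# Toy data: two survey waves with different weight distributions.
# Column names (wave, weight) are assumptions for this example.
toy <- data.frame(
  wave   = rep(c(2016, 2020), each = 4),
  weight = c(1, 1, 1, 1, 0.5, 0.5, 1.5, 1.5)
)

# Kish's approximation, assumed here for illustration.
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2

# Pooled: one design effect across all rows.
deff_pooled <- kish_deff(toy$weight)

# Per wave: one design effect for each survey wave.
deff_by_wave <- tapply(toy$weight, toy$wave, kish_deff)
```

Here the 2016 wave has equal weights and the 2020 wave does not, so a pooled design effect would overstate the uncertainty for 2016 and understate it for 2020.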
Other functions are included to calculate simple weighted summary statistics.
wtd_mean() is a tidy-compliant wrapper around stats::weighted.mean().
summary_table() returns a tibble with summary statistics similar to the Stata command sum.
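For reference, this is what the underlying base function computes. The sketch calls stats::weighted.mean() directly rather than wtd_mean(), and the vectors ages and weights are made up for illustration.

```r
ages    <- c(20, 40, 60, NA)
weights <- c(1, 1, 2, 1)

# Weighted mean: sum(x * w) / sum(w), skipping the missing age
# and its corresponding weight.
wm <- stats::weighted.mean(ages, weights, na.rm = TRUE)

# The same value computed by hand for comparison.
keep <- !is.na(ages)
wm_by_hand <- sum(ages[keep] * weights[keep]) / sum(weights[keep])
```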
Install it this way.
install.packages("pollster")
Or get the development version.
remotes::install_github("jdjohn215/pollster")
pollster includes a dataset of Illinois responses to the Current Population Survey's voter registration supplement.
library(pollster)
head(illinois)
Make a topline table like this. The output is a tibble.
topline(df = illinois, variable = maritalstatus, weight = weight)
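As a rough picture of what a weighted topline contains, the following base R sketch computes weighted frequencies and percentages by hand. The toy data and the output column names are assumptions for illustration. The real topline() output may include additional columns.

```r
# Toy data, made up for this example.
toy <- data.frame(
  maritalstatus = c("Married", "Married", "Single", "Single", "Widowed"),
  weight        = c(2, 1, 1, 0.5, 0.5)
)

# Weighted frequency of each response: the sum of weights per category.
freq <- tapply(toy$weight, toy$maritalstatus, sum)

# Percent: each category's share of the total weight.
topline_by_hand <- data.frame(
  response  = names(freq),
  frequency = as.numeric(freq),
  percent   = as.numeric(freq) / sum(freq) * 100
)
```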
Make a crosstab like this.
crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight)
If you prefer, you can also get the output in long format.
crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight, format = "long")
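The relationship between the wide and long layouts can be sketched in base R. This example assumes row percentages (each level of x sums to 100) and uses made-up data. It illustrates the two shapes only and is not the package's implementation.

```r
# Toy data, made up for this example.
toy <- data.frame(
  educ    = c("HS", "HS", "HS", "BA", "BA"),
  marital = c("Married", "Single", "Single", "Married", "Married"),
  weight  = c(1, 1, 2, 3, 1)
)

# Weighted cell totals: rows are educ, columns are marital.
wtd_counts <- xtabs(weight ~ educ + marital, data = toy)

# Wide format: one row per educ level, row percentages across columns.
wide <- prop.table(wtd_counts, margin = 1) * 100

# Long format: one row per (educ, marital) combination.
long <- as.data.frame(wide, responseName = "pct")
```

The wide table has one column per response, which reads well in a report. The long table has a single pct column, which is the shape plotting functions usually expect.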
A three-way crosstab is just a normal crosstab with a third control variable. Often, this third variable is time.
crosstab_3way(df = illinois, x = educ6, y = maritalstatus, z = year, weight = weight)
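The idea of a crosstab computed separately within each level of a third variable can be sketched in base R. The toy data are made up, and the choice to take percentages within each sex-by-year group is an assumption for illustration.

```r
# Toy data, made up for this example.
toy <- data.frame(
  year   = c(2016, 2016, 2016, 2020, 2020, 2020),
  sex    = c("F", "F", "M", "F", "M", "M"),
  voter  = c("Yes", "No", "Yes", "Yes", "Yes", "No"),
  weight = c(1, 1, 2, 2, 1, 3)
)

# A 3-dimensional table of weighted totals: sex x voter x year.
wtd_counts <- xtabs(weight ~ sex + voter + year, data = toy)

# Percentages of voter within each sex-by-year group.
pct <- prop.table(wtd_counts, margin = c(1, 3)) * 100

# Long format: one row per sex, voter, and year combination.
long <- as.data.frame(pct, responseName = "pct")
```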
Wide format is best for displaying table output. Long format is best for making graphs. pollster outputs dovetail seamlessly with knitr::kable() and ggplot2::ggplot(). These examples show very basic HTML table output, but you can customize the appearance of your tables almost endlessly in either HTML or PDF formats using Hao Zhu's excellent kableExtra package.
library(dplyr)
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
  knitr::kable(digits = 0)
library(ggplot2)
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
  ggplot(aes(educ6, pct, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge")
Three-way crosstabs are ideal for plotting time series graphs and/or faceted plots.
crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight, format = "long") %>%
  ggplot(aes(year, pct, col = sex)) +
  geom_line() +
  facet_wrap(facets = vars(educ6))
Each pollster function comes with a twin function which includes a margin of error column. For example:
moe_topline(df = illinois, variable = voter, weight = weight)
By default, moe_crosstab output comes in long format, but you can also specify wide format.
moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight, format = "wide")
moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight) %>%
  ggplot(aes(x = pct, y = raceethnic, xmin = (pct - moe), xmax = (pct + moe), color = voter)) +
  geom_pointrange(position = position_dodge(width = 0.2))
summary_table() creates a simple summary table of a weighted numeric variable.
summary_table(df = illinois, variable = age, weight = weight)
You can choose name_style = "pretty" if you want column headings appropriate for a formatted table.
summary_table(df = illinois, variable = age, weight = weight, name_style = "pretty") %>%
  knitr::kable()