srvyr R Documentation

srvyr: A package for 'dplyr'-Like Syntax for Summary Statistics of Survey Data.

Description

The srvyr package provides a new way of calculating summary statistics on survey data, based on the dplyr package. There are three stages to using srvyr functions: creating a survey object, manipulating the data, and calculating survey statistics.

Functions to create a survey object

as_survey_design, as_survey_rep, and as_survey_twophase create survey objects from a data.frame together with design variables, replicate weights, or a two-phase design, respectively. Each is based on a function in the survey package (svydesign, svrepdesign, twophase), so code that uses the survey package is easy to adapt to srvyr. See vignette("srvyr_vs_survey") for more details.

The function as_survey will choose between the other three functions based on the arguments given to save some typing.
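As a minimal sketch of this first stage, the example below builds a stratified design from the apistrat data set that ships with the survey package. The choice of data set and the object names (strat_design, strat_design2) are assumptions made for illustration only.

```r
library(srvyr)

# Example data shipped with the survey package (California school API scores).
data(api, package = "survey")

# Stratified sample: strata in `stype`, sampling weights in `pw`,
# finite population correction in `fpc`.
strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw, fpc = fpc)

# as_survey() picks the constructor from the arguments supplied; given
# design variables like these, it should route to as_survey_design().
strat_design2 <- apistrat %>%
  as_survey(strata = stype, weights = pw, fpc = fpc)

class(strat_design)
```

Both objects should carry the tbl_svy class, which is what the srvyr verbs operate on.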

Functions to manipulate data in a survey object

Once you've created a survey object, you can manipulate the data as you would using dplyr with a data.frame. mutate modifies or creates a variable, select and rename select or rename variables, and filter keeps certain observations.
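The verbs above can be chained on a survey object much as on a data.frame. The following sketch again assumes the apistrat example data from the survey package; the new variable names (api_diff, enrollment) are hypothetical.

```r
library(srvyr)

data(api, package = "survey")

strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

prepared <- strat_design %>%
  mutate(api_diff = api00 - api99) %>%    # create a new variable
  rename(enrollment = enroll) %>%         # rename an existing variable
  filter(enrollment > 500) %>%            # keep only some observations
  select(stype, enrollment, api_diff)     # keep only some variables

prepared
```

The result is still a survey object, so it can be passed on to the summary functions.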

Note that arrange and two-table verbs such as bind_rows, bind_cols, or any of the joins are not usable on survey objects because they might require modifications to the definition of your survey. If you need to use these functions, you should do so before you convert the data.frame to a survey object.

Functions to summarize a survey object

Now that you have your data set up correctly, you can calculate summary statistics. To get a statistic over the whole population, use summarise; to calculate it over a set of groups, call group_by first.

You can calculate the mean (with survey_mean), the total (survey_total), the quantile (survey_quantile), or a ratio (survey_ratio). By default, srvyr will return the statistic and the standard error around it in a data.frame, but with the vartype parameter, you can also get a confidence interval ("ci"), variance ("var"), or coefficient of variation ("cv").
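A short sketch of the summary stage, assuming the same apistrat example data from the survey package, might look like the following. The output column names on the left-hand side (api_mean, enrolled) are chosen here for illustration.

```r
library(srvyr)

data(api, package = "survey")

strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Whole-population estimate: the mean plus the default standard error.
overall <- strat_design %>%
  summarise(api_mean = survey_mean(api00))

# Per-group estimates, requesting a confidence interval for the mean
# and a coefficient of variation for the total.
by_type <- strat_design %>%
  group_by(stype) %>%
  summarise(
    api_mean = survey_mean(api00, vartype = "ci"),
    enrolled = survey_total(enroll, vartype = "cv")
  )

overall
by_type
```

The uncertainty measures appear as extra columns with suffixes such as _se, _low, _upp, and _cv appended to the name given in summarise.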

Within summarise, you can also use unweighted, which calculates a function without taking into consideration the survey weighting.
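For instance, unweighted can report the raw number of sampled rows next to a weighted estimate. This sketch assumes the apistrat example data from the survey package and loads dplyr so that n() is available; the column names are hypothetical.

```r
library(dplyr)
library(srvyr)

data(api, package = "survey")

strat_design <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

by_type <- strat_design %>%
  group_by(stype) %>%
  summarise(
    api_mean  = survey_mean(api00),   # weighted estimate with its SE
    n_schools = unweighted(n())       # raw sample count, weights ignored
  )

by_type
```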

Author(s)

Maintainer: Greg Freedman Ellis greg.freedman@gmail.com

Authors:

  • Ben Schneider [contributor]

Other contributors:

  • Thomas Lumley [contributor]

  • Tomasz Żółtak [contributor]

  • Pavel N. Krivitsky pavel@statnet.org [contributor]

srvyr documentation built on Sept. 11, 2024, 8:43 p.m.