required <- c("survey", "weights") knitr::opts_chunk$set(message=F, warning=F, comment = "") if (!all(sapply(required, requireNamespace, quietly = TRUE))) knitr::opts_chunk$set(eval = FALSE) library(jtools)
The survey
package is one of R's best tools for those working in the social sciences.
For many, it saves you from needing to use commercial software for research that uses
survey data. However, it lacks one function that many academic researchers often
need to report in publications: correlations. The svycor
function in jtools
helps to fill that gap.
A note, however, is necessary. The initial motivation to add this feature comes from a response to a question about
calculating correlations with the survey
package written by Thomas Lumley, the survey
package author. All that is good about this function should be attributed to Dr. Lumley;
all that is wrong with it should be attributed to me (Jacob).
With that said, let's look at an example. First, we need to get a survey.design
object. This one is built into the survey
package.
library(survey) data(api) dstrat <- svydesign(id = ~1,strata = ~stype, weights = ~pw, data = apistrat, fpc=~fpc)
The necessary arguments are no different than when using svyvar
. Specify, using
an equation, which variables (and from which design) to include. It doesn't matter
which side of the equation the variables are on.
svycor(~api00 + api99, design = dstrat)
You can specify with the digits =
argument how many digits past the decimal point
should be printed.
svycor(~api00 + api99, design = dstrat, digits = 4)
Any other arguments that you would normally pass to svyvar
will be used as well,
though in some cases it may not affect the output.
One thing that survey
won't do for you is give you p values for the null hypothesis
that $r = 0$. While at first blush finding the p value might seem like a simple
procedure, complex surveys will almost always violate the important distributional
assumptions that go along with simple hypothesis tests of the correlation coefficient.
There is not a clear consensus on the appropriate way to conduct hypothesis tests in
this context, due in part to the fact that most analyses of complex
surveys occurs in the context of multiple regression rather than simple bivariate cases.
If sig.stats = TRUE
, then svycor
will use the wtd.cor
function from the weights
package to conduct hypothesis tests. The p values are derived from a bootstrap
procedure in which the weights define sampling probability. The bootn =
argument
is given to wtd.cor
to define the number of simulations to run. This can significantly
increase the running time for large samples and/or large numbers of simulations. The
mean1
argument tells wtd.cor
whether it should treat your sample size as the number of
observations in the survey design (the number of rows in the data frame) or the
sum of the weights. Usually, the former is desired, so the default value of mean1
is
TRUE
.
svycor(~api00 + api99, design = dstrat, digits = 4, sig.stats = TRUE, bootn = 2000, mean1 = TRUE)
When using sig.stats = TRUE
, the correlation parameter estimates come from the
bootstrap procedure rather than the simpler method based on the survey-weighted
covariance matrix when sig.stats = FALSE
.
By saving the output of the function, you can extract non-rounded coefficients, p values, and standard errors.
c <- svycor(~api00 + api99, design = dstrat, digits = 4, sig.stats = TRUE, bootn = 2000, mean1 = TRUE) c$cors c$p.values c$std.err
The heavy lifting behind the scenes is done by svyvar
, which from its output you
may not realize also calculates covariance.
svyvar(~api00 + api99, design = dstrat)
But if you save the svyvar
object, you can see that there's more than meets the eye.
var <- svyvar(~api00 + api99, design = dstrat) var <- as.matrix(var) var
Once we know that, it's just a matter of using R's cov2cor
function and cleaning
up the output.
cor <- cov2cor(var) cor
Now to get rid of that covariance matrix...
cor <- cor[1:nrow(cor), 1:nrow(cor)] cor
svycor
has its own print method, so you won't see so many digits past the decimal
point. You can extract the un-rounded matrix, however.
out <- svycor(~api99 + api00, design = dstrat) out$cors
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.