View source: R/survey_statistics.r
survey_var | R Documentation |
Calculate population variance from complex survey data. A wrapper around svyvar. survey_var should always be called from summarise.
survey_var(
  x,
  na.rm = FALSE,
  vartype = c("se", "ci", "var"),
  level = 0.95,
  df = NULL,
  ...
)
survey_sd(x, na.rm = FALSE, ...)
x |
A variable or expression, or empty |
na.rm |
A logical value to indicate whether missing values should be dropped |
vartype |
Report variability as one or more of: standard error ("se", default), confidence interval ("ci"), or variance ("var"). The coefficient of variation is not available. |
level |
(For vartype = "ci" only) A single number or vector of numbers indicating the confidence level. |
df |
(For vartype = "ci" only) A numeric value indicating the degrees of freedom for the t-distribution. The default (Inf) is equivalent to using the normal distribution, and for population variance statistics there is little reason to use any other value (see Details). |
... |
Ignored |
Be aware that confidence intervals for the population variance statistic are computed by the survey package using a t or normal (with df = Inf) distribution, i.e. a symmetric distribution. This can be a very poor approximation if any one of these conditions holds:
there are few sampling design degrees of freedom,
the analyzed variable isn't normally distributed,
there is large variation in the sampling probabilities of the survey design.
Because of this, be very careful when using confidence intervals for population variance statistics, especially when analyzing subsets of the data or using grouped survey objects.
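To make the concern concrete, here is a minimal base R sketch of a symmetric interval of the form estimate ± critical value × standard error. It assumes that this is the general shape of the interval described above. The helper symmetric_ci and the numbers passed to it are hypothetical illustrations; they are not part of srvyr or survey and are not taken from any survey.

```r
# Hypothetical helper (not part of srvyr or survey): a symmetric
# confidence interval built from a point estimate and its standard error.
symmetric_ci <- function(est, se, level = 0.95, df = Inf) {
  # Two-sided critical value; df = Inf gives the normal quantile.
  crit <- qt(1 - (1 - level) / 2, df = df)
  c(low = est - crit * se, upp = est + crit * se)
}

# Illustrative numbers only: a variance estimate with a large standard error.
ci_normal <- symmetric_ci(est = 100, se = 60)          # normal approximation
ci_t      <- symmetric_ci(est = 100, se = 60, df = 5)  # few degrees of freedom

print(ci_normal)
print(ci_t)
```

With a standard error this large relative to the estimate, the lower bound falls below zero even though a variance cannot be negative, which is one visible symptom of using a symmetric approximation for an asymmetric sampling distribution.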
The sampling distribution of the variance statistic is in general asymmetric (chi-squared in the case of simple random sampling of a normally distributed variable). If the analyzed variable isn't normally distributed, or there is large variation in the sampling probabilities of the survey design (or both), it may converge to normality only very slowly as the number of survey design degrees of freedom grows.
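The asymmetry can be seen in a small simulation. The sketch below uses base R only and assumes the simplest case mentioned above: simple random samples from a normally distributed variable. The sample size, true variance and number of replicates are arbitrary choices for illustration, and no survey design is involved.

```r
set.seed(1)
n      <- 10    # observations per simulated sample (arbitrary)
sigma2 <- 4     # true population variance (arbitrary)
reps   <- 5000  # number of simulated samples (arbitrary)

# Sample variances from simple random samples of a normal variable.
s2 <- replicate(reps, var(rnorm(n, mean = 0, sd = sqrt(sigma2))))

# Right skew: the mean of the sample variances exceeds their median.
print(c(mean = mean(s2), median = median(s2)))

# Under these assumptions (n - 1) * s2 / sigma2 follows a chi-squared
# distribution with n - 1 degrees of freedom; compare a few quantiles.
probs <- c(0.05, 0.5, 0.95)
print(rbind(
  simulated   = quantile((n - 1) * s2 / sigma2, probs, names = FALSE),
  chi_squared = qchisq(probs, df = n - 1)
))
```

The simulated quantiles should track the chi-squared ones closely, and the gap between mean and median shows why an interval centred on the estimate with equal margins on both sides is a rough fit when there are few degrees of freedom.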
library(survey)
library(srvyr)  # provides as_survey_design, survey_var, survey_sd and %>%

# Example data sets from the survey package
data(api)

# Stratified sample design
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Population variance and standard deviation
dstrata %>%
  summarise(api99_var = survey_var(api99),
            api99_sd = survey_sd(api99))

# The same statistics within groups
dstrata %>%
  group_by(awards) %>%
  summarise(api00_var = survey_var(api00),
            api00_sd = survey_sd(api00))

# The standard error and variance of the population variance estimator
# are available via the vartype argument
# (but not for the population standard deviation estimator)
dstrata %>%
  summarise(api99_variance = survey_var(api99, vartype = c("se", "var")))