View source: R/variance_function.R
Description
varDT estimates the variance of the estimator of a total in the case of a balanced sampling design with equal or unequal probabilities, using the Deville-Tillé (2005) formula. Without balancing variables, it falls back to Deville's (1993) classical approximation. Without balancing variables and with equal probabilities, it falls back to the classical Horvitz-Thompson variance estimator for the total in the case of simple random sampling. Stratification is natively supported.
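The fallback chain above can be checked numerically. The following base-R sketch uses a hypothetical helper, dt_var_sketch (not gustave's varDT), implementing one common form of the Deville-Tillé (2005) approximation with the weights (1 - pik) and the correction n / (n - p). It assumes that "no balancing variable" is handled by taking x = pik, which turns the formula into Deville's (1993) approximation; under equal probabilities this reduces exactly to the textbook simple random sampling estimator.

```r
# Hypothetical helper, NOT gustave's varDT: a base-R sketch of one common form
# of the Deville-Tillé (2005) approximation,
#   V = n / (n - p) * sum_k (1 - pik_k) * (y_k / pik_k - b' x_k / pik_k)^2,
# where b is the weighted least-squares coefficient of y / pik on x / pik with
# weights (1 - pik) and p is the number of balancing variables.
dt_var_sketch <- function(y, pik, x = NULL) {
  y <- as.matrix(y)
  # Assumption: with no balancing variable, use x = pik (a constant once
  # divided by pik), which yields Deville's (1993) approximation with p = 1.
  if (is.null(x)) x <- pik
  x <- as.matrix(x)
  n <- length(pik)
  p <- ncol(x)
  ck <- 1 - pik
  # Weighted least squares of y / pik on x / pik, weights (1 - pik)
  fit <- lm.wfit(x / pik, y / pik, w = ck)
  res <- as.matrix(fit$residuals)
  # One variance per column of y
  unname(n / (n - p) * colSums(ck * res^2))
}

set.seed(1)
N <- 1000
n <- 100
y <- rnorm(n)
# Equal probabilities, no balancing variable: the textbook SRS estimator
all.equal(dt_var_sketch(y, rep(n / N, n)), N^2 * (1 - n / N) * var(y) / n)
# Variable of interest among the balancing variables: numerically zero
pik <- runif(n, 0.1, 0.9)
x <- cbind(matrix(rnorm(n * 3), ncol = 3), y)
dt_var_sketch(y, pik, x)
```

With equal probabilities the fitted coefficient is just the mean of y / pik, so the residuals are (y_k - mean(y)) * N / n and the sum collapses to N^2 * (1 - n/N) * var(y) / n.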
var_srs is a convenience wrapper for the (stratified) simple random sampling case.
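For orientation, the classical estimator targeted in the stratified simple random sampling case can be sketched in a few lines of base R. var_srs_sketch is a hypothetical name, not gustave's var_srs, and it assumes pik is constant within each stratum so that the stratum population size can be recovered as N_h = n_h / pik.

```r
# Hypothetical helper (not gustave's var_srs): stratified SRS variance of the
# estimated total, V = sum_h N_h^2 * (1 - n_h / N_h) * s_h^2 / n_h.
# Assumption: pik is constant within each stratum, so N_h = n_h / pik.
var_srs_sketch <- function(y, pik, strata = NULL) {
  if (is.null(strata)) strata <- rep(1L, length(y))
  sum(vapply(split(seq_along(y), strata), function(idx) {
    n_h <- length(idx)
    N_h <- n_h / pik[idx[1]]
    N_h^2 * (1 - n_h / N_h) * var(y[idx]) / n_h
  }, numeric(1)))
}

set.seed(2)
strata <- rep(c("a", "b"), times = c(40, 60))
pik <- ifelse(strata == "a", 0.2, 0.05)   # different sampling rates per stratum
y <- rnorm(100)
var_srs_sketch(y, pik, strata)
```

The stratified variance is the sum of the within-stratum variances, which is the same additivity property the strata example further down checks for varDT.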
Arguments
y: A (sparse) numerical matrix of the variable(s) whose total's variance is to be estimated.
pik: A numerical vector of first-order inclusion probabilities.
x: An optional (sparse) numerical matrix of balancing variable(s).
strata: An optional categorical vector (factor or character) when variance estimation is to be conducted within strata.
w: An optional numerical vector of row weights (see Details).
precalc: A list of precalculated results (see Details).
id: A vector of identifiers of the units used in the calculation. Useful when
Details
varDT aims to be the workhorse of most variance estimation conducted with the gustave package. It may be used to estimate the variance of the estimator of a total in the case of (stratified) simple random sampling, (stratified) unequal probability sampling and (stratified) balanced sampling. The native integration of stratification based on Matrix::TsparseMatrix allows for significant performance gains compared to higher-level vectorizations (*apply functions especially).
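The idea behind sparse-matrix stratification can be illustrated with a sketch: instead of looping over strata, build an n x H indicator matrix S and obtain every stratum-level sum with a single crossprod(S, ...). The helpers deville_strat_sparse and deville_strat_loop are hypothetical and compute Deville's (1993) approximation within each stratum; this is not gustave's internal code. It uses the Matrix package (a recommended package shipped with R) and keeps whatever sparse class sparseMatrix() returns by default, rather than a TsparseMatrix.

```r
library(Matrix)

# Hypothetical sketch: Deville's (1993) approximation computed in all strata at
# once through a sparse n x H indicator matrix.
deville_strat_sparse <- function(y, pik, strata) {
  f <- factor(strata)
  n <- length(y)
  # S[k, h] = 1 if unit k belongs to stratum h
  S <- sparseMatrix(i = seq_len(n), j = as.integer(f), x = 1,
                    dims = c(n, nlevels(f)))
  ck <- 1 - pik
  z <- y / pik
  n_h <- as.vector(crossprod(S, rep(1, n)))          # stratum sample sizes
  R_h <- as.vector(crossprod(S, ck * z)) /
    as.vector(crossprod(S, ck))                      # weighted stratum means
  R_k <- as.vector(S %*% R_h)                        # mapped back to units
  sq_h <- as.vector(crossprod(S, ck * (z - R_k)^2))  # stratum sums of squares
  sum(n_h / (n_h - 1) * sq_h)
}

# Reference version with an explicit loop over strata, for comparison
deville_strat_loop <- function(y, pik, strata) {
  sum(vapply(split(seq_along(y), strata), function(idx) {
    ck <- 1 - pik[idx]
    z <- y[idx] / pik[idx]
    R <- sum(ck * z) / sum(ck)
    length(idx) / (length(idx) - 1) * sum(ck * (z - R)^2)
  }, numeric(1)))
}

set.seed(3)
n <- 1000
strata <- sample.int(50, n, replace = TRUE)
pik <- runif(n)
y <- rnorm(n)
all.equal(deville_strat_sparse(y, pik, strata), deville_strat_loop(y, pik, strata))
```

The sparse version performs a fixed number of matrix products whatever the number of strata, which is where the gain over per-stratum *apply calls comes from.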
Several time-consuming operations (e.g. collinearity check, matrix inversion) can be precalculated in order to speed up the estimation at execution time. This is determined by the values of the parameters y and precalc:
- if y is not NULL and precalc is NULL: on-the-fly calculation (no precalculation);
- if y is NULL and precalc is NULL: precalculation, whose results are stored in a list of precalculated data;
- if y is not NULL and precalc is not NULL: calculation using the list of precalculated data.
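The three cases can be mimicked with a small sketch of the same dispatch pattern. dt_var_pre is a hypothetical function, not gustave's implementation: everything that depends only on pik and x (here the inverse of the weighted cross-product of x / pik) is computed once and reused. Unlike varDT, it performs no collinearity check and assumes x has full column rank.

```r
# Hypothetical sketch of the precalculation pattern (not gustave's code).
dt_var_pre <- function(y = NULL, pik, x = NULL, precalc = NULL) {
  if (is.null(precalc)) {
    if (is.null(x)) x <- pik
    x <- as.matrix(x)
    n <- length(pik)
    ck <- 1 - pik
    xs <- x / pik
    # The costly part that does not depend on y: a matrix inversion
    inv <- solve(crossprod(xs, ck * xs))
    precalc <- list(pik = pik, ck = ck, xs = xs, inv = inv,
                    corr = n / (n - ncol(x)))
    if (is.null(y)) return(precalc)  # y NULL, precalc NULL: precalculation only
  }
  # y not NULL: calculation, on the fly or with the supplied precalc list
  ys <- as.matrix(y) / precalc$pik
  beta <- precalc$inv %*% crossprod(precalc$xs, precalc$ck * ys)
  res <- ys - precalc$xs %*% beta
  unname(precalc$corr * colSums(precalc$ck * res^2))
}

set.seed(4)
n <- 200
pik <- runif(n, 0.1, 0.9)
x <- matrix(rnorm(n * 3), ncol = 3)
y <- rnorm(n)
pre <- dt_var_pre(y = NULL, pik, x)  # precalculation step
all.equal(dt_var_pre(y, precalc = pre), dt_var_pre(y, pik, x))
```

Because pik is never evaluated when precalc is supplied, R's lazy evaluation lets the caller omit it in the third case.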
w is a vector of row weights used at the final summation step. It is useful when varDT or var_srs is used at the second stage of a two-stage sampling design, applying the Rao (1975) formula.
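A sketch of why a row weight at the final summation is enough: in the Rao (1975) formula, the second-stage term is the sum over sampled primary units i of V_i / pi_i. If each secondary unit carries w = 1 / pi_i of its primary unit and the primary units play the role of strata, the weighted final sum reproduces that term. deville_w is a hypothetical helper using Deville's (1993) approximation within each primary unit; it assumes w multiplies each unit's squared-residual contribution, which is our reading of "row weight used at the final summation step".

```r
# Hypothetical sketch: w multiplies each unit's contribution at the final sum.
deville_w <- function(y, pik, strata, w = rep(1, length(y))) {
  sum(vapply(split(seq_along(y), strata), function(idx) {
    ck <- 1 - pik[idx]
    z <- y[idx] / pik[idx]
    R <- sum(ck * z) / sum(ck)
    n_h <- length(idx)
    n_h / (n_h - 1) * sum(w[idx] * ck * (z - R)^2)
  }, numeric(1)))
}

set.seed(5)
psu <- rep(1:10, each = 8)         # 10 sampled primary units, 8 units each
pi1 <- runif(10, 0.2, 0.8)         # first-stage inclusion probabilities
pik2 <- runif(80, 0.3, 0.9)        # second-stage inclusion probabilities
y <- rnorm(80)
# Second-stage term computed in one pass with row weights 1 / pi_i ...
v_weighted <- deville_w(y, pik2, psu, w = 1 / pi1[psu])
# ... equals sum_i V_i / pi_i computed primary unit by primary unit
v_by_psu <- sum(vapply(1:10, function(i) {
  deville_w(y[psu == i], pik2[psu == i], rep(1, 8)) / pi1[i]
}, numeric(1)))
all.equal(v_weighted, v_by_psu)
```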
Value
- if y is not NULL (calculation step): the estimated variances, as a numerical vector whose length is the number of columns of y;
- if y is NULL (precalculation step): a list containing precalculated data.
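To make the "one variance per column of y" return shape concrete, here is a sketch of a matrix-wise computation for the simple random sampling case, using colSums instead of a loop over columns. var_srs_matrix is a hypothetical helper and assumes equal inclusion probabilities.

```r
# Hypothetical sketch: matrix-wise SRS variance, one value per column of y.
var_srs_matrix <- function(y, pik) {
  y <- as.matrix(y)
  n <- nrow(y)
  N <- n / pik[1]                       # assumption: equal probabilities
  centred <- sweep(y, 2, colMeans(y))   # subtract each column's mean
  s2 <- colSums(centred^2) / (n - 1)    # column-wise sample variances
  N^2 * (1 - n / N) * s2 / n
}

set.seed(6)
y <- matrix(rnorm(300), ncol = 3, dimnames = list(NULL, c("y1", "y2", "y3")))
pik <- rep(0.1, 100)
v <- var_srs_matrix(y, pik)
length(v) == ncol(y)
# Same values as a column-by-column loop (N = 1000, n = 100)
all.equal(v, apply(y, 2, function(col) 1000^2 * (1 - 0.1) * var(col) / 100))
```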
Difference with varest from package sampling
varDT differs from sampling::varest in several ways:
- The formula implemented in varDT is more general and encompasses balanced sampling.
- Even in its reduced form (without balancing variables), the formula implemented in varDT slightly differs from the one implemented in sampling::varest. Caron (1998, pp. 178-179) compares the two estimators (sampling::varest implements V_2, varDT implements V_1).
- varDT introduces several optimizations:
  - matrix-wise operations make it possible to estimate the variance of several variables of interest at once;
  - Matrix::TsparseMatrix capability and the native integration of stratification yield significant performance gains;
  - the ability to precalculate some time-consuming operations speeds up the estimation at execution time.
- varDT does not natively implement the calibration estimator (i.e. the sampling variance estimator that takes into account the effect of calibration). In the context of the gustave package, res_cal should be called before varDT in order to achieve the same result.
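The reason for calling a residual function first is the usual linearization argument for calibration: the variance of a (linear) calibration estimator is approximated by the variance of the estimated total of regression residuals. The sketch below is not res_cal; cal_residuals is a hypothetical helper that assumes linear calibration and residuals computed by weighted least squares with the design weights d = 1 / pik. The residuals would then be passed to a variance function in place of y.

```r
# Hypothetical sketch of the residual technique (not gustave's res_cal).
cal_residuals <- function(y, x, d) {
  fit <- lm.wfit(as.matrix(x), as.matrix(y), w = d)
  as.matrix(fit$residuals)
}

set.seed(7)
n <- 100
pik <- runif(n, 0.1, 0.9)
x <- cbind(1, rnorm(n))                 # calibration variables, with intercept
y <- 2 + 3 * x[, 2] + rnorm(n)
e <- cal_residuals(y, x, d = 1 / pik)
# Design-weighted residuals are orthogonal to the calibration variables,
# so the calibration totals carry no variance
max(abs(crossprod(x, e / pik)))
```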
Author(s)
Martin Chevalier
References
Caron, N. (1998), "Le logiciel Poulpe : aspects méthodologiques", Actes des Journées de méthodologie statistique, http://jms-insee.fr/jms1998s03_1/
Deville, J.-C. (1993), "Estimation de la variance pour les enquêtes en deux phases", Manuscript, INSEE, Paris.
Deville, J.-C., Tillé, Y. (2005), "Variance approximation under balanced sampling", Journal of Statistical Planning and Inference, 128, issue 2, 569-591.
Rao, J.N.K. (1975), "Unbiased variance estimation for multistage designs", Sankhya, C n°37.
Examples
library(sampling)
library(gustave)  # provides varDT
set.seed(1)
# Simple random sampling case
N <- 1000
n <- 100
y <- rnorm(N)[as.logical(srswor(n, N))]
pik <- rep(n/N, n)
varDT(y, pik)
sampling::varest(y, pik = pik)
# Textbook SRS variance estimator of the total
N^2 * (1 - n/N) * var(y) / n
# Unequal probability sampling case
N <- 1000
n <- 100
pik <- runif(N)
s <- as.logical(UPsystematic(pik))
y <- rnorm(N)[s]
pik <- pik[s]
varDT(y, pik)
varest(y, pik = pik)
# The small difference is expected (see Difference with varest from package sampling).
# Balanced sampling case
N <- 1000
n <- 100
pik <- runif(N)
x <- matrix(rnorm(N*3), ncol = 3)
s <- as.logical(samplecube(x, pik))
y <- rnorm(N)[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)
# Balanced sampling case (variable of interest
# among the balancing variables)
N <- 1000
n <- 100
pik <- runif(N)
y <- rnorm(N)
x <- cbind(matrix(rnorm(N*3), ncol = 3), y)
s <- as.logical(samplecube(x, pik))
y <- y[s]
pik <- pik[s]
x <- x[s, ]
varDT(y, pik, x)
# As expected, the total of the variable of interest is perfectly estimated.
# strata argument
n <- 100
H <- 2
pik <- runif(n)
y <- rnorm(n)
strata <- letters[sample.int(H, n, replace = TRUE)]
all.equal(
varDT(y, pik, strata = strata),
varDT(y[strata == "a"], pik[strata == "a"]) + varDT(y[strata == "b"], pik[strata == "b"])
)
# precalc argument
n <- 1000
H <- 50
pik <- runif(n)
y <- rnorm(n)
strata <- sample.int(H, n, replace = TRUE)
precalc <- varDT(y = NULL, pik, strata = strata)
identical(
varDT(y, precalc = precalc),
varDT(y, pik, strata = strata)
)