View source: R/define_variance_wrapper.R
define_variance_wrapper  R Documentation 
Given a variance estimation function (specific to a survey),
define_variance_wrapper defines a variance estimation wrapper
that is easier to use (e.g. automatic domain estimation,
linearization).
define_variance_wrapper(
  variance_function,
  reference_id,
  reference_weight,
  default_id = NULL,
  technical_data = NULL,
  technical_param = NULL,
  objects_to_include = NULL
)
variance_function 
An R function. It is the methodological workhorse of the variance estimation: from a set of arguments including the variables of interest (see below), it should return a vector of estimated variances. See Details. 
reference_id 
A vector containing the ids of all the responding units
of the survey. It can also be an unevaluated expression (enclosed in
quote()).
reference_weight 
A vector containing the reference weight of the survey.
It can also be an unevaluated expression (enclosed in quote()).
default_id 
A character vector of length 1, the name of the default
identifying variable in the survey file. It can also be an unevaluated
expression (enclosed in quote()).
technical_data 
A named list of technical data needed to perform
the variance estimation (e.g. sampling strata, first- or second-order
probabilities of inclusion, estimated response probabilities, calibration
variables). Its names should match the names of the corresponding arguments
in variance_function.
technical_param 
A named list of technical parameters used to control
some aspect of the variance estimation process (e.g. alternative methodology).
Its names should match the names of the corresponding arguments in variance_function.
objects_to_include 
(Advanced use) A character vector indicating the name of additional R objects to include within the variance wrapper. 
Defining variance estimation wrappers is the key feature of
the gustave package. It is the workhorse of the ready-to-use
qvar function and should be used directly to handle more complex
cases (e.g. surveys with several stages or balanced sampling).
Analytical variance estimation is often difficult for non-specialists to carry out owing to the complexity of the underlying sampling and estimation methodology. This complexity yields complex variance estimation functions, which are most often used only by the sampling expert who actually wrote them. A variance estimation wrapper is an intermediate function that is "wrapped around" the (complex) variance estimation function in order to provide the non-specialist with user-friendly features (see examples):
calculation of complex statistics (see standard statistic wrappers)
domain estimation
handy evaluation and factor discretization
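To make the domain estimation feature more concrete, here is a minimal base-R sketch. It is not gustave's implementation: the function var_srs_total, the population size N = 100 and the toy data are hypothetical and only stand in for a survey-specific variance function. The idea shown is the usual one for totals: the variable of interest is set to zero outside the domain, and the variance function is then applied to all sampled units.

```r
# Hypothetical variance function (illustrative only): variance of the
# estimated total under simple random sampling without replacement
# of n units out of a population of size N.
var_srs_total <- function(y, N) {
  n <- length(y)
  N^2 * (1 - n / N) * var(y) / n
}

# Domain estimation: zero out y outside the domain, keep every sampled unit.
var_domain_total <- function(y, domain, N) {
  var_srs_total(y * as.numeric(domain), N = N)
}

# Toy sample (made-up values)
y <- c(3, 5, 2, 8, 6, 1)
domain <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

v_all <- var_srs_total(y, N = 100)             # whole sample
v_dom <- var_domain_total(y, domain, N = 100)  # restricted to the domain
```

A variance wrapper is meant to spare the user this manual step.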
define_variance_wrapper allows the sampling expert to define a
variance estimation wrapper around a given variance estimation function and
set its default parameters. The produced variance estimation wrapper is
standalone in the sense that it contains all technical data necessary
to carry out the estimation (see technical_data).
The arguments of the variance_function fall into three types:
the data argument (mandatory, only one allowed): the numerical matrix of variables of interest to apply the variance estimation formula on
technical data arguments (optional, one or more allowed): technical and methodological information used by the variance estimation function (e.g. sampling strata, first- or second-order probabilities of inclusion, estimated response probabilities, calibration variables)
technical parameters (optional, one or more allowed): non-data arguments to be used to control some aspect of the variance estimation (e.g. alternative methodology)
technical_data and technical_param are used to determine
which arguments of variance_function relate to technical information;
the only remaining argument is considered to be the data argument.
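The rule above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by gustave: classify_arguments and var_toy are hypothetical names introduced here. The sketch matches the names of technical_data and technical_param against the arguments of the variance function and treats the single remaining argument as the data argument.

```r
# Hypothetical helper (not part of gustave): classify the arguments of a
# variance function from the names of technical_data and technical_param.
classify_arguments <- function(variance_function,
                               technical_data = NULL,
                               technical_param = NULL) {
  all_args <- names(formals(variance_function))
  technical_args <- c(names(technical_data), names(technical_param))

  # Names that do not match any argument of the variance function
  unknown <- setdiff(technical_args, all_args)
  if (length(unknown) > 0) {
    stop("Not arguments of variance_function: ", paste(unknown, collapse = ", "))
  }

  # Whatever is left must be the single data argument
  data_arg <- setdiff(all_args, technical_args)
  if (length(data_arg) != 1) {
    stop("Exactly one data argument is expected, found ", length(data_arg))
  }

  list(
    data = data_arg,
    technical_data = names(technical_data),
    technical_param = names(technical_param)
  )
}

# Toy variance function: y is the data argument, strata a technical data
# argument and fpc a technical parameter (body omitted on purpose).
var_toy <- function(y, strata, fpc = TRUE) NULL

args_toy <- classify_arguments(
  var_toy,
  technical_data = list(strata = c("a", "a", "b")),
  technical_param = list(fpc = TRUE)
)
args_toy$data
```

Under this sketch, a variance function with two unmatched arguments raises an error, which mirrors the "only one allowed" constraint on the data argument.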
An R function that makes the estimation of variance based on the provided variance function easier. Its parameters are:
data: the survey file containing the variables of interest (see examples)
...: one or more calls to a statistic wrapper (e.g. total(), mean(), ratio()). See examples and standard statistic wrappers
where: a logical vector indicating a domain on which the variance estimation is to be performed
by: a qualitative variable whose levels are used to define domains on which the variance estimation is performed
alpha: a numeric vector of length 1 indicating the threshold for confidence interval derivation (0.05 by default)
display: a logical vector of length 1 indicating whether the result of the estimation should be displayed or not
id: a character vector of length 1 containing the name of the identifying variable in the survey file. Its default value depends on the value of default_id in define_variance_wrapper
envir: an environment containing a binding to data
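To illustrate the role of alpha, here is a minimal base-R sketch of how a confidence interval can be derived from a point estimate and its estimated variance. It assumes a normal approximation, which is a common choice but is an assumption here rather than a statement about gustave's internals; confidence_interval and the numbers used are hypothetical.

```r
# Hypothetical helper (not gustave's code): normal-approximation confidence
# interval from a point estimate, its estimated variance and a threshold alpha.
confidence_interval <- function(est, variance, alpha = 0.05) {
  half_width <- qnorm(1 - alpha / 2) * sqrt(variance)
  c(lower = est - half_width, upper = est + half_width)
}

# Made-up estimate and variance; alpha = 0.05 gives a 95 % interval
ci <- confidence_interval(est = 1200, variance = 2500, alpha = 0.05)
ci
```

A smaller alpha widens the interval, since the quantile qnorm(1 - alpha / 2) increases.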
Martin Chevalier
Rao, J.N.K. (1975), "Unbiased variance estimation for multistage designs", Sankhya, C n°37
qvar, standard statistic wrappers, varDT
### Example from the Labour force survey (LFS)
# The (simulated) Labour force survey (LFS) has the following characteristics:
# - first sampling stage: balanced sampling of 4 areas (each corresponding to
#   about 120 dwellings) on first-order probability of inclusion (proportional to
#   the number of dwellings in the area) and total annual income in the area.
# - second sampling stage: in each sampled area, simple random sampling of 20
#   dwellings
# - neither non-response nor calibration
# As this is a multistage sampling design with balanced sampling at the first
# stage, the qvar function does not apply. A variance wrapper can nonetheless
# be defined using the core define_variance_wrapper function.
# Step 1 : Definition of the variance function and the corresponding technical data
# In this context, the variance estimation function specific to the LFS
# survey can be defined as follows:
var_lfs <- function(y, ind, dwel, area){

  variance <- list()

  # Variance associated with the sampling of the dwellings
  y <- sum_by(y, ind$id_dwel)
  variance[["dwel"]] <- var_srs(
    y = y, pik = dwel$pik_dwel, strata = dwel$id_area,
    w = (1 / dwel$pik_area^2 - dwel$q_area)
  )

  # Variance associated with the sampling of the areas
  y <- sum_by(y = y, by = dwel$id_area, w = 1 / dwel$pik_dwel)
  variance[["area"]] <- varDT(y = y, precalc = area)

  # Total variance: sum of the two components
  Reduce(`+`, variance)

}
# where y is the matrix of variables of interest and ind, dwel and area the technical data:

technical_data_lfs <- list()

# Technical data at the area level
# The varDT function allows for the pre-calculation of
# most of the methodological quantities needed.
technical_data_lfs$area <- varDT(
  y = NULL,
  pik = lfs_samp_area$pik_area,
  x = as.matrix(lfs_samp_area[c("pik_area", "income")]),
  id = lfs_samp_area$id_area
)

# Technical data at the dwelling level
# In order to implement the Rao (1975) formula for two-stage samples,
# we associate each dwelling with the diagonal term corresponding
# to its area in the first-stage variance estimator:
lfs_samp_dwel$q_area <- with(technical_data_lfs$area, setNames(diago, id))[lfs_samp_dwel$id_area]
technical_data_lfs$dwel <- lfs_samp_dwel[c("id_dwel", "pik_dwel", "id_area", "pik_area", "q_area")]

# Technical data at the individual level
technical_data_lfs$ind <- lfs_samp_ind[c("id_ind", "id_dwel", "sampling_weight")]

# Test of the variance function var_lfs
y <- matrix(as.numeric(lfs_samp_ind$unemp), ncol = 1, dimnames = list(lfs_samp_ind$id_ind))
with(technical_data_lfs, var_lfs(y = y, ind = ind, dwel = dwel, area = area))
# Step 2 : Definition of the variance wrapper
# Call of define_variance_wrapper
precision_lfs <- define_variance_wrapper(
  variance_function = var_lfs,
  technical_data = technical_data_lfs,
  reference_id = technical_data_lfs$ind$id_ind,
  reference_weight = technical_data_lfs$ind$sampling_weight,
  default_id = "id_ind"
)
# Test
precision_lfs(lfs_samp_ind, unemp)
# The variance wrapper precision_lfs has the same features
# as variance wrappers produced by the qvar function (see
# qvar examples for more details).