ext.calibrated: Make ReGenesees Digest Externally Calibrated Weights
In DiegoZardetto/ReGenesees: R Evolved Generalized Software for Sampling Estimates and Errors in Surveys

ext.calibrated

R Documentation


Description

Enables ReGenesees to provide correct variance estimates of (functions of) calibration estimators, even if the survey weights have not been calibrated by ReGenesees.

Usage

ext.calibrated(data, ids, strata = NULL, weights,
               fpc = NULL, self.rep.str = NULL, check.data = TRUE,
               weights.cal, calmodel, partition = FALSE, sigma2 = NULL)

Arguments

`data`	The same as in function `e.svydesign`.
`ids`	The same as in function `e.svydesign`.
`strata`	The same as in function `e.svydesign`.
`weights`	The same as in function `e.svydesign`.
`fpc`	The same as in function `e.svydesign`.
`self.rep.str`	The same as in function `e.svydesign`.
`check.data`	The same as in function `e.svydesign`.
`weights.cal`	Formula identifying the externally calibrated weights.
`calmodel`	The same as in function `e.calibrate`.
`partition`	The same as in function `e.calibrate`.
`sigma2`	The same as in function `e.calibrate`.

Details

Owing to ReGenesees's ability to provide proper variance estimates for (complex functions of) calibration estimators, some users may be tempted to exploit ReGenesees in the estimation phase even if they did not use ReGenesees for calibration.

This cannot be achieved naively, by simply passing the survey data to ReGenesees function e.svydesign and supplying the externally calibrated weights through its weights argument.

Indeed, variance estimation methods of ReGenesees's summary statistics functions (svystatTM, svystatR, svystatS, svystatSR, svystatB, svystatQ, svystatL and svystat) are dispatched according to the class of the input design object:

If the design object is un-calibrated (i.e. its class is ‘analytic’), variance formulas are appropriate to Horvitz-Thompson estimators (and functions of them).
If the design object is calibrated (i.e. its class is ‘cal.analytic’), variance formulas are appropriate to Calibration estimators (and functions of them).

Therefore, the naive approach of passing the externally calibrated weights weights.cal to e.svydesign as if they were initial or design weights cannot succeed, since it would result in HT-like variance estimates, leading generally to variance overestimation (with bigger upward bias for variables that are better explained by the calibration model).
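
The size of this effect can be illustrated with a minimal base R sketch. It is not ReGenesees code: it assumes simple random sampling without replacement, a linear calibration model with an intercept and a single auxiliary variable, and the textbook linearization in which the variance of a calibration estimator is computed on regression residuals (with g-weights set to 1 for simplicity). All object names and simulated values are made up for the illustration.

```r
# Illustrative sketch only (base R, simulated data); not ReGenesees internals.
set.seed(1)

N <- 10000                              # population size (assumed)
n <- 500                                # sample size (assumed)
x <- rgamma(N, shape = 2, scale = 10)   # auxiliary variable
y <- 3 * x + rnorm(N, sd = 5)           # target variable, well explained by x

s  <- sample(N, n)                      # simple random sample without replacement
xs <- x[s]
ys <- y[s]

# SRS variance estimator of an estimated total of z:
# N^2 * (1 - n/N) * s_z^2 / n
v_total <- function(z) N^2 * (1 - n / N) * var(z) / n

# HT-like variance: computed on the raw y values
v_ht <- v_total(ys)

# Calibration-like variance: computed on the residuals of y regressed on the
# calibration model (intercept + x); g-weights taken as 1 for simplicity
e     <- resid(lm(ys ~ xs))
v_cal <- v_total(e)

# Ratio of standard errors: values above 1 show by how much the HT-like
# formula overstates the sampling error in this simulation
sqrt(v_ht / v_cal)
```

In this sketch, the better x explains y, the smaller the residuals and the larger the ratio, which mirrors the statement above about variables that are better explained by the calibration model.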

Function ext.calibrated has been designed exactly to avoid the aforementioned pitfalls and to allow ReGenesees to provide correct variance estimates of (functions of) calibration estimators, even if the survey weights have been calibrated externally by other software.

Argument weights.cal identifies the externally calibrated weights of the units included in the sample. The data variable referenced by weights.cal must be numeric. Currently, only positive externally calibrated weights can be handled (see the dedicated section below).
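
These two requirements can be checked on the data before calling ext.calibrated. The helper below is a hypothetical base R sketch: check_ext_weights is not a ReGenesees function, the toy data frame is made up, and the additional test for missing or non-finite values is a precaution assumed here rather than a documented requirement.

```r
# Hypothetical pre-check, NOT part of ReGenesees
check_ext_weights <- function(data, wname) {
  w <- data[[wname]]
  if (is.null(w)) stop("Variable '", wname, "' not found in data")
  if (!is.numeric(w)) stop("Variable '", wname, "' must be numeric")
  # Assumed extra precaution: reject NA, NaN and infinite values
  if (any(!is.finite(w))) stop("Variable '", wname, "' has missing or non-finite values")
  if (any(w <= 0)) stop("Variable '", wname, "' has non-positive values")
  invisible(TRUE)
}

# Toy data (made up for illustration)
toy <- data.frame(id = 1:4, w.ext = c(10.2, 8.7, 12.1, 9.4))
check_ext_weights(toy, "w.ext")        # passes silently

# Same data with one negative weight
toy.bad <- transform(toy, w.ext = c(10.2, -1.5, 12.1, 9.4))
res <- try(check_ext_weights(toy.bad, "w.ext"), silent = TRUE)
inherits(res, "try-error")             # TRUE: the negative weight is detected
```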

Other arguments to ext.calibrated derive either from function e.svydesign or from function e.calibrate. The former pass the survey data and the corresponding sampling design metadata; the latter tell ext.calibrated how the externally calibrated weights have been obtained.

Value

An object of class cal.analytic, storing the original survey data plus all the sampling design and calibration metadata needed for proper variance estimation.

What if externally calibrated weights happen to be negative?

From a methodological perspective, negative calibration weights are legitimate. However, owing to software implementation details whose modification would not be trivial, function ext.calibrated is not yet able to cope with this case. Note that the problem is actually due to the external origin of the negative calibration weights. In fact, ReGenesees calibration and estimation facilities are entirely able to cope with possibly negative calibration weights, provided they were computed internally.

Note

Exactly as ReGenesees's base functions e.svydesign and e.calibrate would do, ext.calibrated wraps a local copy of data inside its return value. As usual, this copy is stored in the variables slot of the output list, and the calibrated weights are accessible through the weights function.

Author(s)

Diego Zardetto.

Examples


# Load sbs data
data(sbs)

#########################################################################
# Simulate an external calibration procedure and compute some benchmark #
# estimates and errors to test function ext.calibrated                  #
#########################################################################
# Define a survey design
sbsdes <- e.svydesign(data= sbs, ids= ~id, strata= ~strata, weights= ~weight,
                      fpc= ~fpc)

# Build a template for population totals
pop <- pop.template(data= sbsdes, calmodel= ~y:nace.macro + emp.cl + emp.num - 1,
                    partition= ~dom3)

# Have a look at the template structure
pop.desc(pop)

# Fill the template
pop <- fill.template(universe= sbs.frame, template= pop)

# Calibrate
sbscal <- e.calibrate(design= sbsdes, df.population= pop, calfun= "logit",
                      bounds= c(0.8, 1.3), sigma2= ~ emp.num)

# Compute benchmark estimates and errors (average value added per employee by
# region) to be later compared with those obtained by using ext.calibrated 
benchmark <- svystatR(design= sbscal, num= ~va.imp2, den= ~emp.num, by= ~region)
benchmark

# Extract the 'externally' calibrated weights...
w <- weights(sbscal)

#...and add these 'externally' calibrated weights to the original survey data
sbs.ext <- data.frame(sbs, w.ext = w)

# NOTE: Now sbs.ext is just a data frame, without any knowledge of the
#       calibration metadata formerly stored inside sbscal (i.e. the object
#       calibrated by ReGenesees)


##############################################################
# Let ReGenesees digest the 'externally' calibrated weights, #
# then re-compute benchmark estimates and errors for testing #
##############################################################
# Simply pass survey data along with sampling design and calibration model
# metadata
sbscal.ext <- ext.calibrated(data= sbs.ext, ids= ~id, strata= ~strata,
                             weights= ~weight, fpc = ~fpc,
                             weights.cal= ~w.ext,
                             calmodel= ~y:nace.macro + emp.cl + emp.num - 1,
                             partition= ~dom3, sigma2= ~emp.num)

# Have a look at the output
sbscal.ext
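
# Optional checks (a sketch based on the Value and Note sections above):
# the output should be recognised as a calibrated design (class
# 'cal.analytic'), and its local copy of the data, stored in the variables
# slot, should include the w.ext column
inherits(sbscal.ext, "cal.analytic")
head(sbscal.ext$variables)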

# Now re-compute benchmark estimates and errors by means of new object
# sbscal.ext
test <- svystatR(design= sbscal.ext, num= ~va.imp2, den= ~emp.num, by= ~region)
test

################################################################
# Compare benchmark estimates and errors to those derived from #
# ext.calibrated return object                                 #
################################################################
benchmark
test

# ...and they are identical, as they must be.

# NOTE: All utility tools yield exactly the same results, e.g.
identical(weights(sbscal), weights(sbscal.ext))
identical(g.range(sbscal), g.range(sbscal.ext))


##########################################################################
# Show that the naive idea of directly passing the externally calibrated #
# weights to e.svydesign does NOT work properly for variance estimation  #
##########################################################################
naive <- e.svydesign(data= sbs.ext, ids= ~id, strata= ~strata,
                     weights= ~w.ext, fpc = ~fpc)

# Estimated sampling errors derived by this naive design object...
svystatR(design= naive, num= ~va.imp2, den= ~emp.num, by= ~region)

#...do NOT match benchmark values, overestimating them:
benchmark

DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.