get.residuals: Calibration Residuals of Interest Variables
In DiegoZardetto/ReGenesees: R Evolved Generalized Software for Sampling Estimates and Errors in Surveys

get.residuals

R Documentation

Calibration Residuals of Interest Variables

Description

Computes (scaled) residuals of a set of interest variables w.r.t. the calibration model adopted to build a calibrated object.

Usage

get.residuals(cal.design, y, scale = c("no", "w", "d", "g"))

Arguments

`cal.design`	Object of class `cal.analytic`.
`y`	Formula defining the variables of interest.
`scale`	`character` specifying how to scale the residuals, can be one of: `'no'` (the default), `'w'`, `'d'`, `'g'` (see ‘Details’).

Details

This function has been designed mainly for programmers or advanced users willing to build upon ReGenesees: typical users are not expected to feel much need of it.

The residuals of an interest variable w.r.t. the linear model defined by the auxiliary variables used for calibration play a central role in estimating the variance of Calibration Estimators (see, e.g., [Deville, 1999]). Notice that if object cal.design has been generated by running a partitioned calibration task (see e.calibrate), the residuals will be correctly computed using the different estimated regression coefficients pertaining to the different domains belonging to the partition.

The mandatory argument y behaves exactly the same way as it does in function svystatTM.

The scale argument allows to scale the computed residuals by multiplying them by different factors. By default scale="no", that is unscaled residuals are returned. Value "w" returns the residuals times the calibrated weights; value "d" returns the residuals times the initial weights; finally, value "g" returns the residuals times the g-weights (i.e. the ratios between calibrated and initial weights). Notice that the semantics of argument scale are slightly modified when the input object cal.design has been obtained by a multi-step calibration procedure (see Section 'Note' below).

Value

A matrix of residuals.

Note

If cal.design has undergone k subsequent calibration steps (with k >= 1), the function will return the residuals computed w.r.t. the linear assisting model underlying the last (i.e. k-th) calibration step. If k >= 2, the scale parameter will be interpreted as follows:

SCALE        MEANING
 "no"........no scale;
  "w"........last calibration weight (i.e. at step k);
  "d"........second to last calibration weight (i.e. at step k - 1);
  "g"........ratio between last and second to last calibration weights.

Author(s)

Diego Zardetto

References

Deville, J. C. (1999) “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.”. Survey Methodology 25: 193-203.

Examples

##################################################################
# Just some checks on the consistency of the numerical results   #
# obtained by ReGenesees with well known theoretical properties. #
##################################################################

# Load sbs data:
data(sbs)

# Create a design object to be calibrated:
sbsdes<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,fpc=~fpc)

################
## Property 1 ##
################
## If the calibration model has (implicitly or explicitly) an intercept
## the weighted sum of residuals must be zero.

# Suppose you calibrate on enterprise counts inside areas, i.e. a calibration
# model WITH intercept (though implicitly):
# calmodel= ~ area - 1

# First, build and fill the known totals template:
pop<-pop.template(sbsdes, calmodel= ~ area - 1)
pop<-fill.template(pop, universe=sbs.frame)

# Then, calibrate:
sbscal<-e.calibrate(sbsdes, pop)

# Now, get the residuals of any variable (e.g. y and emp.num) scaled with the
# direct weights:
de <- get.residuals(sbscal, ~ y + emp.num, scale="d")

# Lastly, compute the column sums...
colSums(de)

#...which is (numerically) zero, as it must be.


################
## Property 2 ##
################
## If the calibration model does not have (implicitly or explicitly) an
## intercept term the weighted sum of residuals is generally different from zero. 

# Suppose you calibrate on employees counts inside areas, i.e. a calibration
# model WITHOUT intercept:
# calmodel= ~ emp.num:area - 1

# First, build and fill the known totals template:
pop<-pop.template(sbsdes, calmodel= ~ emp.num:area - 1)
pop<-fill.template(pop, universe=sbs.frame)

# Then, calibrate:
sbscal<-e.calibrate(sbsdes, pop)

# Now, get the residuals of any variable (e.g. y and region) scaled with the
# direct weights:
de <- get.residuals(sbscal, ~ y + region, scale="d")

# Lastly, compute the column sums...
colSums(de)

#...which is far from zero, as expected.


################
## Property 3 ##
################
## In the Taylor linearization approach, estimating the variance of a
## Calibration Estimator amounts to estimating the variance of the HT total
## of its linearized variable (i.e. the g-scaled residual), under the sampling
## design at hand.

# Suppose you calibrate on the total number of employees and enterprises
# inside the domains obtained by:
#  i) crossing nace.macro and region;
# ii) crossing emp.cl and region;

# First, build and fill the population totals template:
pop<-pop.template(sbsdes,
     calmodel=~(emp.num+ent):(nace.macro+emp.cl)-1,
     partition=~region)
pop<-fill.template(universe=sbs.frame,template=pop)

# Then, calibrate:
sbscal<-e.calibrate(sbsdes,pop)

# Now, compute the linearized variable of the Calibration Estimator of the
# total of any variable (e.g. va.imp2):
z_va.imp2 <- get.residuals(sbscal, ~ va.imp2, scale="g")

# Now, treat z_va.imp2 as an ordinary variable and compute the standard error
# of its HT total:
sbsdes<-des.addvars(sbsdes, z_va.imp2 = z_va.imp2)
SE(svystatTM(sbsdes, ~z_va.imp2))

# Lastly, compute directly the standard error of the Calibration Estimator...
SE(svystatTM(sbscal, ~va.imp2))

#...and they are identical, as it must be.


# Obviously the same result would hold for domain estimates (e.g. the total
# of va.imp2 for the "Agriculture" nace.macro).

# Compute the linearized variable of the Calibration Estimator of the domain
# total:
z_va.imp2.Agr <- get.residuals(sbscal, ~ I(va.imp2*(nace.macro=="Agriculture")),
                               scale="g")

# Now, treat z_va.imp2.Agr as an ordinary variable and compute the standard
# error of its HT total:
sbsdes<-des.addvars(sbsdes, z_va.imp2.Agr = z_va.imp2.Agr)
SE(svystatTM(sbsdes, ~z_va.imp2.Agr))

# Lastly, compute directly the standard error of the Calibration Estimator
# by domain...
SE(svystatTM(sbscal, ~va.imp2, ~nace.macro))

#...and the "Agriculture" SEs are identical, as it must be.

DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.