UWE: Unequal Weighting Effect
In DiegoZardetto/ReGenesees: R Evolved Generalized Software for Sampling Estimates and Errors in Surveys

UWE	R Documentation

Unequal Weighting Effect

Description

Computes the Unequal Weighting Effect for the current and initial weights of a design object.

Usage

UWE(design, by = NULL)

Arguments

`design`	Object of class `analytic` (or inheriting from it).
`by`	Formula specifying variables that define "estimation domains". If `NULL` (the default option) the UWE refers to the whole sample.

Details

Function UWE computes the Unequal Weighting Effect for the current (w) and initial (w0) weights of a design object, plus the corresponding variance inflation (or deflation) factor (UWE(w) / UWE(w0)) induced by changing the weights from w0 to w (w0 -> w).

Following Kish's definition [Kish 92], the UWE is calculated as 1 plus the relative sample variance of the weights, i.e. their sample variance divided by their squared sample mean: UWE(w) = 1 + RelVar(w).
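
As a rough illustration of this formula, the following base R sketch computes Kish's UWE for a plain vector of weights. The helper kish_uwe is hypothetical (it is not part of ReGenesees), and it assumes the relative variance is computed with the population-style variance (denominator n), under which UWE(w) = n * sum(w^2) / sum(w)^2. The package itself may follow a different convention.

```r
# Hypothetical helper, NOT a ReGenesees function: Kish's UWE of a weight vector.
# Assumption: variance with denominator n, so that
# UWE(w) = 1 + RelVar(w) = n * sum(w^2) / sum(w)^2
kish_uwe <- function(w) {
  rel_var <- (mean(w^2) - mean(w)^2) / mean(w)^2
  1 + rel_var
}

w_equal   <- rep(10, 5)        # all weights equal
w_unequal <- c(1, 1, 1, 1, 6)  # one unit carries a much larger weight

kish_uwe(w_equal)    # 1: equal weights give no unequal weighting effect
kish_uwe(w_unequal)  # 2: same as 5 * sum(w_unequal^2) / sum(w_unequal)^2
```

Under this convention the UWE is never smaller than 1, and it equals 1 only when all weights coincide.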

The current weights, w, of design are the weights that would be returned by weights(design) and would be used for estimation purposes by functions svystatTM, svystatR, etc.

The initial weights, w0, of design depend on the nature of object design:

If design is the outcome of a ‘weight-changing pipeline’, w0 -> w1 -> ... -> w, i.e. it was obtained by the application of an arbitrary chain of ReGenesees functions that modify the weights (e.g. smooth.strat.jump, e.calibrate, ext.calibrated, trimcal, ...), then the initial weights, w0, are the weights of the starting design object in the pipeline.
If design is an initial design object generated by function e.svydesign, then the initial weights, w0, are taken as equal to current weights, w0 = w.

Note that, when design is the outcome of a ‘weight-changing pipeline’, function UWE provides a measure of the overall, cumulative impact of all the adjustments the weights underwent throughout the pipeline.

To assess the effect, in terms of UWE and variance inflation, of just a single processing step of the pipeline, you can call function UWE on the input and output designs of that step and compare the results (basically, by taking suitable ratios).
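
The "suitable ratios" mentioned above can be sketched with toy numbers in base R. Everything in this sketch is illustrative: kish_uwe is a hypothetical helper (assuming a variance with denominator n), and w0, w1, w2 are made-up weights standing in for three stages of a weight-changing pipeline w0 -> w1 -> w2.

```r
# Hypothetical helper, NOT a ReGenesees function (assumes denominator n)
kish_uwe <- function(w) 1 + (mean(w^2) - mean(w)^2) / mean(w)^2

# Made-up weights at three stages of a pipeline w0 -> w1 -> w2
w0 <- c(10, 10, 10, 10)
w1 <- c(8, 12, 8, 12)
w2 <- c(5, 15, 5, 15)

uwe <- c(w0 = kish_uwe(w0), w1 = kish_uwe(w1), w2 = kish_uwe(w2))

# Cumulative variance inflation of the whole pipeline (w0 -> w2)
infl_cumulative <- uwe[["w2"]] / uwe[["w0"]]

# Variance inflation of each single step
infl_step1 <- uwe[["w1"]] / uwe[["w0"]]   # w0 -> w1
infl_step2 <- uwe[["w2"]] / uwe[["w1"]]   # w1 -> w2

# The step-wise factors multiply up to the cumulative one
all.equal(infl_cumulative, infl_step1 * infl_step2)
```

With actual design objects, the analogous single-step factor would be the ratio of the UWE.curr values returned by UWE for the output and input designs of that step.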

Value

A data.frame with a single row (if by = NULL) or one row for each domain (if by is passed), and the following columns:

  Column        Meaning
  UWE.curr......Current Unequal Weighting Effect
  UWE.ini.......Initial Unequal Weighting Effect
  VAR.infl......Variance Inflation Factor ( UWE.curr / UWE.ini )
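
To make this layout concrete, here is a small base R sketch that builds a data frame with the same three columns, one row per domain, from made-up weights. It only mimics the shape of the return value: the toy data, the region variable, and the helper kish_uwe (assuming a variance with denominator n) are all hypothetical and do not reproduce the package's implementation.

```r
# Hypothetical helper, NOT a ReGenesees function (assumes denominator n)
kish_uwe <- function(w) 1 + (mean(w^2) - mean(w)^2) / mean(w)^2

# Made-up sample: initial weights w0, current weights w, one domain variable
toy <- data.frame(
  region = c("North", "North", "South", "South"),
  w0     = c(10, 10, 10, 30),
  w      = c(5, 15, 20, 20)
)

# Domain-wise UWE of current and initial weights
uwe_curr <- tapply(toy$w,  toy$region, kish_uwe)
uwe_ini  <- tapply(toy$w0, toy$region, kish_uwe)

res <- data.frame(
  region   = names(uwe_curr),
  UWE.curr = as.numeric(uwe_curr),
  UWE.ini  = as.numeric(uwe_ini),
  VAR.infl = as.numeric(uwe_curr / uwe_ini)
)
res
```

In this toy case VAR.infl is above 1 in the North domain (weights became more unequal) and below 1 in the South domain (weights became more equal), i.e. an inflation and a deflation factor respectively.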

Methodological Remark

Kish's UWE is a model-based tool that can be useful for diagnostic purposes. However, its values must be interpreted with some caution, just as is the case for model-based estimates of Kish's Deff.

In particular, the UWE is, by construction, sensitive only to variations in the sample variance of the weights. Therefore, it cannot identify weight adjustments which, despite adding variability to the weights at the sample level, might reduce the sampling variance of some estimators. This is often the case with calibration, which may well make survey weights more unequal, but nonetheless cause their reciprocals to become more correlated with some variables of interest. Similar considerations hold for stratified sampling, to the extent that, with respect to the variables of interest, units tend to be more similar within strata than between strata.
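
A minimal base R sketch of this limitation follows, with made-up numbers. It only shows that the UWE depends on the weights alone: the same set of weights attached to the units in a different order yields the same UWE, even though the weighted estimates differ markedly. The helper kish_uwe is hypothetical (not part of ReGenesees) and assumes a variance with denominator n.

```r
# Hypothetical helper, NOT a ReGenesees function (assumes denominator n)
kish_uwe <- function(w) 1 + (mean(w^2) - mean(w)^2) / mean(w)^2

# Made-up variable of interest: the last unit is much larger than the others
y <- c(1, 2, 3, 4, 100)

# Same set of weights, attached to the units in opposite orders
w_a <- c(5, 5, 5, 5, 1)   # small weight on the large unit
w_b <- rev(w_a)           # small weight on the smallest unit

# The UWE cannot tell the two weightings apart...
all.equal(kish_uwe(w_a), kish_uwe(w_b))

# ...although they relate to y very differently
total_a <- sum(w_a * y)
total_b <- sum(w_b * y)
c(total_a, total_b)
```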

In any case, the UWE can be handy when comparing the potential outcomes of performing the same kind of weight adjustment under slightly different settings (e.g. calibration with different bounds or distance functions, trimming with different thresholds, etc.).

Author(s)

Diego Zardetto

References

Kish, L. (1992). Weighting for unequal Pi. Journal of Official Statistics, 8, 183-200.

Examples

###############################################
# Compute the UWE along the following example #
# of weight-changing pipeline:                #
# 1) Smooth for stratum jumpers               #
# 2) Adjust for nonresponse                   #
# 3) Calibrate to known population totals     #
# 4) Consistently trim calibration weights    #
#                                             #
# NOTE: To perform 1) and 2) I will first     #
#       A) simulate some stratum jumpers.     #
#       B) simulate some nonresponse.         #
###############################################

## Load sbs data:
data(sbs)

## -- A) Simulate stratum jumpers
# Create the strata variable observed at survey time by cloning the
# strata variable at sampling time
sbs$curr.strata <- sbs$strata

# Now inject some (say ~250) random stratum jumpers
set.seed(12345)          # (fix the RNG seed for reproducibility)
sbs$curr.strata[sample(1:nrow(sbs), 250)] <- sbs$curr.strata[sample(1:nrow(sbs), 250)]

# Resulting number of stratum jumpers:
tt <- table(sbs$strata, sbs$curr.strata)
sum(tt[row(tt) != col(tt)])

## -- B) Simulate nonresponse
# Assume a response propensity that increases with enterprise size (as
# measured by number of employees)
levels(sbs$emp.cl)
p.resp <- c(.4, .6, .8, .95, .99)

# Tie response probabilities to sample observations:
pr <- p.resp[unclass(sbs$emp.cl)]

# Now, randomly select a subsample of responding units from sbs:
set.seed(12345)          # (fix the RNG seed for reproducibility)
rand <- runif(nrow(sbs))
sbs.r <- sbs[rand < pr, ]

# This implies an overall response rate of about 73%:
nrow(sbs.r) / nrow(sbs)

## -- 0) Create the respondent design object
# NOTE: I'll keep using the original fpc column for the sake of the examples,
#       but it should be recomputed in real applications...
sbsdes <- e.svydesign(data = sbs.r, ids = ~id, strata = ~strata,
                      weights = ~weight, fpc = ~fpc)

## -- 1) Smooth for stratum jumpers
# Use method 'MinChange'
sbssmooth <- smooth.strat.jump(sbsdes, ~curr.strata)

# Have a look
sbssmooth

## -- 2) Adjust for nonresponse
# Use a simple Response Homogeneity Model approach, with size classes
# as RHGs. Perform the RHG weight adjustment via calibration 

# Compute enterprise counts by size classes from the frame
N.RHG <- pop.template(sbssmooth, calmodel= ~emp.cl - 1)
N.RHG <- fill.template(sbs.frame, N.RHG)

# Calibrate to achieve the RHG adjustment
sbsRHG <- e.calibrate(sbssmooth, N.RHG)

# Have a look
sbsRHG

## -- 3) Calibrate to known population totals
# Now calibrate again in order to reduce estimator variance, by using further
# available auxiliary information, e.g. the total number of employees (emp.num)
# and enterprises (ent) inside the domains obtained by crossing nace.macro
# and region:
pop <- pop.template(sbsRHG, calmodel = ~emp.num + ent - 1,
                    partition = ~nace.macro:region)
pop <- fill.template(sbs.frame, pop)

# Calibrate to improve estimation efficiency
sbscal <- e.calibrate(sbsRHG, pop)

# Have a look
sbscal

## -- 4) Consistently trim calibration weights
# Say one wants to avoid weights that are less than 1 or above 50:
sbstrim <- trimcal(sbscal, c(1, 50))

# Have a look
sbstrim

## -- UWE calculation along the weight-changing pipeline
# Object sbstrim is the output of the weight-changing pipeline, as
# one easily recognizes when printing it:
sbstrim

# UWE of initial object
UWE(sbsdes)

# UWE at step 1), i.e. smoothing for stratum jumpers
UWE(sbssmooth)

# UWE at step 2), i.e. nonresponse RHG adjustment
UWE(sbsRHG)

# UWE at step 3), i.e. calibration for efficiency improvement
UWE(sbscal)

# UWE at step 4), i.e. consistent trimming of calibration weights
UWE(sbstrim)

# End

DiegoZardetto/ReGenesees documentation built on Dec. 16, 2024, 2:03 p.m.