knitr::opts_chunk$set( collapse = TRUE, comment = "", prompt = TRUE, dpi = 36, fig.align = "center" )
```{css, echo = FALSE} .my-sidebar-orange { padding-left: 1.5rem; padding-top: 0.5rem; padding-bottom: 0.25rem; margin-top: 1.25rem; margin-bottom: 1.25rem; border: 1px solid #eee; border-left-width: 0.75rem; border-radius: .25rem; border-left-color: #ce5b00; }
.my-sidebar-blue { padding-left: 1.5rem; padding-top: 0.5rem; padding-bottom: 0.25rem; margin-top: 1.25rem; margin-bottom: 1.25rem; border: 1px solid #eee; border-left-width: 0.75rem; border-radius: .25rem; border-left-color: #1f618d; }
## Outline In this vignette, we discuss the robust Horvitz-Thompson (RHT) estimator of [Hulliger](#biblio) (1995, 1999). The vignette is organized as follows. - 1 Workplace data - 2 Robust Horvitz-Thompson estimator - References <div class="my-sidebar-blue"> <p style="color: #1f618d;"> **Good to know.** </p> <p>The RHT estimating method is available in two "flavors": - bare-bone method - survey method Bare-bone methods are stripped-down versions of the survey methods in terms of functionality and informativeness. These functions may serve users and package developers as building blocks. In particular, bare-bone functions cannot compute variances. See Vignette "Basic Robust Estimators" to learn more about other robust estimators (trimming, winsorization, etc.). </p> </div> First, we load the namespace of the package `robsurvey` and attach it to the search path. ```r library("robsurvey", quietly = TRUE)
The argument quietly = TRUE
suppresses the start-up message in the call of
library("robsurvey")
.
The workplace
sample consists of payroll data on n = 142 workplaces or
business establishments (with paid employees) in the retail sector of a
Canadian province.
workplace
data are not
those collected by Statistics Canada but have been generated by
Fuller (2009, Example 3.1.1, Table 6.3).The original weights of WES were about 2200 for the stratum of small workplaces, about 750 for medium-sized, and about 35 for large workspaces. Several strata containing very large workplaces were sampled exhaustively; see Patak et al. (1998).
attach(workplace)
The variable of interest is payroll
, and the goal is to estimate the
population payroll total in the retail sector (in Canadian dollars).
head(workplace, 3)
In order to use the survey methods (not bare-bone methods), we must
attach the survey
package (Lumley, 2010, 2021) to the search
path
library("survey")
suppressPackageStartupMessages(library(survey))
and specify a survey or sampling design object
dn <- svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat)
dn <- if (packageVersion("survey") >= "4.2") { svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace, calibrate.formula = ~-1 + strat) } else { svydesign(ids = ~ID, strata = ~strat, fpc = ~fpc, weights = ~weight, data = workplace) }
Note. Since version 4.2, the survey package allows the definition
of pre-calibrated weights (see argument calibrate.formula
of the function
svydesign()
). This vignette uses this functionality (in some places). If you
have installed an earlier version of the survey
package, this vignette will
automatically fall back to calling svydesign()
without the calibration
specification. See vignette Pre-calibrated
weights
of the survey
package to learn more.
survey_version <- packageVersion("survey") if (survey_version < "4.2") { cat(paste0('<div class="my-sidebar-orange">\n <p style="color: #ce5b00;"> **IMPORTANT: PRE-CALIBRATED WEIGHTS ARE NOT SUPPORTED** </p> This vignette has been built with version **', survey_version, '** of the **survey** package. Therefore, `svydesign()` is called without the `calibrate.formula` argument. As a consequence, some of the variance and standard error estimates may differ from those with pre-calibrated weights, i.e., the default specification.<p> </p> </div>')) }
To get a first impression of the distribution of payroll
, we examine two
(design-weighted) boxplots of payroll
(on raw and logarithmic scale). The
boxplots are obtained using function survey::svyboxplot
.
layout(matrix(1:2, ncol = 2)) par(mar = c(4, 1, 1, 1)) svyboxplot(payroll~ 1, dn, all.outliers = TRUE, xlab = "payroll", horizontal = TRUE) svyboxplot(log(payroll) ~ 1, dn, all.outliers = TRUE, xlab = "log(payroll)", horizontal = TRUE)
From the boxplot with payroll
on raw scale, we recognise that the sample
distribution of payroll
is skewed to the right; the boxplot on logarithmic
scale demonstrates that log-transform is not sufficient to turn the skewed
distribution into a symmetric distribution. The outliers need not be errors.
Following Chambers (1986), we distinguish representative outliers
from non-representative outliers ($\rightarrow$ see vignette "Basic Robust
Estimators" for an introduction to the notion of non-/ representative
outliers).
The outliers visible in the boxplot refer to a few large workplaces. Moreover, we assume that these outliers represent other workplaces in the population that are similar in value (i.e., representative outliers).
The following bare-bone estimating methods are available:
weighted_mean_huber()
weighted_total_huber()
weighted_mean_tukey()
weighted_total_tukey()
The functions with postfix _tukey
are M-estimators with the Tukey biweight
$\psi$-function. The Huber RHT M-estimator of the payroll total can be
computed with
weighted_total_huber(payroll, weight, k = 8, type = "rht")
Note that we must specify type = "rht"
for the RHT [the case type = "rhj"
is discussed in the vignette "Basic Robust Estimators"]. Here, we have chosen
the robustness tuning constant $k = 8$.
The following survey method are available;
svymean_huber()
svytotal_huber()
svymean_tukey()
svytotal_tukey()
The survey method of the RHT (and its standard error) is
m <- svytotal_huber(~payroll, dn, k = 8, type = "rht") m
The summary()
method summarizes the most important facts about the estimate.
summary(m)
The estimated location, variance, and standard error can be extracted from
object m
with the following commands.
coef(m) vcov(m) SE(m)
For M-estimators, the estimated scale (weighted MAD) can be extracted with
the scale()
function.
scale(m)
Additional utility functions are:
residuals()
to extract the residualsfitted()
to extract the fitted values under the model in userobweights()
to extract the robustness weightsIn the following figure, the robustness weights of object m
are plotted
against the residuals. The Huber RHT M-estimator downweights observations at
both ends of the residuals' distribution.
plot(residuals(m), robweights(m))
par(mar = c(5, 4, 1, 0)) plot(residuals(m), robweights(m))
An adaptive M-estimator of the total (or mean) is defined by letting the data chose the tuning constant $k$. This approach is available for the RHT estimator $\rightarrow$ see vignette "Basic Robust Estimators", Chap. 5.3 on M-estimators.
CHAMBERS, R. (1986). Outlier Robust Finite Population Estimation. Journal of the American Statistical Association 81, 1063–1069, DOI: 10.1080/01621459.1986.10478374.
FULLER, W. A. (2009). Sampling Statistics, Hoboken (NJ): John Wiley & Sons, DOI: 10.1002/9780470523551.
HULLIGER, B. (1995). Outlier Robust Horvitz–Thompson Estimators. Survey Methodology 21, 79–87.
HULLIGER, B. (1999). Simple and robust estimators for sampling, in: Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 54–63.
HULLIGER, B. (2006). Horvitz–Thompson Estimators, Robustified. In: Encyclopedia of Statistical Sciences ed. by Kotz, S. Volume 5, Hoboken (NJ): John Wiley and Sons, 2nd edition, DOI: 10.1002/0471667196.ess1066.pub2.
LUMLEY, T. (2010). Complex Surveys: A Guide to Analysis Using R: A Guide to Analysis Using R, Hoboken (NJ): John Wiley & Sons.
LUMLEY, T. (2021). survey: analysis of complex survey samples. R package version 4.0, URL https://CRAN.R-project.org/package=survey.
PATAK, Z., HIDIROGLOU, M. and LAVALLEE, P. (1998). The methodology of the Workplace and Employee Survey. Proceedings of the Survey Research Methods Section, American Statistical Association, 83–91.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.