optimal.onesided.cutoff: Optimal one-sided winsorization for survey outliers

Description

This function calculates the optimal tuning parameter, the cutoffs, and the winsorized values for one-sided winsorization.

Usage

optimal.onesided.cutoff(formula, surveydata, historical.reweight = 1,
  estimated.means.name = "", stop = F)

Arguments

formula

The regression formula (e.g. income ~ employment + old.turnover if income is the survey variable and employment and old.turnover are the auxiliary variables).

surveydata

A data frame of the survey data including the variables in formula, piwt (inverse probability of selection), gregwt (generalized regression estimator weight) and regwt (weight to be used in regression - will be set to 1 if missing).

historical.reweight

A set of reweighting factors for use when a historical dataset is being used. It reweights from the historical sample to the sample of interest. The default value of 1 should be used if the sample being used for optimising Q is the same sample (or at least the same design) as the sample to which the winsorizing cutoffs are to be applied.

estimated.means.name

The variable of this name in surveydata should contain an estimator of the expected value for each sample value of the variable of interest. If set to "", the regression model is estimated using iteratively reweighted least squares (IRLS).

stop

Set to T to open a browser window (for debugging purposes)
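
The columns expected in surveydata can be illustrated with a small hand-made data frame. This is only a sketch: the name surveydata.sketch and every value in it are invented for illustration and are not taken from the package or its example data.

```r
# Hypothetical survey data; all values are invented for illustration.
surveydata.sketch <- data.frame(
  income       = c(120, 85, 400, 60),     # survey variable
  employment   = c(10, 8, 25, 5),         # auxiliary variable
  old.turnover = c(110, 90, 300, 55),     # auxiliary variable
  piwt         = c(4, 4, 2, 6),           # inverse probability of selection
  gregwt       = c(4.2, 3.9, 2.1, 6.3)    # generalized regression estimator weight
)

# regwt is optional; this page states it is set to 1 if missing.
# It is added explicitly here only to show the full set of columns.
if (is.null(surveydata.sketch$regwt)) surveydata.sketch$regwt <- 1

str(surveydata.sketch)
```

A data frame of this shape would match the formula income ~ employment + old.turnover, although a real call would presumably need far more than four rows for the regression to be estimated sensibly.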

Details

This function calculates optimal one-sided cutoffs for winsorization, where regression residuals are truncated at Q / (gregwt - 1), gregwt is the generalized regression estimator weight, and Q satisfies the optimality result in Kokic and Bell (1994) and Clark (1995).
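
The truncation rule can be sketched in a few lines of base R. The sketch below assumes that Q and the expected values are already known; the vectors y, mu and gregwt and the value of Q are hypothetical, and no attempt is made to reproduce the optimisation of Q that the function performs.

```r
# Hypothetical inputs: none of these values come from the package or its data.
y      <- c(12, 15, 90, 20)   # observed survey values
mu     <- c(10, 14, 30, 22)   # assumed estimates of the expected values
gregwt <- c(5, 5, 11, 3)      # generalized regression estimator weights (all > 1)
Q      <- 100                 # an arbitrary tuning parameter, not an optimised one

# Residuals (y - mu) are truncated at Q / (gregwt - 1), so on the scale of y
# the cutoff is the expected value plus that bound.
resid.bound <- Q / (gregwt - 1)
cutoffs     <- mu + resid.bound

# Truncating the residual is the same as taking the smaller of y and the cutoff.
win1.values <- pmin(y, cutoffs)

print(data.frame(y, mu, gregwt, cutoffs, win1.values))
```

In this sketch only the third unit exceeds its cutoff, so it is the only value that changes; units with larger weights get tighter cutoffs because the bound shrinks as gregwt grows.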

Value

A list consisting of Q.opt (the optimal Q), rlm.coef (the robust regression coefficients) and windata, a dataset containing the same observations and variables as surveydata in the same order, with the additional variables cutoffs (the winsorizing cutoffs for each unit in the sample), y (the values of the variable of interest), win1.values (the type 1 winsorized values of interest, i.e. the minimums of the cutoff and y) and win2.values (the type 2 winsorized values of interest, defined so that sum(surveydata$gregwt*win2.values) is the winsorized estimator).
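
How the two kinds of winsorized values relate to the winsorized estimator can be sketched as follows. The type 1 rule is the one stated above. The type 2 formula used here is an assumption: it is the usual type II definition from the winsorization literature, and this page does not confirm that the package computes win2.values in exactly this way. All input values are hypothetical.

```r
# Hypothetical values standing in for columns of windata and surveydata.
y       <- c(12, 15, 90, 20)   # values of the variable of interest
cutoffs <- c(35, 39, 40, 72)   # winsorizing cutoffs for each unit
gregwt  <- c(5, 5, 11, 3)      # generalized regression estimator weights

# Type 1: the minimum of the cutoff and y.
win1.values <- pmin(y, cutoffs)

# Type 2 (assumed definition): each unit keeps a share 1 / gregwt of its
# own value, and the remaining share is taken from the type 1 value.
win2.values <- y / gregwt + (1 - 1 / gregwt) * win1.values

# The winsorized estimator is the weighted sum of the type 2 values;
# the unwinsorized weighted sum is shown for comparison.
win.estimate  <- sum(gregwt * win2.values)
greg.estimate <- sum(gregwt * y)

print(c(winsorized = win.estimate, unwinsorized = greg.estimate))
```

Under this assumed definition a unit below its cutoff is left unchanged, while a unit above it contributes its own value once and its cutoff for the remaining gregwt - 1 units it represents.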

References

Clark, R. G. (1995), "Winsorisation methods in sample surveys," Masters thesis, Australian National University, http://hdl.handle.net/10440/1031.

Kokic, P. and Bell, P. (1994), "Optimal winsorizing cutoffs for a stratified finite population estimator," J. Off. Stat., 10, 419-435.

Examples

library(surveyoutliers)
test <- optimal.onesided.cutoff(formula = y ~ x1 + x2, surveydata = survdat.example)
# Plot the original values against their type 1 winsorized values
plot(test$windata$y, test$windata$win1.values)

surveyoutliers documentation built on May 2, 2019, 2:44 p.m.