Winsimp: Winsorization followed by imputation

Description

Winsorization of outliers according to their Mahalanobis distance, followed by imputation under the multivariate normal model. Only the outliers are winsorized. The Mahalanobis distance is computed with MDmiss, which allows for missing values.

Usage

Winsimp(data, center, scatter, outind, seed = 1000003)

Arguments

data

Data frame with the data

center

(Robust) estimate of the center (location) of the observations

scatter

(Robust) estimate of the scatter (covariance-matrix) of the observations

outind

Logical vector flagging the outliers (TRUE or 1 marks an outlier)

seed

Seed for random number generator

Details

It is assumed that center, scatter and outind stem from a multivariate outlier detection algorithm which produces robust estimates and which declares observations with a large Mahalanobis distance to be outliers. The cutpoint c is calculated as the smallest (unsquared) Mahalanobis distance among the outliers. The winsorization reduces the weight of the outliers by pulling them towards the robust center:

y_i = μ_R + (y_i - μ_R) * c / d_i

where the left-hand side is the winsorized value of outlier i, μ_R is the robust center, c is the cutpoint and d_i is the (unsquared) Mahalanobis distance of observation i.
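The winsorization formula above can be illustrated with a minimal base-R sketch. It assumes complete data (no missing values) and uses stats::mahalanobis instead of MDmiss; the helper winsorize_sketch and the simulated data are hypothetical and not part of the modi package, and the imputation step is omitted.

```r
# Illustrative sketch only: winsorize flagged outliers on complete data.
# winsorize_sketch is a hypothetical helper, not a function of the modi package.
winsorize_sketch <- function(x, center, scatter, outind) {
  x <- as.matrix(x)
  outind <- as.logical(outind)
  # Unsquared Mahalanobis distances d_i (mahalanobis() returns squared distances)
  d <- sqrt(mahalanobis(x, center, scatter))
  # Cutpoint c: smallest distance among the flagged outliers
  cutpoint <- min(d[outind])
  # Shrink only the outliers towards the center by the factor c / d_i
  shrink <- ifelse(outind, cutpoint / d, 1)
  x_wins <- sweep(sweep(x, 2, center, "-") * shrink, 2, center, "+")
  list(cutpoint = cutpoint, winsorized = x_wins)
}

# Simulated data: 20 regular points plus 2 artificial outliers (assumption)
set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2), c(8, 8), c(-9, 7))
outind <- c(rep(FALSE, 20), TRUE, TRUE)
# Non-robust stand-ins for the robust estimates, computed on the regular points
center <- colMeans(x[!outind, ])
scatter <- cov(x[!outind, ])

res <- winsorize_sketch(x, center, scatter, outind)
# Distances after winsorization, for comparison with res$cutpoint
d_after <- sqrt(mahalanobis(res$winsorized, center, scatter))
```

In this complete-data setting every flagged observation ends up at distance c from the center, while the remaining observations are left unchanged.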

Value

Function Winsimp returns a list whose first component, output, is a sub-list with the following components:

cutpoint

Cutpoint for outliers

proc.time

Processing time

n.missing.before

Number of missing values before

n.missing.after

Number of missing values after imputation

The second component returned by Winsimp is

imputed.data

Imputed data set.

Author(s)

Beat Hulliger

References

Hulliger, B. (2007) Multivariate Outlier Detection and Treatment in Business Surveys, Proceedings of the III International Conference on Establishment Surveys, Montréal.

See Also

MDmiss. Uses imp.norm from the norm package.

Examples

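library(modi)
# Load the example data and the corresponding weights
data(bushfirem, bushfire.weights)
# Detect outliers and obtain robust estimates of center and scatter
det.res <- TRC(bushfirem, weight = bushfire.weights)
# Winsorize the detected outliers, then impute the missing values
imp.res <- Winsimp(bushfirem, det.res$output$center, det.res$output$scatter, det.res$outind)
print(imp.res$output)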

modi documentation built on May 2, 2019, 6:48 p.m.