Nearest Neighbour Imputation with Mahalanobis distance

Description

POEM takes into account missing values, outlier indicators, error indicators and sampling weights.

Usage

POEM(data, weights, outind, errors, missing.matrix, alpha = 0.5, beta = 0.5,
     reweight.out = FALSE, c = 5, preliminary.mean.imputation = FALSE,
     monitor = FALSE)

Arguments

data

a data frame or matrix with the data

weights

sampling weights

outind

an indicator vector for the outliers, 1 indicating outlier

errors

matrix of indicators for items which failed edits

missing.matrix

the missingness matrix can be given as input. Otherwise it will be recalculated

alpha

scalar giving the weight attributed to an item that is failing

beta

minimal overlap to accept a donor

reweight.out

if TRUE the outliers are redefined

c

tuning constant when redefining the outliers (cutoff for Mahalanobis distances)

preliminary.mean.imputation

assume the problematic observation is at the mean of good observations

monitor

if TRUE verbose output
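
The roles of alpha and beta can be illustrated with toy indicator matrices. The sketch below is not POEM's internal code. It assumes that errors uses 1 for a failing item, that a missing item gets weight 0, and that overlap is measured as the share of variables usable in both recipient and donor. The names observed, item.weight and overlap are hypothetical.

```r
# Toy indicators for 3 observations and 4 variables (hypothetical values).
observed <- matrix(c(1, 1, 1, 1,
                     1, 1, 0, 1,
                     1, 0, 0, 1), nrow = 3, byrow = TRUE)  # 1 = item observed
errors   <- matrix(c(0, 0, 0, 0,
                     0, 1, 0, 0,
                     0, 0, 0, 0), nrow = 3, byrow = TRUE)  # assumed: 1 = item failed an edit
alpha <- 0.5  # weight given to a failing item
beta  <- 0.5  # minimal overlap to accept a donor

# Assumed item weights: 1 for a clean item, alpha for a failing item,
# 0 for a missing item.
item.weight <- observed * ifelse(errors == 1, alpha, 1)

# Assumed overlap measure: share of variables usable (weight > 0)
# in both the recipient and the candidate donor.
overlap <- function(recipient, donor) {
  mean(item.weight[recipient, ] > 0 & item.weight[donor, ] > 0)
}

# Observation 1 is accepted as donor for observation 3 only if the
# overlap reaches beta.
accept <- overlap(3, 1) >= beta
```

Under these assumptions a larger beta restricts the pool of admissible donors, while a smaller alpha reduces the influence of items that failed edits.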

Details

POEM assumes that a multivariate outlier detection has been carried out beforehand and that its result is summarized in the vector outind. In addition, further observations may have been flagged as failing edit rules; this information is given in the indicator matrix errors. The mean and covariance are estimated from the good observations, that is, from the non-outliers with erroneous items downweighted. Preliminary mean imputation is sometimes needed to avoid a non-positive definite covariance estimate at this stage. It assumes that the problematic values of an observation (erroneous, outlying or missing) can be replaced by the mean of the remaining, non-problematic observations. Note that the algorithm imputes these problematic observations afterwards, so the final covariance matrix of the imputed data is not the same as the working covariance matrix (which may be based on preliminary mean imputation).
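
The steps described above can be sketched in base R on toy data. This is an illustration under simplifying assumptions, not POEM's implementation: errors are ignored, "good" means non-outlying and fully observed, and the donor is the good observation with the smallest Mahalanobis distance to the preliminarily imputed recipient. All object names (x, w, x.prelim, working.cov, and so on) are hypothetical.

```r
# Toy data: 6 observations, 2 variables, one missing item (row 4).
x <- matrix(c(1.0, 2.0,
              1.2, 1.9,
              0.9, 2.2,
              1.1, NA,
              1.0, 2.1,
              8.0, 9.0), ncol = 2, byrow = TRUE)
w <- c(1, 2, 1, 1, 2, 1)        # sampling weights
outind <- c(0, 0, 0, 0, 0, 1)   # 1 = outlier (row 6)

# Good observations: not outlying and fully observed.
good <- outind == 0 & stats::complete.cases(x)

# Weighted mean of the good observations.
center <- colSums(x[good, , drop = FALSE] * w[good]) / sum(w[good])

# Preliminary mean imputation: outlying rows and missing items are set
# to the mean of the good observations.
x.prelim <- x
x.prelim[outind == 1, ] <- rep(center, each = sum(outind == 1))
na.idx <- which(is.na(x.prelim), arr.ind = TRUE)
x.prelim[na.idx] <- center[na.idx[, "col"]]

# Working covariance around the good mean (weights normalised to sum to 1).
working.cov <- stats::cov.wt(x.prelim, wt = w / sum(w), center = center)$cov

# Nearest-neighbour step for the recipient in row 4: the donor is the good
# observation closest in squared Mahalanobis distance.
recipient <- 4
candidates <- which(good)
d2 <- stats::mahalanobis(x.prelim[candidates, , drop = FALSE],
                         center = x.prelim[recipient, ], cov = working.cov)
donor <- candidates[which.min(d2)]

# Copy the donor's values into the recipient's missing items only.
x.imputed <- x
miss <- is.na(x[recipient, ])
x.imputed[recipient, miss] <- x[donor, miss]
```

Because the recipient finally receives donor values rather than the mean, the covariance of x.imputed generally differs from working.cov, which is the point made in the last sentence of the paragraph above.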

Value

POEM returns a list whose first component output is a sub-list with the following components:

preliminary.mean.imputation

logical. TRUE if preliminary mean imputation should be used

completely.missing

number of observations with no observed values

good.values

weighted number of good values (not missing, not outlying, not erroneous)

nonoutliers.before

number of nonoutliers before reweighting

weighted.nonoutliers.before

weighted number of nonoutliers before reweighting

nonoutliers.after

number of nonoutliers after reweighting

weighted.nonoutliers.after

weighted number of nonoutliers after reweighting

old.center

coordinate means after weighting, before imputation

old.variances

coordinate variances after weighting, before imputation

new.center

coordinate means after weighting, after imputation

new.variances

coordinate variances after weighting, after imputation

covariance

covariance (of standardised observations) before imputation

imputed.observations

indices of observations with imputed values

donors

indices of donors for imputed observations

new.outind

indices of new outliers

The further component returned by POEM is

imputed.data

Imputed data set.

Author(s)

Beat Hulliger

References

Béguin, C. and Hulliger, B. (2002), EUREDIT Workpackage x.2 D4-5.2.1-2.C Develop and evaluate new methods for statistical outlier detection and outlier robust multivariate imputation, Technical report, EUREDIT 2002.

Examples

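library(modi)  # assumed: POEM and the bushfire data sets ship with the modi package
data(bushfirem)
data(bushfire.weights)
# Flag observations 31 to 38 as outliers (1 = outlier)
outliers <- rep(0, nrow(bushfirem))
outliers[31:38] <- 1
# errors and missing.matrix are omitted; the argument name is spelled out
# instead of relying on partial matching of `prel`
imp.res <- POEM(bushfirem, bushfire.weights, outliers,
                preliminary.mean.imputation = TRUE)
print(imp.res$output)
# Covariance matrix of the imputed data
var(imp.res$imputed.data)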