POEM: Nearest Neighbour Imputation with Mahalanobis distance
In modi: Multivariate outlier detection and imputation for incomplete survey data


POEM takes into account missing values, outlier indicators, error indicators and sampling weights.

POEM(data, weights, outind, errors, missing.matrix, alpha = 0.5, beta = 0.5,
     reweight.out = FALSE, c = 5, preliminary.mean.imputation = FALSE, monitor = FALSE)

`data`	a data frame or matrix with the data
`weights`	sampling weights
`outind`	an indicator vector for the outliers, 1 indicating outlier
`errors`	matrix of indicators for items which failed edits
`missing.matrix`	the missingness matrix can be given as input. Otherwise it will be recalculated
`alpha`	scalar giving the weight attributed to an item that is failing
`beta`	minimal overlap to accept a donor
`reweight.out`	if `TRUE` the outliers are redefined
`c`	tuning constant when redefining the outliers (cutoff for Mahalanobis distances)
`preliminary.mean.imputation`	assume the problematic observation is at the mean of good observations
`monitor`	if `TRUE` verbose output
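As an illustration of the expected inputs, the following base R sketch builds a small data matrix together with matching `weights`, `outind` and `errors` objects. The toy values, the object names `x`, `w` and `observed`, and the coding of `errors` with 1 for a failed item are assumptions made for this example only; they are not taken from the package.

```r
# Toy data: 6 observations, 3 variables, NA marks a missing item
x <- matrix(c(1.0, 2.1, 3.2,
              1.2,  NA, 3.0,
              0.9, 2.0,  NA,
              1.1, 2.2, 3.1,
              9.0, 9.5, 9.9,   # suspicious observation
              1.0, 1.9, 3.3),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2", "v3")))

# Hypothetical sampling weights, one per observation
w <- rep(2, nrow(x))

# Outlier indicator vector: 1 marks an outlier
outind <- rep(0, nrow(x))
outind[5] <- 1

# Error indicator matrix with the same shape as the data
# (assumption: 1 marks an item that failed an edit)
errors <- matrix(0, nrow = nrow(x), ncol = ncol(x))
errors[4, 2] <- 1

# Pattern of observed items, shown only for inspection; per the argument
# list, the missingness matrix is recalculated when it is not supplied
observed <- !is.na(x)
```

All four objects share the row order of the data, so observation `i` is described by `x[i, ]`, `w[i]`, `outind[i]` and `errors[i, ]`.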

POEM assumes that a multivariate outlier detection has been carried out beforehand and that its result is summarized in the vector `outind`. In addition, further observations may have been flagged as failing edit rules; this information is given in `errors`. The mean and covariance estimates are calculated from the good observations (non-outliers, with errors downweighted). Preliminary mean imputation is sometimes needed to avoid a non-positive definite covariance estimate at this stage. It assumes that the problematic values of an observation (erroneous, outlying or missing) can be replaced by the mean of the non-problematic observations. Note that the algorithm imputes these problematic observations afterwards, so the final covariance matrix of the imputed data is not the same as the working covariance matrix (which may be based on preliminary mean imputation).
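The steps described above can be sketched in base R. This is a minimal illustration under simplifying assumptions, not the implementation used by POEM: the toy data, the flag `use_prelim` and all object names are hypothetical, and the roles of `alpha`, `beta` and `reweight.out` are ignored.

```r
# Toy data: rows are observations, NA marks a missing item
x <- matrix(c(1.0, 2.0,
              1.2, 2.3,
              0.8, 1.9,
              1.1, 2.1,
              1.0,  NA,    # observation 5 needs imputation
              8.0, 9.0),   # observation 6 is flagged as an outlier
            ncol = 2, byrow = TRUE)
w      <- c(1, 2, 1, 2, 1, 1)   # hypothetical sampling weights
outind <- c(0, 0, 0, 0, 0, 1)

# "Good" observations: complete and not flagged as outliers
good <- outind == 0 & stats::complete.cases(x)

# Weighted mean of the good observations
center <- colSums(x[good, , drop = FALSE] * w[good]) / sum(w[good])

# Preliminary mean imputation: replace problematic values by that mean
x_prelim <- x
k <- sum(outind == 1)
x_prelim[outind == 1, ] <- matrix(center, nrow = k, ncol = ncol(x), byrow = TRUE)
na_idx <- which(is.na(x_prelim), arr.ind = TRUE)
x_prelim[na_idx] <- center[na_idx[, "col"]]

# Working covariance: from the good observations, or from the
# mean-imputed data when the former would not be positive definite
use_prelim <- FALSE   # hypothetical switch for this sketch
work_cov <- if (use_prelim) {
  stats::cov.wt(x_prelim, wt = w)$cov
} else {
  stats::cov.wt(x[good, , drop = FALSE], wt = w[good])$cov
}

# Nearest neighbour donor for observation 5, using the Mahalanobis
# distance on the variables that the recipient has observed
recipient <- 5
obs    <- !is.na(x[recipient, ])
donors <- which(good)
d2 <- stats::mahalanobis(x[donors, obs, drop = FALSE],
                         center = x[recipient, obs],
                         cov    = work_cov[obs, obs, drop = FALSE])
donor <- donors[which.min(d2)]

# Copy the donor's values into the recipient's missing items
x_imp <- x
x_imp[recipient, !obs] <- x[donor, !obs]
```

Because the imputed values come from a donor rather than from the mean, the covariance of `x_imp` generally differs from `work_cov`, which matches the remark that the final and working covariance matrices are not the same.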

POEM returns a list whose first component, `output`, is a sub-list with the following components:

`preliminary.mean.imputation`	logical. `TRUE` if preliminary mean imputation was used
`completely.missing`	number of observations with no observed values
`good.values`	weighted number of good values (not missing, not outlying, not erroneous)
`nonoutliers.before`	number of nonoutliers before reweighting
`weighted.nonoutliers.before`	weighted number of nonoutliers before reweighting
`nonoutliers.after`	number of nonoutliers after reweighting
`weighted.nonoutliers.after`	weighted number of nonoutliers after reweighting
`old.center`	coordinate means after weighting, before imputation
`old.variances`	coordinate variances after weighting, before imputation
`new.center`	coordinate means after weighting, after imputation
`new.variances`	coordinate variances after weighting, after imputation
`covariance`	covariance (of standardised observations) before imputation
`imputed.observations`	indices of observations with imputed values
`donors`	indices of donors for imputed observations
`new.outind`	indices of new outliers

The further component returned by POEM is

`imputed.data`	imputed data set

Beat Hulliger

Béguin, C. and Hulliger, B. (2002). EUREDIT Workpackage x.2 D4-5.2.1-2.C: Develop and evaluate new methods for statistical outlier detection and outlier robust multivariate imputation. Technical report, EUREDIT 2002.

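library(modi)
# Load the example data and the corresponding sampling weights
data(bushfirem)
data(bushfire.weights)
# Flag observations 31 to 38 as outliers (1 = outlier)
outliers <- rep(0, nrow(bushfirem))
outliers[31:38] <- 1
# Impute, with preliminary mean imputation for the working covariance
imp.res <- POEM(bushfirem, bushfire.weights, outliers,
                preliminary.mean.imputation = TRUE)
print(imp.res$output)
var(imp.res$imputed.data)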