POEM: Nearest Neighbour Imputation with Mahalanobis distance

View source: R/POEM.R

POEM R Documentation

Description

POEM performs nearest neighbour imputation based on the Mahalanobis distance, taking into account missing values, outlier indicators, error indicators and sampling weights.

Usage

POEM(
  data,
  weights,
  outind,
  errors,
  missing.matrix,
  alpha = 0.5,
  beta = 0.5,
  reweight.out = FALSE,
  c = 5,
  preliminary.mean.imputation = FALSE,
  monitor = FALSE
)

Arguments

data

a data frame or matrix with the data.

weights

sampling weights.

outind

an indicator vector for the outliers with 1 indicating an outlier.

errors

matrix of indicators for items which failed edits.

missing.matrix

optional missingness matrix. If it is not supplied, it is calculated.

alpha

scalar giving the weight attributed to an item that fails an edit.

beta

minimal overlap to accept a donor.

reweight.out

if TRUE, the outliers are redefined.

c

tuning constant when redefining the outliers (cutoff for Mahalanobis distance).

preliminary.mean.imputation

if TRUE, assume the problematic observation is at the mean of the good observations.

monitor

if TRUE, verbose output is produced.
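
The indicator arguments can be prepared in base R before calling POEM. The sketch below is only an illustration: the toy data, the variable names and the edit rule (a negative head count fails) are hypothetical, and the coding 1 = failed edit for errors is an assumption that mirrors the coding documented for outind.

```r
# Toy survey data: 6 respondents, 2 items (hypothetical values)
dat <- data.frame(
  turnover  = c(120, 95, NA, 4000, 110, 130),
  employees = c(10, 8, 12, 9, -3, 11)
)
w <- c(1.5, 2, 1, 1, 2.5, 1)  # sampling weights, one per observation

# Outlier indicator: one entry per observation, 1 = outlier.
# Here observation 4 is assumed to have been flagged by an earlier detection step.
outind <- as.integer(seq_len(nrow(dat)) == 4)

# Error indicators: same shape as the data, 1 = item failed an edit (assumed coding)
errors <- matrix(0L, nrow(dat), ncol(dat), dimnames = list(NULL, names(dat)))
errors[which(dat$employees < 0), "employees"] <- 1L

# Call shape following the Usage section (requires the modi package, not run here):
# POEM(dat, w, outind, errors)
```

Keeping errors the same shape as data makes it possible to flag single items rather than whole observations.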

Details

POEM assumes that a multivariate outlier detection has been carried out beforehand and that its result is summarized in the vector outind. In addition, further observations may have been flagged as failing edit rules; this information is given in the indicator matrix errors. The mean and covariance estimates are calculated from the good observations (outliers excluded, errors downweighted). Preliminary mean imputation is sometimes needed to avoid a non-positive definite covariance estimate at this stage. It assumes that the problematic values of an observation (erroneous, outlying or missing) can be replaced by the mean of the non-problematic observations. Note that the algorithm imputes these problematic observations afterwards, so the final covariance matrix based on the imputed data is not the same as the working covariance matrix (which may be based on preliminary mean imputation).
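
As a rough illustration of the working estimates described above, the following base R sketch computes a weighted mean and covariance from the non-outlying observations and then the squared Mahalanobis distance of every observation. It is not the POEM implementation: the toy data, the zero weight given to outliers, and the use of cov.wt() and mahalanobis() from the stats package are assumptions made for illustration, and errors and missing values are left out.

```r
set.seed(1)
x <- matrix(rnorm(40), ncol = 2)  # 20 observations, 2 variables (toy data)
x[20, ] <- c(10, -10)             # plant one gross outlier
w <- rep(2, nrow(x))              # sampling weights (all equal here)
outind <- c(rep(0, 19), 1)        # 1 = outlier, as in the outind argument

# Outliers get weight zero, so only good observations enter the estimates
w_good <- w * (1 - outind)

# Weighted centre and covariance; cov.wt() rescales the weights to sum to 1
est <- cov.wt(x, wt = w_good)

# Squared Mahalanobis distance of every observation to the robust centre
d2 <- mahalanobis(x, center = est$center, cov = est$cov)

# The flagged observation is by far the most distant one
which.max(d2)
```

Because the outlier does not influence the centre or the covariance, its distance is not masked. Donor search in a nearest neighbour imputation can then rely on distances of this kind.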

Value

POEM returns a list whose first component output is a sub-list with the following components:

preliminary.mean.imputation

Logical. TRUE if preliminary mean imputation was used

completely.missing

Number of observations with no observed values

good.values

Weighted number of good values (not missing, not outlying, not erroneous)

nonoutliers.before

Number of nonoutliers before reweighting

weighted.nonoutliers.before

Weighted number of nonoutliers before reweighting

nonoutliers.after

Number of nonoutliers after reweighting

weighted.nonoutliers.after

Weighted number of nonoutliers after reweighting

old.center

Coordinate means after weighting, before imputation

old.variances

Coordinate variances after weighting, before imputation

new.center

Coordinate means after weighting, after imputation

new.variances

Coordinate variances after weighting, after imputation

covariance

Covariance (of standardised observations) before imputation

imputed.observations

Indices of observations with imputed values

donors

Indices of donors for imputed observations

new.outind

Indices of new outliers

The further component returned by POEM is:

imputed.data

Imputed data set

Author(s)

Beat Hulliger

References

Béguin, C. and Hulliger, B. (2002), EUREDIT Workpackage x.2 D4-5.2.1-2.C Develop and evaluate new methods for statistical outlier detection and outlier robust multivariate imputation, Technical report, EUREDIT 2002.

Examples

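library(modi)
data(bushfirem, bushfire.weights)
# Flag observations 31 to 38 as outliers (1 = outlier)
outliers <- rep(0, nrow(bushfirem))
outliers[31:38] <- 1
imp.res <- POEM(bushfirem, bushfire.weights, outliers,
                preliminary.mean.imputation = TRUE)
# Diagnostics of the imputation
print(imp.res$output)
# Covariance matrix of the imputed data
var(imp.res$imputed.data)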
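
Individual components of the returned list can also be inspected by name. The following sketch repeats the call from the example and uses only component names listed in the Value section. The requireNamespace() guard and the side-by-side comparison of the centres are illustrative choices, and the sketch assumes that old.center and new.center are vectors of equal length.

```r
# Run only if the modi package is installed
if (requireNamespace("modi", quietly = TRUE)) {
  data(bushfirem, bushfire.weights, package = "modi")
  outliers <- rep(0, nrow(bushfirem))
  outliers[31:38] <- 1
  imp.res <- modi::POEM(bushfirem, bushfire.weights, outliers,
                        preliminary.mean.imputation = TRUE)

  out <- imp.res$output

  # Which observations received imputed values, and from which donors
  print(out$imputed.observations)
  print(out$donors)

  # Coordinate means before and after imputation, side by side
  print(cbind(before = out$old.center, after = out$new.center))

  # Check whether any missing values remain in the imputed data
  print(anyNA(imp.res$imputed.data))
}
```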

modi documentation built on March 31, 2023, 8:35 p.m.