Epidemic Algorithm for imputation of multivariate outliers in incomplete survey data.

Description

After outliers have been detected with EAdet, EAimp can be run to impute them.

Usage

EAimp(data, weights, outind, reach = "max", transmission.function = "root",
      power = ncol(data), distance.type = "euclidean",
      duration = 5, maxl = 5,
      kdon = 1, monitor = FALSE, threshold = FALSE,
      deterministic = TRUE, fixedprop = 0)

Arguments

data

a data frame or matrix with the data

weights

a vector of positive sampling weights

outind

a logical vector with component TRUE for outliers

reach

reach of the threshold function (usually set to the maximum distance to a nearest neighbour, see internal function .EA.dist)

transmission.function

form of the transmission function of distance d: "step" is a Heaviside function which jumps to 1 at d0, "linear" is linear between 0 and d0, "power" is (beta*d+1)^(-p) with p=ncol(data) as default, "root" is the function 1-(1-d/d0)^(1/maxl)

power

sets p=power, where p is the parameter in the above transmission function.

distance.type

distance type in function dist()

maxl

Maximum number of steps without infection

monitor

if TRUE, verbose output on the epidemic is produced

threshold

Infect all remaining points with infection probability above the threshold 1-0.5^(1/maxl)

deterministic

if TRUE, the number of infections is the expected number and the infected observations are the ones with the largest infection probabilities.

duration

The duration of the detection epidemic

kdon

The number of donors that should be infected before imputation

fixedprop

If TRUE a fixed proportion of observations is infected at each step
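
The formulas quoted in the argument descriptions above can be tried out in base R. The following is a minimal sketch, not the package's internal code: the helper names tf_power and tf_root are hypothetical, the value of beta is purely illustrative (this page does not define it), clamping d/d0 at 1 is an assumption made to avoid NaN beyond the reach, and the rounding of the expected number of infections is also an assumption.

```r
# Literal transcriptions of the two transmission forms given as formulas.
# d0 plays the role of `reach`; beta is not defined on this page.
tf_power <- function(d, beta, p) (beta * d + 1)^(-p)
tf_root  <- function(d, d0, maxl) 1 - (1 - pmin(d / d0, 1))^(1 / maxl)  # clamp is an assumption

d <- c(0, 0.5, 1, 2)
tf_power(d, beta = 1, p = 2)   # illustrative beta and p
tf_root(d, d0 = 2, maxl = 5)

# Threshold referred to by the `threshold` argument, for the default maxl = 5
maxl <- 5
thr <- 1 - 0.5^(1 / maxl)

# `deterministic = TRUE` as described: infect the expected number of points,
# taking those with the largest infection probabilities.
prob <- c(0.9, 0.1, 0.6, 0.4)          # made-up infection probabilities
n_inf <- round(sum(prob))              # rounding rule is an assumption
infected <- order(prob, decreasing = TRUE)[seq_len(n_inf)]
```

With maxl = 5 the threshold evaluates to roughly 0.129, so under threshold = TRUE a remaining point would need an infection probability above that value to be infected.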

Details

EAimp uses the distances calculated in EAdet (actually the counterprobabilities, which are stored in a global data set) and starts an epidemic at each observation to be imputed until donors for the missing values are infected. Then a donor is selected randomly.
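
The idea of starting an epidemic at the observation to be imputed and stopping once enough donors are infected can be illustrated with a toy example. This is only a sketch under simplifying assumptions and not how EAimp is implemented: infection here is a plain "within reach" rule rather than a probability, the function find_donors is hypothetical, and the data are made up.

```r
# Toy illustration only; not the EAimp implementation.
set.seed(1)
x <- cbind(c(0, 1, 2, 3, 10), c(0, 0, 0, 0, 0))   # five points on a line
outlier <- c(FALSE, FALSE, FALSE, FALSE, TRUE)    # point 5 is to be imputed
d <- as.matrix(dist(x))                           # Euclidean distances
reach <- 7                                        # hypothetical d0
kdon <- 2                                         # donors required

find_donors <- function(start, d, is_donor, reach, kdon, duration = 5) {
  infected <- seq_len(nrow(d)) == start
  donors <- integer(0)
  for (step in seq_len(duration)) {
    # a point becomes infected if any infected point lies within reach
    newly <- colSums(d[infected, , drop = FALSE] <= reach) > 0
    infected <- infected | newly
    donors <- which(infected & is_donor)
    if (length(donors) >= kdon) break   # enough donors: stop the epidemic
  }
  unname(donors)
}

donors <- find_donors(start = 5, d = d, is_donor = !outlier,
                      reach = reach, kdon = kdon)
donor <- donors[sample.int(length(donors), 1)]    # random donor selection

x.imp <- x
x.imp[5, ] <- x[donor, ]                          # replace the outlier by the donor
```

In this toy run the first step infects only point 4, which is fewer than kdon donors, so the epidemic continues for a second step and reaches points 1 to 3 before a donor is drawn at random.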

Value

EAimp returns a list with components parameters and imputed.data.

parameters contains the following components:

sample.size

Number of observations

number.of.variables

Number of variables

n.complete.records

Number of records without missing values

n.usable.records

Number of records with less than half of values missing (unusable observations are discarded)

duration

Duration of epidemic

reach

Transmission distance (d0)

threshold

Input parameter

deterministic

Input parameter

computation.time

Elapsed computation time

imputed.data contains the imputed data.

Author(s)

Beat Hulliger

References

Béguin, C., and Hulliger, B. (2004). Multivariate outlier detection in incomplete survey data: The epidemic algorithm and transformed rank correlations. Journal of the Royal Statistical Society, A 167(Part 2), 275-294.

See Also

EAdet for outlier detection with the Epidemic Algorithm.

Examples

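# Load the example data and the sampling weights
data(bushfirem, bushfire.weights)
# Detect outliers with the Epidemic Algorithm
det.res <- EAdet(bushfirem, bushfire.weights)
# Impute the detected outliers, reusing the reach from the detection run
# and requiring three infected donors before imputation
imp.res <- EAimp(bushfirem, bushfire.weights, outind = det.res$outind,
                 reach = det.res$output$max.min.di, kdon = 3)
print(imp.res$output)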