EAimp: Epidemic Algorithm for imputation of multivariate outliers in...

EAimp    R Documentation

Epidemic Algorithm for imputation of multivariate outliers in incomplete survey data.

Description

After running EAdet, the detected outliers can be imputed with EAimp.

Usage

EAimp(
  data,
  weights,
  outind,
  reach = "max",
  transmission.function = "root",
  power = ncol(data),
  distance.type = "euclidean",
  duration = 5,
  maxl = 5,
  kdon = 1,
  monitor = FALSE,
  threshold = FALSE,
  deterministic = TRUE,
  fixedprop = 0
)

Arguments

data

a data frame or matrix with the data.

weights

a vector of positive sampling weights.

outind

a logical vector with component TRUE for outliers.

reach

reach of the threshold function (usually set to the maximum distance to a nearest neighbour, see internal function EA.dist).

transmission.function

form of the transmission function of distance d: "step" is a Heaviside function which jumps to 1 at d0, "linear" is linear between 0 and d0, "power" is (beta*d+1)^(-p) with p=ncol(data) as default, "root" is the function 1-(1-d/d0)^(1/maxl).

power

sets p=power, where p is the parameter in the above transmission function.

distance.type

distance type in function dist().

duration

the duration of the detection epidemic.

maxl

maximum number of steps without infection.

kdon

the number of donors that should be infected before imputation.

monitor

if TRUE, verbose output on the epidemic is printed.

threshold

if TRUE, all remaining points with infection probability above the threshold 1-0.5^(1/maxl) are infected.

deterministic

if TRUE the number of infections is the expected number and the infected observations are the ones with largest infection probabilities.

fixedprop

if TRUE a fixed proportion of observations is infected at each step.
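
The four forms of transmission.function and the threshold 1-0.5^(1/maxl) can be tried out with a small sketch. It is written from the formulas in the argument descriptions only and is not taken from the package source. The helper names tf_step, tf_linear, tf_power and tf_root are hypothetical, beta is treated as a free scale parameter because it is not defined on this page, the "power" form is read as (beta*d+1)^(-p), and capping d/d0 at 1 is an assumption. Whether each value is used as an infection probability or as its complement (the Details mention counterprobabilities) is not stated here, so the sketch only evaluates the formulas.

```r
# Transmission functions of distance d, transcribed from the argument text.
# Hypothetical helpers for illustration; not the package's internal code.
tf_step   <- function(d, d0) as.numeric(d >= d0)        # jumps to 1 at d0
tf_linear <- function(d, d0) pmin(d / d0, 1)            # assumed: 0 at d = 0, 1 at d0
tf_power  <- function(d, beta, p) (beta * d + 1)^(-p)   # beta: assumed free parameter
tf_root   <- function(d, d0, maxl) {
  1 - (1 - pmin(d / d0, 1))^(1 / maxl)                  # cap at d0 is an assumption
}

d <- c(0, 0.5, 1, 2)
tf_step(d, d0 = 1)
tf_linear(d, d0 = 1)
tf_power(d, beta = 1, p = 2)
tf_root(d, d0 = 1, maxl = 5)

# Threshold used when threshold = TRUE, for the default maxl = 5
maxl <- 5
thr <- 1 - 0.5^(1 / maxl)
thr
```

For maxl = 5 the threshold is about 0.129. With the "root" form as written, this is exactly the value obtained at half the reach, d = d0/2.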

Details

EAimp uses the distances calculated in EAdet (actually the counterprobabilities, which are stored in a global data set) and starts an epidemic at each observation to be imputed until donors for the missing values are infected. Then a donor is selected randomly.
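
The idea of spreading an epidemic from the observation to be imputed until enough donors are infected can be illustrated with a toy sketch in base R. This is not the code used by EAimp: the function find_donors is hypothetical, the spread is deterministic (every point within distance d0 of an infected point is infected, similar in spirit to a step function), weights are ignored, and copying the donor's whole record is an assumption made for simplicity.

```r
# Toy illustration only; not the implementation used by EAimp.
set.seed(1)
x <- matrix(c(0, 0,
              1, 0,
              0, 1,
              5, 5,
              1, 1), ncol = 2, byrow = TRUE)
outind <- c(FALSE, FALSE, FALSE, TRUE, FALSE)  # observation 4 is to be imputed
d <- as.matrix(dist(x, method = "euclidean"))

# Spread from `start` until at least `kdon` candidate donors are infected
# or `duration` steps have passed (hypothetical helper).
find_donors <- function(d, start, candidates, d0, kdon, duration) {
  infected <- start
  for (step in seq_len(duration)) {
    # a point is hit if it lies within d0 of any infected point
    hit <- colSums(d[infected, , drop = FALSE] <= d0) > 0
    infected <- union(infected, unname(which(hit)))
    if (length(intersect(infected, candidates)) >= kdon) break
  }
  intersect(infected, candidates)
}

donors <- find_donors(d, start = 4, candidates = which(!outind),
                      d0 = 6, kdon = 2, duration = 5)

# Select one donor at random and copy its values (assumed imputation rule)
donor <- donors[sample.int(length(donors), 1)]
x_imp <- x
x_imp[4, ] <- x[donor, ]
```

In this toy data the first step reaches only observation 5, so with kdon = 2 the spread continues for a second step before a donor is drawn. A larger kdon widens the pool of donors at the cost of admitting more distant ones.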

Value

EAimp returns a list with two components: parameters and imputed.data. parameters contains the following elements:

sample.size

Number of observations

number.of.variables

Number of variables

n.complete.records

Number of records without missing values

n.usable.records

Number of records with fewer than half of their values missing (unusable observations are discarded)

duration

Duration of epidemic

reach

Transmission distance (d0)

threshold

Input parameter

deterministic

Input parameter

computation.time

Elapsed computation time

imputed.data contains the imputed data.

Author(s)

Beat Hulliger

References

Béguin, C. and Hulliger, B. (2004) Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations, JRSS-A, 167, Part 2, pp. 275-294.

See Also

EAdet for outlier detection with the Epidemic Algorithm.

Examples

data(bushfirem, bushfire.weights)
det.res <- EAdet(bushfirem, bushfire.weights)
imp.res <- EAimp(bushfirem, bushfire.weights, outind = det.res$outind, kdon = 3)
print(imp.res$output)

modi documentation built on March 31, 2023, 8:35 p.m.