EAdet
In EAdet an epidemic is started at a center of the data. The epidemic spreads out and infects neighbouring points (probabilistically or deterministically). The last points infected are outliers. After running EAdet, an imputation with EAimp may be run.
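The following R sketch runs a toy deterministic epidemic to illustrate the idea. It is not the modi implementation. Three choices here are assumptions made for illustration: the step transmission rule (a point is infected when it lies within distance d0 of an already infected point), the start at the observation closest to the coordinate-wise median (a stand-in for the sample spatial median), and the helper name toy_epidemic. Points infected last, or never, are the outlier candidates.

```r
# Toy deterministic epidemic (hypothetical sketch, NOT the modi implementation).
# Assumed rule: an uninfected point becomes infected at step s if it lies
# within distance d0 of any point infected before step s.
toy_epidemic <- function(x, d0, maxl = 5) {
  x <- as.matrix(x)
  n <- nrow(x)
  D <- as.matrix(dist(x))                        # full n x n Euclidean distances
  centre <- apply(x, 2, median)                  # coordinate-wise median (assumed start rule)
  start <- which.min(colSums((t(x) - centre)^2)) # observation closest to the centre
  infection.time <- rep(NA_integer_, n)          # NA = not (yet) infected
  infection.time[start] <- 0L
  step <- 0L
  idle <- 0L                                     # consecutive steps without a new infection
  while (anyNA(infection.time) && idle < maxl) {
    step <- step + 1L
    infected <- which(!is.na(infection.time))
    candidates <- which(is.na(infection.time))
    near <- D[infected, candidates, drop = FALSE] <= d0
    reached <- candidates[apply(near, 2, any)]   # within d0 of some infected point
    if (length(reached) > 0) {
      infection.time[reached] <- step
      idle <- 0L
    } else {
      idle <- idle + 1L                          # stop after maxl idle steps
    }
  }
  infection.time
}

x <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0), c(10, 10))
toy_epidemic(x, d0 = 1.5)   # 2 1 0 1 NA
```

The isolated fifth point is never reached, which is exactly the kind of observation the epidemic flags as an outlier.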
EAdet(
data,
weights,
reach = "max",
transmission.function = "root",
power = ncol(data),
distance.type = "euclidean",
maxl = 5,
plotting = TRUE,
monitor = FALSE,
prob.quantile = 0.9,
random.start = FALSE,
fix.start,
threshold = FALSE,
deterministic = TRUE,
rm.missobs = FALSE,
verbose = FALSE
)
data
a data frame or matrix with data.
weights
a vector of positive sampling weights.
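reach
if reach = "max", the maximal nearest neighbour distance is used as the basis for the transmission function.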
transmission.function
form of the transmission function of distance d; the default is "root".
power
sets the parameter p of the transmission function; the default is ncol(data).
distance.type
distance type in function dist().
maxl
maximum number of steps without infection.
plotting
if TRUE, the cdf of infection times is plotted.
monitor
if TRUE, output on the progress of the epidemic is printed.
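prob.quantile
if the mads fail, take this quantile of the absolute deviations instead.
random.start
if TRUE, the epidemic starts at a randomly chosen observation instead of the sample spatial median.
fix.start
force the epidemic to start at a specific observation.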
threshold
infect all remaining points with infection probability above the threshold.
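deterministic
if TRUE, the epidemic spreads deterministically; otherwise infections are drawn at random.
rm.missobs
set rm.missobs = TRUE to discard completely missing observations.
verbose
more output with verbose = TRUE.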
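The deterministic argument switches between two ways of spreading the epidemic. The sketch below contrasts them for a single step. It assumes, without confirmation from this page, that the deterministic mode infects the expected number of points (taken as the rounded sum of the infection probabilities) and picks those with the largest probabilities; infect_step is a hypothetical helper, not part of modi.

```r
# Hypothetical one-step infection rule contrasting the two modes (not modi code).
# p: infection probabilities of the currently uninfected points.
infect_step <- function(p, deterministic = TRUE) {
  if (deterministic) {
    k <- round(sum(p))                            # assumed: expected number of new infections
    chosen <- order(p, decreasing = TRUE)[seq_len(k)]
    seq_along(p) %in% chosen                      # infect the k most probable points
  } else {
    runif(length(p)) < p                          # independent Bernoulli draws
  }
}

p <- c(0.9, 0.2, 0.6, 0.1)
infect_step(p)                                    # TRUE FALSE TRUE FALSE
set.seed(42)
infect_step(p, deterministic = FALSE)             # random; varies with the seed
```

The deterministic variant makes repeated runs reproducible without a seed, at the price of ignoring the randomness of the transmission model.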
The form and parameters of the transmission function should be chosen such that the infection times span a range of at least 10. The default cutpoint for declaring outliers is the median infection time plus three times the mad of the infection times. A better cutpoint may be chosen by visual inspection of the cdf of infection times.
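A minimal base R sketch of the default cutpoint rule described above. It assumes an unweighted median and R's default mad() scaling constant of 1.4826; EAdet may use the sampling weights and a different constant, and eadet_cutpoint is a hypothetical name. Never-infected points (NA) are left out of the cutpoint computation here.

```r
# Hypothetical sketch: cutpoint = median + 3 * mad of the infection times.
# Assumptions: no sampling weights, R's default mad() constant (1.4826).
eadet_cutpoint <- function(infection.time) {
  it <- infection.time[!is.na(infection.time)]   # drop never-infected points
  if (diff(range(it)) < 10)
    warning("infection times span fewer than 10 steps; reconsider the transmission function")
  median(it) + 3 * mad(it)
}

times <- c(0:12, 30)
cp <- eadet_cutpoint(times)   # 6.5 + 3 * 1.4826 * 3.5, about 22.07
which(times > cp)             # 14: only the late infection exceeds the cutpoint
```

Plotting ecdf(times) together with abline(v = cp) is one way to do the visual inspection mentioned above.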
EAdet calls the function EA.dist, which passes the counterprobabilities of infection (a vector of size n * (n - 1) / 2) and three parameters (sample spatial median index, maximal distance to nearest neighbour and transmission distance = reach) as arguments to EAdet. For large n the distance vector may be too large to be passed as an argument; in that case the memory size must be increased. Former versions of the code used a global variable to store the distances in order to save memory.
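The size of the distance vector grows quadratically with n. The sketch below uses base R's dist(), which returns the same condensed n * (n - 1) / 2 layout, to check the count and estimate the memory required, assuming 8-byte doubles. The helpers n_pairs and dist_mib are illustrative, not part of modi.

```r
# Number of pairwise distances stored in a condensed (lower-triangle) vector.
n_pairs <- function(n) n * (n - 1) / 2
# Approximate memory in MiB, assuming one 8-byte double per distance.
dist_mib <- function(n) n_pairs(n) * 8 / 2^20

set.seed(1)
x <- matrix(rnorm(50 * 3), ncol = 3)
d <- dist(x, method = "euclidean")
length(d)          # 1225, equal to n_pairs(50)
dist_mib(10000)    # about 381 MiB for n = 10000
```

Doubling n roughly quadruples the memory, so a data set that is only moderately larger can exceed the available memory.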
EAdet returns a list whose first component, output, is a sub-list with the following components:
sample.size
Number of observations
discarded.observations
Indices of discarded observations
missing.observations
Indices of completely missing observations
number.of.variables
Number of variables
n.complete.records
Number of records without missing values
n.usable.records
Number of records with fewer than half of the values missing (unusable observations are discarded)
medians
Component-wise medians
mads
Component-wise mads
prob.quantile
Quantile used if the mads fail, i.e. if one of the mads is 0
quantile.deviations
Quantile of absolute deviations
start
Starting observation
transmission.function
Input parameter
power
Input parameter
maxl
Maximum number of steps without infection
min.nn.dist
Maximal nearest neighbor distance
transmission.distance
Transmission distance d0
threshold
Input parameter
distance.type
Input parameter
deterministic
Input parameter
number.infected
Number of infected observations
cutpoint
Cutpoint of infection times for outlier definition
number.outliers
Number of outliers
outliers
Indices of outliers
duration
Duration of epidemic
computation.time
Elapsed computation time
initialisation.computation.time
Elapsed computation time for standardisation and calculation of distance matrix
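The medians, mads and quantile.deviations components reflect a robust, component-wise standardisation of the data. A hedged sketch of such a standardisation with the prob.quantile fallback follows. The exact rule inside EAdet may differ, for instance in the MAD scaling constant, the use of weights, or whether the fallback applies to all columns; robust_scale is a hypothetical name.

```r
# Hypothetical sketch of component-wise robust standardisation with a
# quantile fallback (not the modi code).
robust_scale <- function(x, prob.quantile = 0.9) {
  x <- as.matrix(x)
  med <- apply(x, 2, median, na.rm = TRUE)        # component-wise medians
  centred <- sweep(x, 2, med)                     # subtract medians column by column
  absdev <- abs(centred)
  mads <- apply(absdev, 2, median, na.rm = TRUE)  # unscaled MADs
  qdev <- apply(absdev, 2, quantile, probs = prob.quantile,
                na.rm = TRUE, names = FALSE)      # quantile absolute deviations
  # assumption: if any MAD is 0, every column switches to the quantile deviations
  scale <- if (any(mads == 0)) qdev else mads
  sweep(centred, 2, scale, "/")
}

x <- cbind(a = c(1, 2, 3, 4, 100), b = c(5, 5, 5, 5, 6))
z <- robust_scale(x)   # column b has MAD 0, so the 0.9 quantile deviations are used
```

Without the fallback, column b would be divided by zero and its last entry would become Inf.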
The further components returned by EAdet are:
infected
Indicator of infection
infection.time
Time of infection
outind
Indicator of outliers
Beat Hulliger
Béguin, C. and Hulliger, B. (2004) Multivariate outlier detection in incomplete survey data: the epidemic algorithm and transformed rank correlations, JRSS-A, 167, Part 2, pp. 275-294.
See also EAimp for imputation with the Epidemic Algorithm.
library(modi)
data(bushfirem, bushfire.weights)
det.res <- EAdet(bushfirem, bushfire.weights)
det.res$output$outliers   # indices of the detected outliers