ICS_outlier  R Documentation 
In a multivariate framework, outliers are detected using ICS. The function performs ICS() and automatically decides how many invariant components to use to search for outliers and how many outliers are detected on these components. Currently the function is restricted to searching for outliers on the first components only.
ICS_outlier(
X,
S1 = ICS_cov,
S2 = ICS_cov4,
S1_args = list(),
S2_args = list(),
ICS_algorithm = c("whiten", "standard", "QR"),
method = "norm_test",
test = "agostino.test",
n_eig = 10000,
level_test = 0.05,
adjust = TRUE,
level_dist = 0.025,
n_dist = 10000,
type = "smallprop",
n_cores = NULL,
iseed = NULL,
pkg = "ICSOutlier",
q_type = 7,
...
)
X 
a numeric matrix or data frame containing the data to be transformed. 
S1 
an object of class 
S2 
an object of class 
S1_args 
a list containing additional arguments for 
S2_args 
a list containing additional arguments for 
ICS_algorithm 
a character string specifying with which algorithm
the invariant coordinate system is computed. Possible values are

method 
name of the method used to select the ICS components involved to compute ICS distances. Options are 
test 
name of the marginal normality test to use if 
n_eig 
number of simulations performed to derive the cutoff values for selecting the ICS components. Only if 
level_test 
for the 
adjust 
logical. For selecting the invariant coordinates, the level of the test can be adjusted for each component to deal with multiple testing. See 
level_dist 

n_dist 
number of simulations performed to derive the cutoff value for the ICS distances. See 
type 
currently the only option is 
n_cores 
number of cores to be used in 
iseed 
If parallel computation is used the seed passed on to 
pkg 
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via 
q_type 
specifies the quantile algorithm used in 
... 
passed on to other methods. 
The ICS method has attractive properties for outlier detection in the case of a small proportion of outliers. As for PCA, three steps have to be performed: (i) select the components most useful for the detection, (ii) compute distances as outlierness measures for all observations, and finally (iii) label outliers using some cutoff value.
This function performs these three steps automatically:
For choosing the components of interest, two methods are proposed: "norm_test", based on marginal normality tests (see details in comp_norm_test), or "simulation", based on a parallel analysis (see details in comp_simu_test). Both approaches rely on an intrinsic property of ICS in the case of a small proportion of outliers: choosing S1 "more robust" than S2 ensures that outliers are found on the first components. Indeed, when using S1 = ICS_cov and S2 = ICS_cov4, the invariant coordinates are ordered by decreasing classical Pearson kurtosis. The information needed to find the outliers should then be contained in the first k non-normal directions.
Then the ICS distances are computed as the Euclidean distances on the selected k centered components Z_k.
Finally, the outliers are identified based on a cutoff derived from simulations. If the distance of an observation exceeds the expectation under the normal model, this observation is labeled as an outlier (see details in dist_simu_test).
As a rule of thumb, the percentage of contamination should be limited to 10% in the case of a mixture of Gaussian distributions when using the default combination of locations and scatters for ICS.
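The three steps above can be sketched in base R for the default scatter pair (covariance and fourth-moment scatter). This is only an illustrative sketch, not the package's implementation: the helper ics_sq_dist() is hypothetical, the number of components k is fixed by hand instead of being selected by a test, and averaging the simulated (1 - level_dist) quantiles to obtain the cutoff is an assumption about how the simulation step aggregates its results.

```r
# Hypothetical helper: squared ICS distances on the first k components,
# using S1 = covariance and S2 = fourth-moment scatter (base R only).
ics_sq_dist <- function(X, k) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))               # center at the mean
  e1 <- eigen(cov(X), symmetric = TRUE)
  W <- e1$vectors %*% diag(1 / sqrt(e1$values), p) %*% t(e1$vectors)
  Y <- Xc %*% W                                # whitened data (S1 becomes identity)
  r2 <- rowSums(Y^2)                           # squared Mahalanobis distances
  S2 <- crossprod(Y * sqrt(r2)) / (n * (p + 2))  # fourth-moment scatter of Y
  e2 <- eigen(S2, symmetric = TRUE)            # eigenvalues in decreasing order
  Z <- Y %*% e2$vectors                        # invariant components
  rowSums(Z[, seq_len(k), drop = FALSE]^2)
}

set.seed(42)
n <- 500; p <- 3
k <- 1                                         # assumed here, not selected by a test
X <- matrix(rnorm(n * p), n, p)
X[1:10, 1] <- X[1:10, 1] + 8                   # 2% of observations shifted

# Step (ii): distances of the observed data
d2 <- ics_sq_dist(X, k)

# Step (iii): cutoff from Gaussian simulations of the same size
n_dist <- 100
level_dist <- 0.025
sim_q <- replicate(n_dist, {
  G <- matrix(rnorm(n * p), n, p)
  quantile(ics_sq_dist(G, k), probs = 1 - level_dist, type = 7)
})
cutoff <- mean(sim_q)

flagged <- d2 > cutoff
which(flagged)
```

In practice ICS_outlier() selects k itself and uses far more simulations; the small n_dist here only keeps the sketch fast.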
An object of S3 class 'ICS_Out' which contains:
outliers: A vector containing ones for outliers and zeros for non-outliers.
ics_distances: A numeric vector containing the squared ICS distances.
ics_dist_cutoff: The cutoff for the distances to decide if an observation is outlying or not.
level_dist: The level for deciding upon the cutoff value for the ICS distances.
level_test: The initial level for selecting the invariant coordinates.
method: Name of the method used to decide upon the number of ICS components.
index: Vector giving the indices of the ICS components selected.
test: The name of the normality test as specified in the function call.
criterion: Vector giving the marginal levels for the components selection.
adjust: Whether the initial level used to decide upon the number of components has been adjusted for multiple testing or not.
type: Currently always the string "smallprop".
n_dist: Number of simulations performed to decide upon the cutoff for the ICS distances.
n_eig: Number of simulations performed for selecting the ICS components based on simulations.
S1_label: Name of S1.
S2_label: Name of S2.
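As a hedged illustration of how these components might be read, the sketch below assumes that the returned 'ICS_Out' object can be indexed like a list with `$`; that access pattern is an assumption and is not stated on this page. The simulated data and the small n_dist value are for demonstration only.

```r
library(ICSOutlier)

set.seed(123)
X <- matrix(rnorm(600), 200, 3)
X[1:4, 1] <- X[1:4, 1] + 10      # four shifted observations

# Small n_dist for speed only; use a much larger value in practice
out <- ICS_outlier(X, n_dist = 50, n_cores = 1)

# Assumed list-style access to the documented components
which(out$outliers == 1)         # row numbers labeled as outliers
out$index                        # indices of the selected components
out$ics_dist_cutoff              # cutoff applied to the ICS distances
head(out$ics_distances)          # squared ICS distances
```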
Aurore Archimbaud and Klaus Nordhausen
Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.
ICS(), comp_norm_test(), comp_simu_test(), dist_simu_test() and print(), plot(), summary() methods
# ReliabilityData example: the observations 414 and 512 are suspected to be outliers
library(REPPlab)
data(ReliabilityData)
# For demo purposes only a small n_dist value is used, but as extreme quantiles
# are of interest, n_dist should be much larger. The number of cores used
# should also be larger if available
icsOutlierDA <- ICS_outlier(ReliabilityData, S1 = ICS_tM, S2 = ICS_cov,
                            level_dist = 0.01, n_dist = 50, n_cores = 1)
icsOutlierDA
summary(icsOutlierDA)
plot(icsOutlierDA)
## Not run:
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
library(parallel)
# ICS with MCD estimates and the usual estimates
# Need to create a wrapper for the CovMcd function to return first the location estimate
# and the scatter estimate secondly.
data(HTP)
library(ICSClust)
# For demo purposes only small n_eig and n_dist values are used; the first seven components should be selected
icsOutlier <- ICS_outlier(HTP, S1 = ICS_mcd_rwt, S2 = ICS_cov,
                          S1_args = list(location = TRUE, alpha = 0.75),
                          n_eig = 50, level_test = 0.05, adjust = TRUE,
                          level_dist = 0.025, n_dist = 50,
                          n_cores = detectCores() - 1, iseed = 123,
                          pkg = c("ICSOutlier", "ICSClust"))
icsOutlier
## End(Not run)
# Example of no direction and hence also no outlier
set.seed(123)
library(mvtnorm)  # provides rmvnorm()
X = rmvnorm(500, rep(0, 2), diag(rep(0.1, 2)))
icsOutlierJB <- ICS_outlier(X, test = "jarque.test", level_dist = 0.01,
                            level_test = 0.01, n_dist = 100, n_cores = 1)
summary(icsOutlierJB)
plot(icsOutlierJB)
rm(.Random.seed)
# Example of no outlier
set.seed(123)
X = matrix(rweibull(1000, 4, 4), 500, 2)
X = apply(X,2, function(x){ifelse(x<5 & x>2, x, runif(sum(!(x<5 & x>2)), 5, 5.5))})
icsOutlierAG <- ICS_outlier(X, test = "anscombe.test", level_dist = 0.01,
                            level_test = 0.05, n_dist = 100, n_cores = 1)
summary(icsOutlierAG)
plot(icsOutlierAG)
rm(.Random.seed)