ics.outlier: Outlier Detection Using ICS

ics.outlier R Documentation


Description

In a multivariate framework, outliers are detected using ICS. The function works on an object of class ics2 and decides automatically how many invariant components to use to search for the outliers and how many outliers are detected on these components. Currently, the function only searches for outliers on the first components.


Usage

ics.outlier(object, method = "norm.test", test = "agostino.test", mEig = 10000,
  level.test = 0.05, adjust = TRUE, level.dist = 0.025, mDist = 10000,
  type = "smallprop", ncores = NULL, iseed = NULL, pkg = "ICSOutlier", 
  qtype = 7, ...)



Arguments

object: object of class ics2 where both S1 and S2 are specified as functions.

method: name of the method used to select the ICS components involved in computing the ICS distances. Options are "norm.test" and "simulation". Depending on the method, either comp.norm.test or comp.simu.test is used.

test: name of the marginal normality test to use if method = "norm.test". Possibilities are "jarque.test", "anscombe.test", "bonett.test", "agostino.test", "shapiro.test". Default is "agostino.test".

mEig: number of simulations performed to derive the cut-off values for selecting the ICS components. Only used if method = "simulation". See comp.simu.test for details.

level.test: level for the comp.norm.test or comp.simu.test functions. The initial level for selecting the invariant coordinates.

adjust: logical. For selecting the invariant coordinates, the level of the test can be adjusted for each component to deal with multiple testing. See comp.norm.test and comp.simu.test for details. Default is TRUE.

level.dist: level for the dist.simu.test function. The (1-level)th quantile is used to determine the cut-off value for the ICS distances.

mDist: number of simulations performed to derive the cut-off value for the ICS distances. See dist.simu.test for details.

type: currently the only option is "smallprop", which means that only the first ICS components can be selected. See comp.norm.test or comp.simu.test for details.

ncores: number of cores to be used in dist.simu.test and comp.simu.test. If NULL or 1, no parallel computing is used. Otherwise makeCluster with type = "PSOCK" is used.

iseed: if parallel computation is used, the seed passed on to clusterSetRNGStream. Default is NULL, which means no fixed seed is used.

pkg: when using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via require. Must contain at least "ICSOutlier" and the packages needed to compute the scatter matrices.

qtype: specifies the quantile algorithm used in quantile.

...: passed on to other methods.
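
The interplay of level.dist and qtype can be illustrated with a small base-R sketch. It assumes that qtype is forwarded as the type argument of quantile, and it uses hypothetical simulated distances (sim_dist) as a stand-in; it does not reproduce what dist.simu.test computes internally.

```r
# Hypothetical stand-in for the distances obtained in one simulation run.
set.seed(42)
sim_dist <- sqrt(rchisq(1000, df = 2))

level.dist <- 0.025  # default value in ics.outlier
qtype      <- 7      # default value in ics.outlier; assumed to map to `type`

# The (1 - level.dist) quantile serves as the cut-off in this sketch.
cutoff <- quantile(sim_dist, probs = 1 - level.dist, type = qtype,
                   names = FALSE)

# Share of simulated distances lying above the cut-off.
share_above <- mean(sim_dist > cutoff)
```

A smaller level.dist moves the cut-off further into the tail, which is why a larger mDist is advisable when extreme quantiles are of interest.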


Details

The ICS method has attractive properties for outlier detection in the case of a small proportion of outliers. As for PCA, three steps have to be performed: (i) select the components most useful for the detection, (ii) compute distances as outlierness measures for all observations and finally (iii) label outliers using some cut-off value.

This function performs these three steps automatically:

(i) To choose the components of interest, two methods are proposed: "norm.test", based on marginal normality tests (see details in comp.norm.test), or "simulation", based on a parallel analysis (see details in comp.simu.test). Both approaches rely on an intrinsic property of ICS: when the proportion of outliers is small and S1 is chosen to be "more robust" than S2, the outliers are found on the first components. Indeed, when using S1 = MeanCov and S2 = Mean3Cov4, the invariant coordinates are ordered by decreasing classical Pearson kurtosis. The information needed to find the outliers should then be contained in the first k non-normal directions.

(ii) The ICS distances are then computed as the Euclidean distances on the k selected centered components Z_k.

(iii) Finally, the outliers are identified using a cut-off derived from simulations. If the distance of an observation exceeds the expectation under the normal model, the observation is labeled as an outlier (see details in dist.simu.test).

As a rule of thumb, the percentage of contamination should be limited to 10% for a mixture of Gaussian distributions when using the default combination of locations and scatters for ICS.
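
The three steps can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the ICSOutlier implementation: the helper names (cov4_sketch, ics_sketch, ics_dist, cutoff_sketch) are hypothetical, the scatter pair is the covariance matrix and a fourth-moment scatter, the number of components k is fixed by hand instead of being selected by a test, and averaging the simulated quantiles is an assumption of this sketch.

```r
# Fourth-moment scatter (assumed classical form); hypothetical helper.
cov4_sketch <- function(X) {
  n <- nrow(X); p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))
  r2 <- mahalanobis(X, colMeans(X), cov(X))  # squared Mahalanobis distances
  crossprod(Xc * sqrt(r2)) / (n * (p + 2))   # row i is weighted by r2[i]
}

# Invariant coordinates: whiten with cov, then rotate with the eigenvectors
# of the fourth-moment scatter of the whitened data.
ics_sketch <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  e1 <- eigen(cov(X), symmetric = TRUE)
  W  <- e1$vectors %*% diag(1 / sqrt(e1$values), nrow = ncol(X)) %*%
    t(e1$vectors)
  Y  <- Xc %*% W
  e2 <- eigen(cov4_sketch(Y), symmetric = TRUE)  # decreasing eigenvalues
  list(Z = Y %*% e2$vectors, kurtosis = e2$values)
}

# Step (ii): Euclidean distances on the first k centered components.
ics_dist <- function(Z, k) sqrt(rowSums(Z[, seq_len(k), drop = FALSE]^2))

# Step (iii): cut-off from m simulated standard normal samples.
cutoff_sketch <- function(n, p, k, level = 0.025, m = 50, qtype = 7) {
  q <- replicate(m, {
    Zsim <- ics_sketch(matrix(rnorm(n * p), n, p))$Z
    quantile(ics_dist(Zsim, k), probs = 1 - level, type = qtype,
             names = FALSE)
  })
  mean(q)  # assumption: the simulated quantiles are averaged
}

set.seed(1)
n <- 500; p <- 3
X <- matrix(rnorm(n * p), n, p)
X[1:10, 1] <- X[1:10, 1] + 8  # 2% of the observations are shifted

fit <- ics_sketch(X)
k <- 1                        # step (i) is fixed by hand in this sketch
d <- ics_dist(fit$Z, k)
cut_off <- cutoff_sketch(n, p, k, level = 0.025, m = 50)
outliers <- which(d > cut_off)
```

In practice ics.outlier chooses k itself through comp.norm.test or comp.simu.test, and far more simulations than m = 50 are advisable.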


Value

an object of class icsOut.


Note

The function ics.outlier has reached the end of its lifecycle; please use ICS_outlier instead. In future versions, ics.outlier will be deprecated and eventually removed.


Author(s)

Aurore Archimbaud and Klaus Nordhausen


References

Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. <https://doi.org/10.1016/j.csda.2018.06.011>.

Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure. The R Journal, 10:234-250. <https://doi.org/10.32614/RJ-2018-034>.

See Also

ics2, comp.norm.test, comp.simu.test, dist.simu.test, icsOut-class


Examples

library(ICSOutlier)
# ReliabilityData example: the observations 414 and 512 are suspected to be outliers
# Assumption: ReliabilityData is taken from the REPPlab package
library(REPPlab)
data(ReliabilityData)
icsReliabilityData <- ics2(ReliabilityData, S1 = tM, S2 = MeanCov)
# For demonstration purposes only a small mDist value is used; since extreme
# quantiles are of interest, mDist should be much larger. The number of cores
# should also be larger if available.
icsOutlierDA <- ics.outlier(icsReliabilityData, level.dist = 0.01, mDist = 50, ncores = 1)

## Not run: 
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
# ICS with MCD estimates and the usual estimates
# Need to create a wrapper for the CovMcd function to return first the location estimate
# and the scatter estimate secondly.
library(parallel)  # provides detectCores()
library(rrcov)     # provides CovMcd()
myMCD <- function(x, ...) {
  mcd <- CovMcd(x, ...)
  return(list(location = mcd@center, scatter = mcd@cov))
}
icsHTP <- ics2(HTP, S1 = myMCD, S2 = MeanCov, S1args = list(alpha = 0.75))
# For demo purpose only small m value, should select the first seven components
icsOutlier <- ics.outlier(icsHTP, mEig = 50, level.test = 0.05, adjust = TRUE, 
                          level.dist = 0.025, mDist = 50,
                          ncores =  detectCores()-1, iseed = 123, 
                          pkg = c("ICSOutlier", "rrcov"))

## End(Not run)
# Example of no direction and hence also no outlier
X <- rmvnorm(500, rep(0, 2), diag(rep(0.1, 2)))
icsX <- ics2(X)
icsOutlierJB <- ics.outlier(icsX, test = "jarque.test", level.dist = 0.01,
                            level.test = 0.01, mDist = 100, ncores = 1)


# Example of no outlier
X <- matrix(rweibull(1000, 4, 4), 500, 2)
# Values outside (2, 5) are replaced by uniform draws on [5, 5.5]
X <- apply(X, 2, function(x) {
  ifelse(x < 5 & x > 2, x, runif(sum(!(x < 5 & x > 2)), 5, 5.5))
})
icsX <- ics2(X)
icsOutlierAG <- ics.outlier(icsX, test = "anscombe.test", level.dist = 0.01,
                            level.test = 0.05, mDist = 100, ncores = 1)

ICSOutlier documentation built on May 29, 2024, 2:08 a.m.