View source: R/dist.simu.test.R

dist_simu_test | R Documentation |

Computes the cut-off values for identifying outliers based on the squared ICS distances. It uses simulations under a multivariate standard normal model for a specific data setup and combination of scatters.

```
dist_simu_test(
object,
S1 = NULL,
S2 = NULL,
S1_args = list(),
S2_args = list(),
index,
m = 10000,
level = 0.025,
n_cores = NULL,
iseed = NULL,
pkg = "ICSOutlier",
q_type = 7,
...
)
```

`object` |
object of class |

`S1` |
an object of class |

`S2` |
an object of class |

`S1_args` |
a list containing additional arguments for |

`S2_args` |
a list containing additional arguments for |

`index` |
integer vector specifying which components are used to compute the |

`m` |
number of simulations. Note that extreme quantiles are of interest and hence |

`level` |
the (1- |

`n_cores` |
number of cores to be used. If |

`iseed` |
If parallel computation is used the seed passed on to |

`pkg` |
When using parallel computing, a character vector listing all the packages which need to be loaded on the different cores via |

`q_type` |
specifies the quantile algorithm used in |

`...` |
further arguments passed on to the function |

The function essentially extracts the dimension of the data from the `"ICS"` object and simulates, `m` times, the squared ICS distances for the components specified in `index`, using data drawn from a multivariate standard normal distribution. The resulting value is then the mean of the `m` corresponding quantiles of these distances at level 1-`level`.
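The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name `sketch_cutoff()` is hypothetical, the scatter pair is fixed to the covariance matrix and the fourth-moment scatter (written out by hand), and the components are ordered by decreasing eigenvalue.

```r
# Hypothetical helper, NOT part of ICSOutlier: Monte Carlo cut-off for
# squared ICS distances under a standard normal model, assuming the
# scatter pair COV / COV4 computed by hand.
sketch_cutoff <- function(n, p, index, m = 200, level = 0.025, seed = 1) {
  set.seed(seed)
  q <- numeric(m)
  for (i in seq_len(m)) {
    # 1. Simulate an n x p standard normal data set
    X  <- matrix(rnorm(n * p), n, p)
    Xc <- sweep(X, 2, colMeans(X))

    # 2. First scatter: covariance; second scatter: fourth-moment matrix
    S1 <- cov(X)
    r2 <- mahalanobis(X, colMeans(X), S1)
    S2 <- crossprod(Xc * sqrt(r2)) / (n * (p + 2))

    # 3. Whiten with S1, then rotate onto the eigenvectors of whitened S2
    e1 <- eigen(S1, symmetric = TRUE)
    S1_inv_sqrt <- e1$vectors %*% diag(1 / sqrt(e1$values), nrow = p) %*%
      t(e1$vectors)
    e2 <- eigen(S1_inv_sqrt %*% S2 %*% S1_inv_sqrt, symmetric = TRUE)
    Z  <- Xc %*% S1_inv_sqrt %*% e2$vectors

    # 4. Squared distances on the selected components and their quantile
    d2   <- rowSums(Z[, index, drop = FALSE]^2)
    q[i] <- quantile(d2, probs = 1 - level, type = 7, names = FALSE)
  }
  # 5. The cut-off is the mean of the m simulated quantiles
  mean(q)
}

cutoff_first <- sketch_cutoff(n = 100, p = 3, index = 1, m = 20, seed = 42)
cutoff_all   <- sketch_cutoff(n = 100, p = 3, index = 1:3, m = 20, seed = 42)
```

Only `n`, `p` and `index` describe the data setup here, which mirrors the statement that the simulation depends on the dimension of the data rather than on the observed values.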

Depending on the data size and the scatters used, this can take a while, so it is more efficient to parallelize the computations.
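As a rough illustration of what parallelizing such a simulation can look like, the sketch below distributes independent replicates over a small cluster with the `parallel` package and seeds the workers for reproducibility. It is an assumption about a generic setup, not a description of how `dist_simu_test()` works internally; `one_replicate()` is a hypothetical stand-in statistic, not the ICS computation.

```r
library(parallel)

# Hypothetical per-replicate statistic: the (1 - level) quantile of the
# squared Euclidean norms of n standard normal vectors of dimension p.
one_replicate <- function(i, n, p, level) {
  X <- matrix(rnorm(n * p), n, p)
  quantile(rowSums(X^2), probs = 1 - level, names = FALSE)
}

cl <- makeCluster(2)                  # two worker processes
clusterSetRNGStream(cl, iseed = 123)  # reproducible random streams per worker
q <- parSapply(cl, seq_len(100), one_replicate,
               n = 200, p = 3, level = 0.025)
stopCluster(cl)

# Average of the simulated quantiles
mean_q <- mean(q)
```

Because each replicate is independent, the work splits cleanly across cores; any package needed inside the replicate function has to be available on the workers as well.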

Note that the function is seldom called directly by the user; it is mostly called internally by `ICS_outlier()`.

A vector with the values of the (1-`level`)th quantile.

Aurore Archimbaud and Klaus Nordhausen

Archimbaud, A., Nordhausen, K. and Ruiz-Gazen, A. (2018), ICS for multivariate outlier detection with application to quality control. Computational Statistics & Data Analysis, 128:184-199. ISSN 0167-9473. doi:10.1016/j.csda.2018.06.011.

`ICS()`, `ics_distances()`

```
# For a real analysis use larger values for m and more cores if available
library(ICSOutlier)
library(mvtnorm)  # provides rmvnorm()
Z <- rmvnorm(1000, rep(0, 6))
Z[1:20, 1] <- Z[1:20, 1] + 10
A <- matrix(rnorm(36), ncol = 6)
X <- tcrossprod(Z, A)
pairs(X)
icsX <- ICS(X, center = TRUE)
icsX.dist.1 <- ics_distances(icsX, index = 1)
CutOff <- dist_simu_test(icsX, S1 = ICS_cov, S2 = ICS_cov4,
                         index = 1, m = 500, n_cores = 1)
# check if outliers are above the cut-off value
plot(icsX.dist.1, col = rep(2:1, c(20, 980)))
abline(h = CutOff)
library(REPPlab)
data(ReliabilityData)
# The observations 414 and 512 are suspected to be outliers
icsReliability <- ICS(ReliabilityData, center = TRUE)
# Choice of the number of components with the screeplot: 2
screeplot(icsReliability)
# Computation of the distances with the first 2 components
ics.dist.scree <- ics_distances(icsReliability, index = 1:2)
# Computation of the cut-off of the distances
CutOff <- dist_simu_test(icsReliability, S1 = ICS_cov, S2 = ICS_cov4,
                         index = 1:2, m = 50, level = 0.02, n_cores = 1)
# Identification of the outliers based on the cut-off value
plot(ics.dist.scree)
abline(h = CutOff)
outliers <- which(ics.dist.scree >= CutOff)
text(outliers, ics.dist.scree[outliers], outliers, pos = 2, cex = 0.9)
## Not run:
# For using three cores
# For demo purposes only a small m value is used; should select the first component
dist_simu_test(icsReliability, S1 = ICS_cov, S2 = ICS_cov4,
               index = 1:2, m = 500, level = 0.02, n_cores = 3, iseed = 123)
# For using several cores and for using a scatter function from a different package
# Using the parallel package to detect automatically the number of cores
library(parallel)
# ICS with Cauchy estimates
library(ICSClust)
icsReliabilityMLC <- ICS(ReliabilityData, S1 = ICS_mlc,
S1_args = list(location = TRUE),
S2 = ICS_cov, center = TRUE)
# Computation of the cut-off of the distances. For demo purpose only small m value.
dist_simu_test(icsReliabilityMLC, S1 = ICS_mlc, S1_args = list(location = TRUE),
S2 = ICS_cov, index = 1:2, m = 500, level = 0.02,
n_cores = detectCores()-1, pkg = c("ICSOutlier","ICSClust"), iseed = 123)
## End(Not run)
```
