The ADALARA algorithm is based on the CLARA clustering algorithm. This is the parallel version of the algorithm, intended to produce results faster. It can also detect anomalies (outliers) with one of two methods: the adjusted boxplot (the default and most reliable option) or tolerance intervals. If needed, tolerance intervals also allow a degree of outlierness to be defined.
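As a rough illustration of the adjusted boxplot rule (Hubert and Vandervieren, 2008), the sketch below computes skewness-adjusted fences in base R. It is not taken from the package and may differ from what the function does internally. The helper names `medcouple_naive` and `adjbox_fences`, the toy vector `x`, and the choice to flag only values above the upper fence (as one might for residual-like quantities) are all assumptions made for this example. The naive medcouple also assumes that no observation equals the median exactly.

```r
# Naive O(n^2) medcouple (a robust skewness measure).
# Assumption: no observation is exactly equal to the median.
medcouple_naive <- function(x) {
  med <- median(x)
  lo <- x[x < med]
  hi <- x[x > med]
  # Kernel h(xi, xj) = ((xj - med) - (med - xi)) / (xj - xi), xi < med < xj
  h <- outer(lo, hi, function(xi, xj) ((xj - med) - (med - xi)) / (xj - xi))
  median(h)
}

# Adjusted boxplot fences: the usual 1.5 * IQR whiskers are stretched or
# shrunk by an exponential factor that depends on the medcouple.
adjbox_fences <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  mc <- medcouple_naive(x)
  if (mc >= 0) {
    c(lower = q[1] - coef * exp(-4 * mc) * iqr,
      upper = q[2] + coef * exp(3 * mc) * iqr)
  } else {
    c(lower = q[1] - coef * exp(-3 * mc) * iqr,
      upper = q[2] + coef * exp(4 * mc) * iqr)
  }
}

# Hypothetical right-skewed values with one extreme observation
x <- c(1.0, 1.1, 1.2, 1.3, 1.4, 1.6, 2.0, 2.7, 3.9, 40)
f <- adjbox_fences(x)

# One-sided check (illustrative choice): flag values above the upper fence
outl <- x[x > f[["upper"]]]
outl
```

For real use, a tested implementation of the medcouple and the adjusted boxplot is preferable to this quadratic-time sketch.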
data Data matrix. Each row corresponds to an observation and each column corresponds to a variable. All variables are numeric. The data must have row names so that the algorithm can identify the archetypoids in every sample.
N Number of samples.
m Size of each sample.
numArchoid Number of archetypes/archetypoids.
numRep For each
huge Penalization added to solve the convex least squares problems.
prob Probability with values in [0,1].
type_alg String. Options are 'ada' for the non-robust adalara algorithm and 'ada_rob' for the robust adalara algorithm.
compare Boolean argument to compute the robust residual sum of squares if
vect_tol Vector with the tolerance values. Default c(0.95, 0.9, 0.85). Needed if
alpha Significance level. Default 0.05. Needed if
outl_degree Type of outlier to identify the degree of outlierness. Default c("outl_strong", "outl_semi_strong", "outl_moderate"). Needed if
method Method to compute the outliers. Options allowed are 'adjbox' for using adjusted boxplots for skewed distributions, and 'toler' for using tolerance intervals.
frame Boolean value to indicate whether the frame is computed (Mair et al., 2017) or not. The frame is a subset of extreme points, and when it is used the archetypoids are computed only on the frame, which speeds up the computation. The frame density is low when only a small portion of the data is extreme; high frame densities reduce this speed-up.
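To make the relationship between N, m and numArchoid concrete, the following sketch reproduces the sample-count formula used in the Examples section and draws one random sample identified by row name. The use of base `sample()` here is only an illustration of why row names are required; it is an assumption for this example, not a description of how the function draws its samples internally.

```r
data(mtcars)
n <- nrow(mtcars)  # number of observations
m <- 10            # size of each sample
k <- 3             # number of archetypoids (numArchoid)

# Number of samples, using the same formula as the Examples section
N <- floor(1 + (n - m) / (m - k))
N

# Row names must exist and be unique so that observations picked in a
# sample can be traced back to the full data set.
stopifnot(!is.null(rownames(mtcars)), !anyDuplicated(rownames(mtcars)))

set.seed(1)
# One illustrative random sample of m observations, tracked by row name
sample_ids <- sample(rownames(mtcars), m)
sample_data <- mtcars[sample_ids, ]
```

Note that m must be larger than numArchoid for the formula to be defined.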
A list with the following elements:
cases Optimal vector of archetypoids.
rss Optimal residual sum of squares.
outliers Outliers.
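The Examples section selects the sublist with the smallest residual sum of squares from the per-sample results. The sketch below shows the same selection by element name on a mock object. The list `res`, its observation labels and its numeric values are entirely hypothetical and stand in for real output; in particular, representing outliers as a character vector of row names is an assumption.

```r
# Hypothetical stand-in for per-sample results; all values are made up
res <- list(
  list(cases = c("obs3", "obs8", "obs15"),  rss = 4.2, outliers = "obs21"),
  list(cases = c("obs1", "obs8", "obs19"),  rss = 3.1, outliers = c("obs21", "obs30")),
  list(cases = c("obs5", "obs12", "obs19"), rss = 5.7, outliers = character(0))
)

# Collect the RSS of every sample and keep the sample with the smallest one
rss_all <- vapply(res, function(x) x$rss, numeric(1))
best <- res[[which.min(rss_all)]]

best$cases     # archetypoids of the best sample
best$rss       # its residual sum of squares
best$outliers  # outliers reported for that sample
```

Accessing elements by name rather than by position keeps the code readable if the order of the list elements ever changes.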
Guillermo Vinue, Irene Epifanio
Eugster, M.J.A. and Leisch, F., From Spider-Man to Hero - Archetypal Analysis in R, 2009. Journal of Statistical Software 30(8), 1-23, https://doi.org/10.18637/jss.v030.i08
Hubert, M. and Vandervieren, E., An adjusted boxplot for skewed distributions, 2008. Computational Statistics and Data Analysis 52(12), 5186-5201, https://doi.org/10.1016/j.csda.2007.11.008
Kaufman, L. and Rousseeuw, P.J., Clustering Large Data Sets, 1986. Pattern Recognition in Practice, 425-437.
Mair, S., Boubekki, A. and Brefeld, U., Frame-based Data Factorizations, 2017. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 1-9.
Moliner, J. and Epifanio, I., Robust multivariate and functional archetypal analysis with application to financial time series analysis, 2019. Physica A: Statistical Mechanics and its Applications 519, 195-208, https://doi.org/10.1016/j.physa.2018.12.036
Vinue, G., Anthropometry: An R Package for Analysis of Anthropometric Data, 2017. Journal of Statistical Software 77(6), 1-39, https://doi.org/10.18637/jss.v077.i06
do_ada, do_ada_robust, adalara_no_paral
## Not run:
library(adamethods)    # provides adalara()
library(Anthropometry) # provides preprocessing()
library(doParallel)
# Prepare parallelization (including the seed for reproducibility):
# Leave one core free, but always use at least one:
no_cores <- max(1, detectCores() - 1, na.rm = TRUE)
cl <- makeCluster(no_cores)
registerDoParallel(cl)
clusterSetRNGStream(cl, iseed = 1)
# Load data:
data(mtcars)
data <- mtcars
n <- nrow(data)
# Arguments for the archetype/archetypoid algorithm:
# Number of archetypoids:
k <- 3
numRep <- 2
huge <- 200
# Size of the random sample of observations:
m <- 10
# Number of samples:
N <- floor(1 + (n - m)/(m - k))
N
prob <- 0.75
# ADALARA algorithm:
preproc <- preprocessing(data, stand = TRUE, percAccomm = 1)
data1 <- as.data.frame(preproc$data)
adalara_aux <- adalara(data1, N, m, k, numRep, huge, prob,
                       "ada_rob", FALSE, method = "adjbox", frame = FALSE)
# Alternative call using tolerance intervals instead of the adjusted boxplot:
# adalara_aux <- adalara(data1, N, m, k, numRep, huge, prob,
#                        "ada_rob", FALSE, vect_tol = c(0.95, 0.9, 0.85), alpha = 0.05,
#                        outl_degree = c("outl_strong", "outl_semi_strong", "outl_moderate"),
#                        method = "toler", frame = FALSE)
# Take the minimum RSS, which is in the second position of every sublist:
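rss_all <- unlist(sapply(adalara_aux, function(x) x[2]))
# Use a distinct name so the result does not mask the adalara() function:
adalara_best <- adalara_aux[[which.min(rss_all)]]
adalara_best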
# End parallelization:
stopCluster(cl)
## End(Not run)