outstah | R Documentation |
Functions calculating outlyingness for the data observations (= rows of a matrix X). Outlyingness quantifies, on a relative scale, how far an observation lies from the bulk (center) of the data. Such a measure can be used for detecting outliers and/or weighting the row observations in robust regression methods.
- outstah:
The function computes the Stahel-Donoho outlyingness (Maronna & Yohai 1995, Hubert et al. 2005, Daszykowski et al. 2007).
Outlyingness is calculated from the projections of the observations onto a set of directions. The set of directions consists of the n directions corresponding to the rows of X, optionally completed by nsim additional directions. In the function, the nsim directions are simulated as proposed in Hubert et al. (2005): random pairs of observations are sampled in matrix X and, for each pair, the simulated direction is the one passing through the two observations of the pair (see function .simpp.hub in file zfunctions.R).
- outeucl:
Outlyingness is calculated as the Euclidean distance between the observation and a robust estimate of the center of the data (either the column-wise median or the spatial median). The Euclidean distance is then scaled by the median of the n calculated Euclidean distances. Such outlyingness was used, for instance, in the robust PLSR (PRM) algorithm of Serneels et al. (2005).
- outsdod:
Outlyingness is calculated from a fitted score space. For instance, a PCA (or PLS) is first fitted with a given algorithm (ideally robust). Then, score (SD) and orthogonal (OD) distances are calculated for the fitted score space and standardized by cutoffs (see scordis and odis). The outlyingness is then computed by sqrt(.5 * SD_stand^2 + .5 * OD_stand^2).
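The combination used by outsdod can be illustrated with a few lines of base R. This is only a sketch of the formula quoted above, not the package code: the helper name combine_sd_od is hypothetical, and it assumes that SD_stand and OD_stand are the score and orthogonal distances already divided by their respective cutoffs.

```r
# Hypothetical helper (not part of the package): combine standardized
# score and orthogonal distances into a single outlyingness value.
# Assumption: both inputs were already divided by their cutoffs.
combine_sd_od <- function(sd_stand, od_stand) {
  sqrt(.5 * sd_stand^2 + .5 * od_stand^2)
}

# Three illustrative observations: well inside both cutoffs,
# exactly on both cutoffs, and beyond the SD cutoff only.
combine_sd_od(sd_stand = c(0.5, 1, 2), od_stand = c(0.5, 1, 0))
# 0.5, 1 and sqrt(2)
```

Under that assumption, an observation lying exactly on both cutoffs gets an outlyingness of 1, and the equal weights (.5 and .5) mean that neither distance dominates the other.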
outstah(X, scale = TRUE, nsim = 1500)
outeucl(X, scale = TRUE, spatial = FALSE)
outsdod(fm, X,
ncomp = NULL,
robust = FALSE, alpha = .01)
X |
A |
scale |
If |
nsim |
For |
spatial |
For |
fm |
For |
ncomp |
For |
robust |
For |
alpha |
For |
A vector of outlyingness (length n).
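To make the scale of the returned values concrete, the Euclidean variant described above can be sketched in base R. The function eucl_outlyingness is a hypothetical illustration, not the package's outeucl: it uses the column-wise median as the center and does not model the scale or spatial arguments.

```r
# Hypothetical sketch (not the package's outeucl()).
# Center = column-wise median; the scale and spatial options are not modelled.
eucl_outlyingness <- function(X) {
  X <- as.matrix(X)
  center <- apply(X, 2, median)                    # robust center, one value per column
  d <- sqrt(rowSums(sweep(X, 2, center, "-")^2))   # Euclidean distance of each row to the center
  d / median(d)                                    # scale by the median of the n distances
}

set.seed(1)
X <- matrix(rnorm(6 * 4, mean = 10), ncol = 4, byrow = TRUE)
X_out <- rbind(X, rep(100, 4))   # add one gross outlier as row 7
eucl_outlyingness(X_out)
```

Because the distances are divided by their median, the returned values have a median of 1, so an outlyingness well above 1 marks an observation that lies much farther from the center than a typical one.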
Daszykowski, M., Kaczmarek, K., Vander Heyden, Y., Walczak, B., 2007. Robust statistics in data analysis - A review. Chemometrics and Intelligent Laboratory Systems 85, 203-219. https://doi.org/10.1016/j.chemolab.2006.06.016
Hoffmann, I., Serneels, S., Filzmoser, P., Croux, C., 2015. Sparse partial robust M regression. Chemometrics and Intelligent Laboratory Systems 149, 50-59. https://doi.org/10.1016/j.chemolab.2015.09.019
Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563
Maronna, R.A., Yohai, V.J., 1995. The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90, 330-341. https://doi.org/10.1080/01621459.1995.10476517
Serneels, S., Croux, C., Filzmoser, P., Van Espen, P.J., 2005. Partial robust M-regression. Chemometrics and Intelligent Laboratory Systems 79, 55-64. https://doi.org/10.1016/j.chemolab.2005.04.007
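n <- 6
p <- 4
set.seed(1)     # fix the seed so that X is reproducible
X <- matrix(rnorm(n * p, mean = 10), ncol = p, byrow = TRUE)
set.seed(NULL)  # re-initialize the random number generator
X
# Stahel-Donoho outlyingness
outstah(X)
# Euclidean outlyingness (column-wise median as center by default)
outeucl(X)
# Outlyingness from the score space of a robust PCA
fm <- pca_rob(X, ncomp = 2)
outsdod(fm, X)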
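For intuition about what outstah returns, the projection idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package implementation: sd_outlyingness is a hypothetical name, the median and the MAD are assumed as the robust location and spread of each projection (the usual Stahel-Donoho choice), the rows of X are used as directions without any prior centering or scaling, and the scale argument is not modelled.

```r
# Hypothetical sketch (not the package's outstah()).
# Assumptions: median/MAD standardization of each projection,
# rows of X used as directions as-is, no handling of the scale argument.
sd_outlyingness <- function(X, nsim = 100) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Random pairs (i, j) of distinct observations
  i <- sample(n, nsim, replace = TRUE)
  j <- (i + sample(n - 1, nsim, replace = TRUE) - 1) %% n + 1  # guarantees j != i
  # Directions: the n rows of X, plus nsim directions through the sampled pairs
  D <- rbind(X, X[i, , drop = FALSE] - X[j, , drop = FALSE])
  # Project every observation onto every direction (n x (n + nsim) matrix)
  P <- X %*% t(D)
  med <- apply(P, 2, median)
  s <- apply(P, 2, mad)
  ok <- s > 0  # drop directions with zero robust spread
  # Robustly standardized absolute projections
  Z <- abs(sweep(P[, ok, drop = FALSE], 2, med[ok], "-"))
  Z <- sweep(Z, 2, s[ok], "/")
  # Outlyingness = worst case over all directions
  apply(Z, 1, max)
}

set.seed(1)
X <- matrix(rnorm(6 * 4, mean = 10), ncol = 4, byrow = TRUE)
sd_outlyingness(rbind(X, rep(100, 4)), nsim = 50)
```

Taking the maximum over directions means that an observation is flagged as soon as it looks extreme along any single direction, which is why adding more simulated directions can only increase the computed values.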