pca_rob | R Documentation |
Algorithms implementing a robust PCA of a matrix X.
- pca_cr:
Croux & Ruiz-Gazen (C-R) PCA algorithm using the projection pursuit (PP) method (Croux & Ruiz-Gazen 2005). The observations are robustly centered and projected onto specific "PP" directions (see below) of the space spanned by the variables (X columns). The PCA loading vectors are the directions (among the PP directions) that maximize a given "projection index", usually a robust spread measure such as the MAD. The 1st loading vector is chosen among the n directions corresponding to the observations (rows of X). The next loading vector is chosen among the n directions corresponding to the rows of the deflated matrix X. And so on.
A possible extension of the algorithm is to randomly simulate additional candidate PP directions on top of the n row observations. In function pca_cr, this is done when argument nsim > 0. In that case, the function adds nsim simulated PP directions to the n initial ones, as proposed in Hubert et al. (2005): random pairs of observations are sampled in X and, for each pair, the direction passes through the two observations of the pair (see function .simpp.hub in file zfunctions.R).
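The projection pursuit search described above can be sketched in base R as follows. This is a minimal illustration, not the code of pca_cr: the helper name pp_pca_sketch is hypothetical, and centering by the column-wise median is an assumption made here for simplicity.

```r
# Hypothetical helper (not part of the package): naive C-R projection pursuit PCA.
pp_pca_sketch <- function(X, ncomp, obj = mad, nsim = 0) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  # Robust centering; the column-wise median is an assumption of this sketch
  xmeans <- apply(X, 2, median)
  Xd <- sweep(X, 2, xmeans)                 # matrix that gets deflated
  P <- matrix(0, nrow = p, ncol = ncomp)
  Tm <- matrix(0, nrow = n, ncol = ncomp)
  for (a in seq_len(ncomp)) {
    # Candidate directions: the n rows of the current (deflated) matrix
    D <- Xd
    if (nsim > 0) {
      # Extra candidates: directions through random pairs of observations
      i <- sample.int(n, nsim, replace = TRUE)
      j <- sample.int(n, nsim, replace = TRUE)
      D <- rbind(D, Xd[i, , drop = FALSE] - Xd[j, , drop = FALSE])
    }
    norms <- sqrt(rowSums(D^2))
    D <- D[norms > 1e-10 * max(norms), , drop = FALSE]  # drop null directions
    D <- D / sqrt(rowSums(D^2))             # unit-length directions (row-wise)
    # Projection index evaluated on every candidate direction
    index <- apply(Xd %*% t(D), 2, obj)
    v <- D[which.max(index), ]
    P[, a] <- v
    Tm[, a] <- Xd %*% v
    Xd <- Xd - outer(Tm[, a], v)            # deflation
  }
  list(T = Tm, P = P, xmeans = xmeans)
}

# Usage on simulated data
set.seed(1)
Xsim <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
fm_sketch <- pp_pca_sketch(Xsim, ncomp = 2, nsim = 50)
```

Because every candidate direction is taken from the deflated matrix, each new loading vector is orthogonal to the previous ones by construction.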
- pca_sph:
Spherical PCA (see Locantore et al. 1999, Maronna 2005, Daszykowski et al. 2007). The spatial median used for centering matrix X is calculated by function .xmedspa (available in file zfunctions.R). This function uses the fast code of rrcov v.1.4-3 available on CRAN (thanks to V. Todorov, 2016).
- pca_rob:
Robust PCA combining outlyingness measures and weighted PCA (WPCA). Rows of matrix X receive weights depending on their outlyingness. The algorithm is in the same spirit as the ROBPCA method (Hubert et al. 2005, 2009), although quite different. It is built in the following two steps. The second step is optional (argument step2).
Step 1 intends to detect and remove multivariate X-outliers that have potentially bad leverages. First, the Stahel-Donoho outlyingness (Maronna and Yohai, 1995) is computed for each row of matrix X (see outstah). Given a proportion p.rm of observations to remove, so that a proportion 1 - p.rm is kept (by default in the function, p.rm = .30, as in Hubert et al. 2005, 2009), the p.rm * n observations that have the highest outlyingness receive a weight w1 = 0 (the others receive a weight w1 = 1). Then, a robust Euclidean outlyingness (see outeucl) is calculated. The p.rm * n observations that have the highest outlyingness receive a weight w2 = 0 (the others receive a weight w2 = 1). The final weight for Step 1 is w = w1 * w2, which is used for a first weighted PCA.
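The weighting logic of Step 1 can be sketched in base R as below. All helper names are hypothetical, and the package functions outstah and outeucl may be implemented differently. In particular, this sketch assumes that the Stahel-Donoho outlyingness is approximated over random directions through pairs of observations, and that the Euclidean outlyingness is the distance to the column-wise median.

```r
# Hypothetical helper: approximate Stahel-Donoho outlyingness over random directions
stahel_donoho_sketch <- function(X, nsim = 500) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Directions through random pairs of observations (assumption of this sketch)
  i <- sample.int(n, nsim, replace = TRUE)
  j <- sample.int(n, nsim, replace = TRUE)
  D <- X[i, , drop = FALSE] - X[j, , drop = FALSE]
  D <- D[rowSums(D^2) > 0, , drop = FALSE]
  Z <- X %*% t(D)                           # projections, one column per direction
  med <- apply(Z, 2, median)
  s <- apply(Z, 2, mad)
  keep <- s > 0                             # skip directions with zero spread
  Zs <- abs(sweep(Z[, keep, drop = FALSE], 2, med[keep]))
  Zs <- sweep(Zs, 2, s[keep], "/")
  apply(Zs, 1, max)                         # worst standardized distance per row
}

# Hypothetical helper: weight 0 for the p.rm * n most outlying rows, 1 otherwise
weights_from_outlyingness <- function(out, p.rm = .30) {
  n <- length(out)
  n.rm <- floor(p.rm * n)
  w <- rep(1, n)
  if (n.rm > 0) w[order(out, decreasing = TRUE)[seq_len(n.rm)]] <- 0
  w
}

# Usage on simulated data with three shifted rows
set.seed(3)
Xsim <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)
Xsim[1:3, ] <- Xsim[1:3, ] + 20
w1 <- weights_from_outlyingness(stahel_donoho_sketch(Xsim))
# Euclidean distance to the column-wise median (assumption of this sketch)
eucl <- sqrt(rowSums(sweep(Xsim, 2, apply(Xsim, 2, median))^2))
w2 <- weights_from_outlyingness(eucl)
w <- w1 * w2                                # final Step 1 weights
```

Since w is a product of two 0/1 vectors, a row is kept for the first weighted PCA only if neither outlyingness measure flags it.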
Step 2 (optional) intends to detect outliers in the robust score space computed in Step 1. The score distance (SD) and orthogonal distance (OD) of each of the n observations, standardized by given cutoffs (see scordis and odis), are calculated from this score space. An outlyingness is computed as (.5 * SD.stand^2 + .5 * OD.stand^2)^.5. This is implemented in outsdod. Observations with an outlyingness higher than 1 receive a weight w = 0 (the others receive a weight w = 1), and a new weighted PCA is fitted.
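The Step 2 outlyingness formula can be checked on a few hand-picked values. The helper below is hypothetical and only mirrors the formula given above. It assumes that "standardized" means dividing each distance by its cutoff, which may differ from what outsdod actually does.

```r
# Hypothetical helper mirroring the formula above
sdod_outlyingness <- function(sdist, odist, cutsd, cutod) {
  sd.stand <- sdist / cutsd                 # score distance relative to its cutoff
  od.stand <- odist / cutod                 # orthogonal distance relative to its cutoff
  sqrt(.5 * sd.stand^2 + .5 * od.stand^2)
}

# Three observations: well inside, exactly on both cutoffs, far out in SD
sdist <- c(0.5, 1.0, 3.0)
odist <- c(0.2, 1.0, 0.1)
out <- sdod_outlyingness(sdist, odist, cutsd = 1, cutod = 1)
w_step2 <- ifelse(out > 1, 0, 1)            # weight 0 only above the threshold
```

Under this assumption, an observation lying exactly on both cutoffs has an outlyingness of 1 and keeps its weight, while a large score distance alone is enough to bring the weight to 0.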
In all the functions, matrix X is centered before the analyses, but X is not column-wise scaled (there is no argument scale available). If scaling is needed, the user has to scale X before using the functions.
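One way to scale the columns beforehand is base R's scale(). The sketch below divides each column by its MAD, which is a choice made here to stay consistent with the robust setting; the standard deviation could be used instead. The simulated matrix is purely illustrative.

```r
# Simulated matrix whose columns have very different spreads
set.seed(4)
Xraw <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3) %*% diag(c(1, 10, 100))

# Column-wise robust scaling; centering is left to the PCA functions
Xs <- scale(Xraw, center = FALSE, scale = apply(Xraw, 2, mad))

# The scaled matrix is then passed to the PCA function, e.g.:
# fm <- pca_cr(Xs, ncomp = 2)
```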
pca_cr(X, ncomp, obj = mad, nsim = 0)
pca_sph(X, ncomp, weights = NULL)
pca_rob(X, ncomp,
    nsim = 1500, p.rm = .30,
    step2 = TRUE, weights = NULL,
    robust = TRUE, alpha = 0.01)
X |
A |
ncomp |
The maximal number of PCA scores (= components = latent variables) to be calculated. |
obj |
For |
nsim |
For |
weights |
For |
p.rm |
For |
step2 |
For |
robust |
For |
alpha |
For |
A list of outputs, such as:
T |
The score matrix ( |
P |
The loadings matrix ( |
R |
The projection matrix (= |
sv |
The singular values (vector of length |
eigs |
The eigenvalues ( |
xmeans |
The centering vector of |
Daszykowski, M., Kaczmarek, K., Vander Heyden, Y., Walczak, B., 2007. Robust statistics in data analysis - A review. Chemometrics and Intelligent Laboratory Systems 85, 203-219. https://doi.org/10.1016/j.chemolab.2006.06.016
Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563
Hubert, M., Rousseeuw, P., Verdonck, T., 2009. Robust PCA for skewed data and its outlier map. Computational Statistics & Data Analysis 53, 2264-2274. https://doi.org/10.1016/j.csda.2008.05.027
Locantore, N., Marron, J.S., Simpson, D.G., Tripoli, N., Zhang, J.T., Cohen, K.L., 1999. Robust principal component analysis for functional data. Test 8, 1-7.
Maronna, R.A., Yohai, V.J., 1995. The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90, 330-341. https://doi.org/10.1080/01621459.1995.10476517
Maronna, R., 2005. Principal Components and Orthogonal Regression Based on Robust Scales. Technometrics 47, 264-273. https://doi.org/10.1198/004017005000000166
data(datoctane)
## 6 samples (25, 26, and 36-39) are outliers
## (contain added alcohol)
## See Hubert et al. 2005
X <- datoctane$X
plotsp(X)
ncomp <- 2
fm <- pca_cr(X, ncomp = ncomp)
#fm <- pca_sph(X, ncomp = ncomp)
#fm <- pca_rob(X, ncomp = ncomp)
#fm <- pca_eigen(X, ncomp = ncomp)
z <- scordis(fm, robust = TRUE)
sd <- z$dr$d
cutsd <- z$cutoff
z <- odis(fm, X, robust = TRUE)
od <- z$dr$d
cutod <- z$cutoff
plot(sd, od, pch = 16,
xlab = "SD", ylab = "OD")
abline(h = cutod, v = cutsd, lty = 2)
u <- c(25, 26, 36:39)
points(sd[u], od[u], col = "green", cex = 1.5)