pca_rob: Robust PCA algorithms
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

pca_rob

R Documentation

Robust PCA algorithms

Description

Algorithms implementing a robust PCA of a matrix X.

- pca_cr:

Croux & Ruiz-Gazen (C-R) PCA algorithm using projection pursuit (PP) method (Croux & Ruiz-Gazen 2005). The observations are robustly centered, and projected to specific "PP" directions (see below) of the space spaned by the variables (X columns). The PCA loading vectors are the directions (within the PP directions) that maximize a given "projection index", usually a robust spread measure such as MAD. The 1st loading vector is choosen within the n directions corresponding the observations (rows of X). The next loading vector is choosen in n directions corresponding the rows of the deflated matrix X). And so on.

A possible extension of the algorithm is to randomly simulate additionnal candidate PP directions to the n row observations. In function pca_cr, this is done when argument nsim > 0. In such a case, the function simulates nsim additional PP directions to the n initial ones, as proposed in Hubert et al. (2005): random couples of observations are sampled in X and, for each couple, the direction passes through the two observations of the couple (see functions .simpp.hub in file zfunctions.R).

- pca_sph:

Spherical PCA (see Locantore et al. 1990, Maronna 2005, Daszykowski et al. 2007). The spatial median used for centering matrix X is calculated by function .xmedspa (available in file zfunctions.R). This function uses the fast code of rrcov v.1.4-3 available on R CRAN (Thanks to V. Todorov, 2016).

- pca_rob:

Robust PCA combining outlyingness measures and weighted PCA (WPCA). Rows of matrix X receive weights depending on outlyingness. The algorithm is in the same spirit (but quite different) as the ROBPCA method (Hubert et al. 2005, 2009). It is built in the two following steps. The second step is facultative (argument step2).

Step 1 intends to detect and remove multivariate X-outliers that have potentially bad leverages. First, the Stahel-Donoho outlyingness (Maronna and Yohai, 1995) is computed for each of the rows of matrix X (see outstah). Noting 1 - p.rm a proportion (by default in the function, p.rm = .30, as in Hubert et al. 2005, 2009), the p.rm * n observations that have the highest outlyingness receive a weight w1 = 0 (the other receive a weight w1 = 1). Then, a robust Euclidean outlyingness (see outeucl) is calculated. The (p.rm * n observations that have the highest outlyingness receive a weight w2 = 0 (the other receive a weight w2 = 1). The final weight for Step 1 is w = w1 * w2), which is used for a first weighted PCA.

Step2 (facultative) intends to detect outliers in the robust score space computed in Step 1. The score and orthogonal distances, standardized by given cutoffs (see scordis and odis), of each of the n observations are calculated from this score space. An outlyingness is computed by (.5 * SD.stand^2 + .5 * OD.stand^2)^.5. This is implemented using outsdod. Outlyingness values higher than 1 receive a weight w = 0 (the other receive a weight w = 1), and a new weighted PCA is fitted.

In all the functions, matrix X is centered before the analyses, but X is not column-wise scaled (there is no argument scale available). If a scaling is needed, the user has to scale X before using the functions.

Usage


pca_cr(X, ncomp, obj = mad, nsim = 0)

pca_sph(X, ncomp, weights = NULL)

pca_rob(X, ncomp, 
    nsim = 1500, p.rm = .30, 
    step2 = TRUE, weights = NULL, 
    robust = TRUE, alpha = 0.01)

Arguments

`X`	A `n x p` matrix or data frame of variables.
`ncomp`	The maximal number of PCA scores (= components = latent variables) to be calculated.
`obj`	For `pca_cr`. The objective function (projection index) to maximize with the PP scores. Default to `obj = mad`.
`nsim`	For `pca_cr` and `pca_rob`. Used for the Stahel-Donoho outlyingness calculation (See `outstah`): a number of randomly simulated PP (projection pursuit) directions considered in addition to the `n` observations (rows of `X`).
`weights`	For `pca_sph` and `pca_rob`. A vector of length `n` defining a priori weights to apply to the observations. Internally, weights are "normalized" to sum to 1. Default to `NULL` (weights are set to `1 / n`).
`p.rm`	For `pca_rob`. Proportion `p.rm` of the data used as hard rejection (outliers) in Step 1 (default to `p.rm = .30`, i.e. 30pct are rejected).
`step2`	For `pca_rob`. If `TRUE` (default), Step 2 is implemented.
`robust`	For `pca_rob` ans step2 (used in `outsdod`). Logical. If `TRUE`, the moment estimation of the cutoff is robustified (this is advised in particular after robust PCA or PLS on small data sets containing strong outliers). Default to `FALSE`.
`alpha`	For `pca_rob` and step2 (used in `outsdod`). Risk I level for defining the cutoff detecting extreme values (see the code).

Value

A list of outputs, such as:

`T`	The score matrix (`n x ncomp`).
`P`	The loadings matrix (`p x ncomp`).
`R`	The projection matrix (= `P` ; `p x ncomp`).
`sv`	The singular values (vector of length `ncomp`).
`eigs`	The eigenvalues (`= sv^2`; vector of length `ncomp`).
`xmeans`	The centering vector of `X` (length `p`).

References

Daszykowski, M., Kaczmarek, K., Vander Heyden, Y., Walczak, B., 2007. Robust statistics in data analysis - A review. Chemometrics and Intelligent Laboratory Systems 85, 203-219. https://doi.org/10.1016/j.chemolab.2006.06.016

Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563

Hubert, M., Rousseeuw, P., Verdonck, T., 2009. Robust PCA for skewed data and its outlier map. Computational Statistics & Data Analysis 53, 2264-2274. https://doi.org/10.1016/j.csda.2008.05.027

N. Locantore, J.S. Marron, D.G. Simpson, N. Tripoli, J.T. Zhang, K.L. Cohen, Robust principal component analysis for functional data, Test 8 (1999) 1â7

Maronna, R.A., Yohai, V.J., 1995. The Behavior of the Stahel-Donoho Robust Multivariate Estimator. Journal of the American Statistical Association 90, 330-341. https://doi.org/10.1080/01621459.1995.10476517

Maronna, R., 2005. Principal Components and Orthogonal Regression Based on Robust Scales, Technometrics, 47:3, 264-273, DOI: 10.1198/004017005000000166

Examples


data(datoctane)

## 6 samples (25, 26, and 36â39) are outliers
## (contain added alcohol)
## See Hubert et al. 2005

X <- datoctane$X
plotsp(X)

ncomp <- 2
fm <- pca_cr(X, ncomp = ncomp)
#fm <- pca_sph(X, ncomp = ncomp)
#fm <- pca_rob(X, ncomp = ncomp)
#fm <- pca_eigen(X, ncomp = ncomp)
z <- scordis(fm, robust = TRUE)
sd <- z$dr$d
cutsd <- z$cutoff
z <- odis(fm, X, robust = TRUE)
od <- z$dr$d
cutod <- z$cutoff
plot(sd, od, pch = 16,
  xlab = "SD", ylab = "OD")
abline(h = cutod, v = cutsd, lty = 2)
u <- c(25, 26, 36:39)
points(sd[u], od[u], col = "green", cex = 1.5)

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.

mlesnoff/rnirs index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

pca_rob: Robust PCA algorithms
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

Robust PCA algorithms

Description

Usage

Arguments

Value

References

Examples

Related to pca_rob in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs Dimension reduction, Regression and Discrimination for Chemometrics

pca_rob: Robust PCA algorithms In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

Robust PCA algorithms

Description

Usage

Arguments

Value

References

Examples

Related to pca_rob in mlesnoff/rnirs...

R Package Documentation

Browse R Packages

We want your feedback!

mlesnoff/rnirs
Dimension reduction, Regression and Discrimination for Chemometrics

pca_rob: Robust PCA algorithms
In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics