selangle | R Documentation |
The function helps select the dimension (i.e. the number of components) of a PCA or PLS model by bootstrapping the observations and exploring the stability of the loading matrix P. Stability is quantified by angles between the bootstrapped matrices.
The general idea was proposed by Ye & Weiss (2003) for sliced inverse regression, and applied to PCA by Luo & Li (2016). The loading matrix P (with a total of A columns, i.e. loading vectors) is computed on the raw matrix X. Then a nonparametric bootstrap is applied to the rows of X, and the loading matrices P(b), b = 1, ..., B, are calculated for each bootstrap replication b, all with A columns.
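As a rough illustration of this bootstrap step, the base-R sketch below computes PCA loadings with svd() on the column-centered X, then repeats the computation on B resamples of the rows. The helpers pca_loadings() and boot_loadings() are hypothetical and not part of rnirs; the sketch covers PCA only, and the loadings used by selangle may be computed differently (see its algo argument).

```r
# Hypothetical sketch (not the rnirs implementation): PCA loadings of X and
# of B bootstrap resamples of its rows, each with A columns.
pca_loadings <- function(X, A) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # column-center X
  svd(Xc, nu = 0, nv = A)$v                     # p x A matrix of loading vectors
}

boot_loadings <- function(X, A, B = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(X)
  P <- pca_loadings(X, A)                       # loadings on the raw X
  Pb <- lapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)     # resample rows with replacement
    pca_loadings(X[idx, , drop = FALSE], A)     # loadings P(b) for replication b
  })
  list(P = P, Pb = Pb)
}
```

For example, boot_loadings(X, A = 5, B = 50, seed = 1) returns the raw loadings in $P and a list of 50 bootstrapped loading matrices in $Pb. Sign flips or rotations of the loading vectors between replications are harmless here, since the subsequent angles compare the spanned subspaces rather than individual vectors.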
For a given model dimension a <= A, an "angle" is then calculated between the raw matrix P and each matrix P(b), considering only their first a columns. The stability indicator for P with a vectors is the mean of the B angles.
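The angle and averaging steps can be sketched as follows, assuming the "maxsub" angle is the largest principal angle between the two column spaces (computed from the singular values of Q1'Q2 after orthonormalizing each matrix with qr()). maxsub_angle() and stability_curve() are hypothetical helpers; the mean angle is divided by pi / 2 to put it on a 0 to 1 scale, but the result is not guaranteed to reproduce rnirs numerically.

```r
# Hypothetical sketch: largest principal angle ("maxsub") between span(P1) and span(P2).
maxsub_angle <- function(P1, P2) {
  Q1 <- qr.Q(qr(P1))                      # orthonormal basis of span(P1)
  Q2 <- qr.Q(qr(P2))                      # orthonormal basis of span(P2)
  s <- svd(crossprod(Q1, Q2))$d           # cosines of the principal angles
  acos(min(1, min(s)))                    # largest principal angle, in radians
}

# For each a = 1..A: mean over the B replications of the angle between the
# first a columns of P and of P(b), standardized by pi / 2.
stability_curve <- function(P, Pb) {
  sapply(seq_len(ncol(P)), function(a) {
    ang <- sapply(Pb, function(M) {
      maxsub_angle(P[, 1:a, drop = FALSE], M[, 1:a, drop = FALSE])
    })
    mean(ang) / (pi / 2)                  # 0 = identical subspaces, 1 = orthogonal
  })
}
```

Orthonormalizing first matters for PLS, whose loading vectors are not orthogonal in general; for PCA loadings the QR step only changes the basis, not the subspace.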
The higher the mean angle (meaning that the compared matrices do not span the same space), the lower the stability of P, whose last columns are probably subject to large uncertainty.
Two measures of angle are proposed, depending on argument angle:
1) Default: the "maxsub" angle (see Krzanowski, 1979; Hubert et al., 2005; Engelen et al., 2005).
2) The vector correlation coefficient "q" (Hotelling, 1936), used by Ye & Weiss (2003) and Luo & Li (2016).
Print the function rnirs::.corvec to see the formulas.
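For reference, both measures can be written in terms of the principal angles theta_1 <= ... <= theta_a between the two a-dimensional subspaces. The expressions below are the standard textbook definitions, assuming orthonormal bases Q_1 and Q_2 of the spaces spanned by the first a columns of P and of P(b); the exact formulas used by rnirs, including how q is converted into an angle, are those printed by rnirs::.corvec.

```latex
\begin{aligned}
\sigma_1 \ge \dots \ge \sigma_a &: \text{singular values of } Q_1^\top Q_2, \qquad \sigma_i = \cos\theta_i \\
\theta_{\mathrm{maxsub}} &= \arccos(\sigma_a) = \max_i \theta_i \\
q^2 &= \prod_{i=1}^{a} \sigma_i^2 = \det\!\left(Q_1^\top Q_2 \, Q_2^\top Q_1\right)
\end{aligned}
```

So q equals 1 when the two subspaces coincide and drops to 0 as soon as one direction of one subspace is orthogonal to the other, whereas the maxsub angle only reflects the single worst-aligned direction.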
Angles are first computed in radians (a right angle = pi / 2) and then divided by pi / 2, so that they vary between 0 and 1 (1 = minimal stability).
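As a worked example of this rescaling (the angle values are illustrative, not output of selangle):

```latex
\tilde{\theta} = \frac{\theta}{\pi/2} = \frac{2\theta}{\pi}, \qquad
\theta = \frac{\pi}{6}\ (30^\circ) \;\Rightarrow\; \tilde{\theta} = \frac{1}{3}, \qquad
\theta = \frac{\pi}{2}\ (90^\circ) \;\Rightarrow\; \tilde{\theta} = 1
```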
Jumps in the curve of the mean angle, followed by regular patterns, are also informative.
selangle(
X, Y = NULL, ncomp = NULL, algo = NULL,
B = 50, seed = NULL,
angle = c("maxsub", "hot"),
plot = TRUE,
xlab = "Nb. components", ylab = NULL,
print = TRUE,
...
)
X | A |
Y | For PLS, a |
ncomp | The maximal number of PCA or PLS scores (= components = latent variables) to be calculated. |
algo | For |
B | Number of bootstrap replications. |
seed | An integer defining the seed for the random simulation, or |
angle | Type of angle. Possible values are "maxsub" (default) or "hot" (q of Hotelling). |
plot | Logical. If |
xlab | Label for the x-axis of the plot. |
ylab | Label for the y-axis of the plot. |
print | Logical. If |
... | Optional arguments to pass to the function defined in |
A list with component r, the vector of standardized angles.
Engelen, S., Hubert, M., Vanden Branden, K., 2005. A Comparison of Three Procedures for Robust PCA in High Dimensions. Austrian Journal of Statistics 34, 117-126. https://doi.org/10.17713/ajs.v34i2.405
Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563
Krzanowski, W.J., 1979. Between-Groups Comparison of Principal Components. Journal of the American Statistical Association 74, 703-707. https://doi.org/10.1080/01621459.1979.10481674
Luo, W., Li, B., 2016. Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 103, 875-887. https://doi.org/10.1093/biomet/asw051
Ye, Z., Weiss, R.E., 2003. Using the Bootstrap to Select One of a New Class of Dimension Reduction Methods. Journal of the American Statistical Association 98, 968-979. https://doi.org/10.1198/016214503000000927
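library(rnirs)                             # provides selangle() and the datcass data
data(datcass)
Xr <- datcass$Xr                           # predictor matrix X
yr <- datcass$yr                           # response, passed as Y
ncomp <- 30                                # maximal number of components A
selangle(Xr, yr, ncomp = ncomp, B = 10)    # B = 10 bootstrap replications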