# selangle: Heuristic selection of the dimension of a PCA or PLS model... In mlesnoff/rnirs: Dimension reduction, Regression and Discrimination for Chemometrics

 selangle R Documentation

## Heuristic selection of the dimension of a PCA or PLS model using angles between bootstrapped loading matrices

### Description

The function helps selecting the dimension (i.e. nb. components) of a PCA or PLS by bootstrapping the observations and exploring the stability of the loading matrix `P`. Stability is quantified by angles between the boostrapped matrices.

The general idea was proposed by Ye & Weiss 2003 for the sliced inverse regression, and applied to PCA by Luo & Li 2016. The loading matrix `P` (with a total number of `A` columns, i.e. loading vectors) is computed on the raw matrix `X`. Then, a non parametric bootstrap is implemented on the rows of matrix `X`, and the loading matrices `P(b) b = 1,...,B` are calculated for each bootstrap replication `b`, all with `A` columns.

For a given model dimension `a <= A`, an "angle" is then calculated between the raw matrix `P` and each matrix `P(b)`, all with considering only the first `a` columns. The stability indicator for a matrix `P` with `a` vectors is the mean of the `B` angles.

Higher is the mean angle (meaning that the compared matrices do not span the same space), lower is the stability of matrix `P` whose some last columns were probably with large uncertainty.

Two measures of angle are proposed, depending on argument `angle`

1) Default: The "maxsub" angle (See Krzanowski, 1979, Hubert et al 2005, and Engelen et al. 2005).

2) The vector correlation coefficient "q" (Hotelling 1936) used by Ye & Weiss 2003 and Luo & Li 2016).

Print function rnirs::.corvec for the formulas.

Angles are first computed in radians (the right angle = `pi / 2`), and then divided by `pi / 2` to vary between 0 and 1 (1 = minimal stability).

Jumps in the curve of the mean angle, followed by regular patterns are also informative.

### Usage

``````
selangle(
X, Y = NULL, ncomp = NULL, algo = NULL,
B = 50, seed = NULL,
angle = c("maxsub", "hot"),
plot = TRUE,
xlab = "Nb. components", ylab = NULL,
print = TRUE,
...
)

``````

### Arguments

 `X` A `n x p` matrix or data frame of variables. `Y` For PLS, a `n x q` matrix or data frame, or a vector of length `n`, of responses. If `NULL` (default) a PCA is implented. `ncomp` The maximal number of PCA or PLS scores (= components = latent variables) to be calculated. `algo` For `pca`, a function (algorithm) implementing a PCA. Default to `NULL`: if `n < p`, `pca_eigenk` is used; in the other case, `pca_eigen` is used. For `pls`, a function implementing a PLS. Default to `NULL` (`pls_kernel` is used). `B` Number of bootstrap replications. `seed` An integer defining the seed for the random simulation, or `NULL` (default). See `set.seed`. `angle` Type of angle. Possible values are "maxsub" (default) or "hot" (q of Hotelling). `plot` Logical. If `TRUE` (default), results are plotted. `xlab` Label for the x-axis of the plot. `ylab` Label for the y-axis of the plot. `print` Logical. If `TRUE`, fitting information are printed. `...` Optionnal arguments to pass in the function defined in `algo`.

### Value

A list with output `r` = vector of the standardized angle.

### References

Engelen, S., Hubert, M., Branden, K.V., 2005. A Comparison of Three Procedures for Robust PCA in High Dimensions. Austrian Journal of Statistics 34, 117-126-117-126. https://doi.org/10.17713/ajs.v34i2.405

Hubert, M., Rousseeuw, P.J., Vanden Branden, K., 2005. ROBPCA: A New Approach to Robust Principal Component Analysis. Technometrics 47, 64-79. https://doi.org/10.1198/004017004000000563

Krzanowski, W.J., 1979. Between-Groups Comparison of Principal Components. Journal of the American Statistical Association 74, 703-707. https://doi.org/10.1080/01621459.1979.10481674

Luo, W., Li, B., 2016. Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 103, 875-887. https://doi.org/10.1093/biomet/asw051

Ye, Z., Weiss, R.E., 2003. Using the Bootstrap to Select One of a New Class of Dimension Reduction Methods. Jasa 98, 968-979. https://doi.org/10.1198/016214503000000927

### Examples

``````
data(datcass)
Xr <- datcass\$Xr
yr <- datcass\$yr

ncomp <- 30
selangle(Xr, yr, ncomp = ncomp, B = 10)

``````

mlesnoff/rnirs documentation built on April 24, 2023, 4:17 a.m.