selcoll | R Documentation |
The function helps select the dimension (i.e. number of components) of a PCA or PLS model by bootstrapping the observations and exploring the collinearity of the loading vectors of the same rank or, for univariate PLS, optionally of the b-coefficient vectors.
The principle is detailed below for loading vectors (the same applies to b-coefficient vectors).
A non-parametric bootstrap is implemented on the rows of matrix X (and, for PLS, of Y), and the loading matrices P(b), b = 1,...,B, are calculated for each bootstrap replication b, all with a total number of A columns. For a given model dimension a <= A, the B loading vectors corresponding to loadings "a" (column a in matrices P(b)) are set in a matrix V(a) (this last matrix has B columns).
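The construction of V(a) described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual code: PCA loadings are taken from the SVD of the column-centered data (the algorithm selected by algo is not shown here), the data are simulated, and the helper names pca_loadings and build_V are hypothetical.

```r
# Illustrative sketch only; not the implementation used by selcoll.
set.seed(1)
n <- 40; p <- 6   # observations and variables (simulated data)
A <- 3            # total number of components
B <- 20           # number of bootstrap replications
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Hypothetical helper: PCA loadings (p x A) from the SVD of centered X.
pca_loadings <- function(X, A) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  svd(Xc, nu = 0, nv = A)$v
}

# One loading matrix P(b) per bootstrap resampling of the rows of X.
P <- lapply(seq_len(B), function(b) {
  idx <- sample.int(n, size = n, replace = TRUE)
  pca_loadings(X[idx, , drop = FALSE], A)
})

# Hypothetical helper: V(a) binds column a of every P(b) into a p x B matrix.
build_V <- function(P, a) sapply(P, function(Pb) Pb[, a])

V1 <- build_V(P, 1)
dim(V1)  # p rows, B columns
```

Each column of V1 is the first loading vector obtained from one bootstrap replication; the collinearity measures are then computed on matrices of this kind.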
Then, two alternative measures of collinearity are proposed, depending on argument coll:
1) Default method. Correlation coefficients are calculated between pairs of columns of V(a) and set in a vector v. The non-collinearity indicator r is the quantile of the elements in v (by default, prob = 1, corresponding to max(v)).
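As a rough illustration of the correlation-based measure, the sketch below computes the pairwise correlations between the columns of a toy matrix and takes their quantile. The helper name coll_corr is hypothetical, and the handling of the sign of the correlations is not stated in this page; raw correlations are used here as an assumption.

```r
# Hypothetical helper illustrating the "corr" measure; V is a p x B matrix.
coll_corr <- function(V, prob = 1) {
  C <- cor(V)              # B x B correlations between columns of V
  v <- C[upper.tri(C)]     # one value per pair of columns
  unname(quantile(v, probs = prob))
}

set.seed(1)
V <- matrix(rnorm(6 * 5), nrow = 6)   # toy stand-in for V(a)
r <- coll_corr(V, prob = 1)

# With prob = 1 the quantile is the maximum of v.
C <- cor(V)
v <- C[upper.tri(C)]
isTRUE(all.equal(r, max(v)))
```

Lower values of prob (for instance 0.5 for the median) summarize the same vector v less conservatively than its maximum.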
2) An SVD of V(a) is computed, and the collinearity measure r between the B vectors is given by the proportion of variance accounted for by the first SVD dimension (i.e. r = eig[1] / sum(eigs)).
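The SVD-based measure can be sketched as follows. This assumes that the eigenvalues are the squared singular values of V(a) taken without centering, which this page does not specify; the helper name coll_eig is hypothetical.

```r
# Hypothetical helper illustrating the "eig" measure; V is a p x B matrix.
# Assumption: eigenvalues are the squared singular values of uncentered V.
coll_eig <- function(V) {
  eig <- svd(V, nu = 0, nv = 0)$d^2
  eig[1] / sum(eig)
}

u <- c(1, 2, 3, 4)
V_same <- cbind(u, 2 * u, -u)   # perfectly collinear columns
r_same <- coll_eig(V_same)      # close to 1

V_orth <- diag(3)               # mutually orthogonal columns
r_orth <- coll_eig(V_orth)      # 1/3, the lowest value for 3 columns
```

Under this assumption r ranges from 1/B (no dominant direction) to 1 (all B vectors collinear).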
Low collinearity between the vectors of rank a (columns of matrix V(a)) may indicate that they were built with large uncertainty (generating instability in V(a)). Jumps in the curve of r followed by regular patterns are also informative.
selcoll(
X, Y = NULL, ncomp = NULL, algo = NULL,
B = 50, seed = NULL,
type = c("P", "b"),
coll = c("corr", "eig"),
prob = 1,
plot = TRUE,
xlab = "Nb. components", ylab = NULL,
print = TRUE,
...
)
X |
A |
Y |
For PLS, a |
ncomp |
The maximal number of PCA or PLS scores (= components = latent variables) to be calculated. |
algo |
For |
B |
Number of bootstrap replications. |
seed |
An integer defining the seed for the random simulation, or |
type |
Type of output whose stability is evaluated. Possible values are "P" (loadings; default) or "b" (b-coefficients). |
coll |
Type of collinearity measure. Possible values are "corr" (quantile of correlation coefficients; default) or "eig" (SVD decomposition). |
prob |
Probability level for the quantile (defaults to 1, i.e. the maximal value is considered). |
plot |
Logical. If |
xlab |
Label for the x-axis of the plot. |
ylab |
Label for the y-axis of the plot. |
print |
Logical. If |
... |
Optional arguments to pass to the function defined in |
A list with output r; see examples.
data(datcass)
Xr <- datcass$Xr
yr <- datcass$yr
ncomp <- 30
selcoll(Xr, ncomp = ncomp, B = 10)