| selcoll | R Documentation |
The function helps select the dimension (i.e. the number of components) of a PCA or PLS model by bootstrapping the observations and examining the collinearity of the loading vectors of the same rank or, for univariate PLS, optionally of the b-coefficient vectors.
The principle is detailed below for loading vectors (the same applies to b-coefficient vectors).
A non-parametric bootstrap is applied to the rows of matrix X (and of Y for PLS), and the loading matrices P(b), b = 1, ..., B, are computed for each bootstrap replication b, all with A columns. For a given model dimension a <= A, the B loading vectors of rank a (column a of the matrices P(b)) are gathered in a matrix V(a), which therefore has B columns.
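The bootstrap step can be sketched in base R for the PCA case. This is a minimal sketch under stated assumptions, not the selcoll() source: the loadings are taken from stats::prcomp (centred, unscaled data), and the helper names boot_loadings() and extract_V() are hypothetical.

```r
# Minimal sketch (hypothetical helpers, not the selcoll() source):
# bootstrap the rows of X, compute PCA loadings P(b) for each replication,
# then gather the loading vectors of rank a into V(a).
boot_loadings <- function(X, ncomp, B = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  X <- as.matrix(X)
  n <- nrow(X)
  lapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)   # resample the rows (observations)
    # Assumption: PCA via stats::prcomp (centred, unscaled); p x ncomp loadings
    prcomp(X[idx, , drop = FALSE], rank. = ncomp)$rotation
  })
}

# V(a): column a of each P(b), placed side by side (p x B matrix)
extract_V <- function(Plist, a) {
  sapply(Plist, function(P) P[, a])
}

# Usage on simulated data (n = 100 observations, p = 5 variables)
set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100)
Plist <- boot_loadings(X, ncomp = 3, B = 20, seed = 42)
V1 <- extract_V(Plist, a = 1)
dim(V1)   # 5 20
```

Each column of V(a) is a unit-norm loading vector, so V(a) has p rows and B columns whatever the value of a.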
Two alternative measures of collinearity are then proposed, depending on argument coll:
1) Default method. Correlation coefficients are computed between all pairs of columns of V(a) and stored in a vector v. The collinearity indicator r is a quantile of the elements of v (by default, prob = 1, corresponding to max(v)).
2) A singular value decomposition (SVD) of V(a) is computed, and the collinearity measure r between the B vectors is the proportion of variance accounted for by the first SVD dimension (i.e. r = eig[1] / sum(eig)).
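Given V(a), both measures take a few lines of base R. In this sketch, coll_corr() and coll_eig() are hypothetical helpers. Taking absolute correlations is an assumption (the sign of a loading vector is arbitrary, so two replications can return the same direction with opposite signs), and eig is taken as the squared singular values of V(a), without centring.

```r
# Minimal sketch of the two collinearity measures for a p x B matrix V
# (one bootstrap vector per column). Hypothetical helper names.

# coll = "corr": quantile of the pairwise correlations between columns.
# Assumption: absolute values are taken, since loading signs are arbitrary.
coll_corr <- function(V, prob = 1) {
  cm <- abs(cor(V))                   # B x B correlation matrix
  v <- cm[upper.tri(cm)]              # one value per pair of columns
  unname(quantile(v, probs = prob))   # prob = 1 returns max(v)
}

# coll = "eig": share of the first SVD dimension, eig[1] / sum(eig),
# with eig the squared singular values of V (no centring assumed)
coll_eig <- function(V) {
  eig <- svd(V)$d^2
  eig[1] / sum(eig)
}

# Same direction with flipped signs: both measures equal 1 (up to rounding)
u <- c(1, 2, 3, 4) / sqrt(30)
V_same <- cbind(u, -u, u)
coll_corr(V_same)
coll_eig(V_same)

# Three orthogonal unit vectors
coll_corr(diag(3))   # 0.5 (cor() centres each column first)
coll_eig(diag(3))    # 0.3333333
```

The SVD-based measure is unaffected by sign flips of the columns, which is one reason it needs no absolute-value step.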
Low collinearity between the vectors of rank a (the columns of V(a)) may indicate that they were estimated with large uncertainty, i.e. that V(a) is unstable across bootstrap replications. Jumps in the curve of r as a function of a, followed by regular patterns, are also informative.
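To illustrate this interpretation, the simulation below builds X from two strong latent directions plus isotropic noise, then computes r for a = 1, ..., 4 with the SVD-based measure. The helper r_curve() and the simulation design are hypothetical, and PCA is again done with stats::prcomp as an assumption. The loadings of the first component should be nearly collinear across replications (r close to 1), while those spanning the noise subspace, whose eigenvalues are nearly tied, should be much less so.

```r
# Illustration on simulated data (hypothetical helper and design).
# Assumptions: PCA via stats::prcomp; SVD-based measure (coll = "eig").
r_curve <- function(X, ncomp, B = 30, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(X)
  # Loading matrices P(b), b = 1, ..., B (p x ncomp each)
  Plist <- lapply(seq_len(B), function(b) {
    idx <- sample.int(n, n, replace = TRUE)
    prcomp(X[idx, , drop = FALSE], rank. = ncomp)$rotation
  })
  # r(a) = eig[1] / sum(eig) for V(a), a = 1, ..., ncomp
  sapply(seq_len(ncomp), function(a) {
    V <- sapply(Plist, function(P) P[, a])
    eig <- svd(V)$d^2
    eig[1] / sum(eig)
  })
}

# Two strong latent directions (sd 10 and 5) plus unit-variance noise, p = 6
set.seed(123)
n <- 200
scores <- cbind(rnorm(n, sd = 10), rnorm(n, sd = 5))
W <- qr.Q(qr(matrix(rnorm(6 * 2), nrow = 6)))   # orthonormal 6 x 2 basis
X <- scores %*% t(W) + matrix(rnorm(n * 6), nrow = n)

r <- r_curve(X, ncomp = 4, B = 30, seed = 1)
round(r, 3)
```

Calling plot(r, type = "b") draws the curve of r against the number of components, in the spirit of the plot produced when plot = TRUE.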
selcoll(
X, Y = NULL, ncomp = NULL, algo = NULL,
B = 50, seed = NULL,
type = c("P", "b"),
coll = c("corr", "eig"),
prob = 1,
plot = TRUE,
xlab = "Nb. components", ylab = NULL,
print = TRUE,
...
)
| Argument | Description |
|---|---|
| X | A |
| Y | For PLS, a |
| ncomp | The maximal number of PCA or PLS scores (= components = latent variables) to be calculated. |
| algo | For |
| B | Number of bootstrap replications. |
| seed | An integer defining the seed for the random simulation, or |
| type | Type of output whose stability is evaluated. Possible values are "P" (loadings; default) or "b" (b-coefficients). |
| coll | Type of collinearity measure. Possible values are "corr" (quantile of correlation coefficients; default) or "eig" (SVD decomposition). |
| prob | Probability level for the quantile (default 1, i.e. the maximal value is used). |
| plot | Logical. If |
| xlab | Label for the x-axis of the plot. |
| ylab | Label for the y-axis of the plot. |
| print | Logical. If |
| ... | Optional arguments passed to the function defined in |
A list containing the output r; see the examples.
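data(datcass)
Xr <- datcass$Xr   # X data
yr <- datcass$yr   # Y data (not used below)
ncomp <- 30        # maximal number of components
# Y not given: stability of the PCA loadings over B = 10 bootstrap replications
selcoll(Xr, ncomp = ncomp, B = 10)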