selwik | R Documentation
The function helps select the dimension (i.e. the number of components) of PLSR models.
The method was proposed by Wiklund et al. 2007 and Faber and Rajko 2007. For a given PLS score t, the principle is to compare the observed covariance Cov(Y, t) (where Y is the response) to the distribution H0 of simulated covariances Cov(Y, t) computed on randomly permuted data, in which the relation between Y and X is assumed to have been removed. An observed covariance that is significantly high compared to distribution H0 is expected to indicate a meaningful dimension.
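The principle can be illustrated for a univariate response with a short base-R sketch. It is not the selwik() source code: the helper score1(), the simulated data and the choice of 200 permutations are illustrative assumptions. The first PLS weight is taken proportional to X'y, the score is t = Xw, and the distribution H0 is obtained by recomputing Cov(y, t) after randomly permuting y.

```r
# Minimal sketch (not the selwik() source): permutation test of Cov(y, t)
# for the first PLS score of a univariate response y.
set.seed(1)                                  # reproducible simulation
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + rnorm(n, sd = 0.5)             # y related to the first column of X

# Centre the data, as PLSR does
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Hypothetical helper: first PLS score, with weight w proportional to X'y
score1 <- function(Xc, yc) {
  w <- crossprod(Xc, yc)
  w <- w / sqrt(sum(w^2))                    # unit-norm weight
  drop(Xc %*% w)                             # score t = X w
}

obs <- cov(yc, score1(Xc, yc))               # observed covariance

# Distribution H0: same statistic after randomly permuting y
nperm <- 200
h0 <- replicate(nperm, {
  yp <- sample(yc)
  cov(yp, score1(Xc, yp))
})

pval <- mean(h0 > obs)                       # one-sided p-value
pval
```

Because w is recomputed from each permuted y, every simulated covariance is the largest one achievable for that permutation (w proportional to X'y maximizes Cov(y, Xw) over unit-norm w), which keeps the comparison with the observed value fair.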
The method can be time-consuming, especially for large datasets, since the permutations are conditional on each component taken successively (successive one-dimension PLSRs). A one-dimension PLSR is first fitted, the data Y are randomly permuted (referred to as "Y-scrambling"), and distribution H0 is computed. Then the information contained in the first dimension is removed from the data by deflation, the next dimension is studied with a new one-dimension PLSR, and so on.
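The successive procedure (test one dimension, deflate, move to the next) can be sketched as a loop. This is a hypothetical helper, selwik_sketch(), written in base R for a univariate y, assuming a plain one-dimension PLSR with standard PLS deflation of X and y; it does not reproduce the package's options (algo, weights, seed, print) or its output format.

```r
# Hypothetical helper (not the selwik() source): successive one-dimension
# PLSR permutation tests with deflation, univariate y only.
selwik_sketch <- function(X, y, ncomp, nperm = 50) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre X
  y <- y - mean(y)                              # centre y
  # One-dimension PLSR: weight w proportional to X'y, score = X w
  fit1 <- function(X, y) {
    w <- crossprod(X, y)
    w <- w / sqrt(sum(w^2))
    score <- drop(X %*% w)
    list(score = score, cov = cov(y, score))
  }
  pval <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    obs <- fit1(X, y)
    # H0 for dimension a: Y-scrambling on the current (deflated) data
    h0 <- replicate(nperm, fit1(X, sample(y))$cov)
    pval[a] <- mean(h0 > obs$cov)               # one-sided p-value
    # Deflation: remove the part of X and y explained by the score
    score <- obs$score
    ss <- sum(score^2)
    X <- X - outer(score, drop(crossprod(X, score)) / ss)
    y <- y - score * sum(y * score) / ss
  }
  data.frame(ncomp = seq_len(ncomp), pval = pval)
}

# Usage on simulated data where y depends on two columns of X
set.seed(123)
X <- matrix(rnorm(60 * 8), 60, 8)
y <- X[, 1] - X[, 2] + rnorm(60, sd = 0.3)
res <- selwik_sketch(X, y, ncomp = 4, nperm = 100)
res
```

Each iteration needs nperm one-dimension fits, so the cost grows roughly as ncomp x nperm, which is why the method can be slow on large datasets.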
Wiklund et al. 2007 and Faber and Rajko 2007 presented the method for PLSR1 models only (univariate Y). The function extends the method to PLSR2 models (multivariate Y).
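For a multivariate Y, one way to define the one-dimension test, assumed here and not taken from the package documentation, is to use the first PLS2 weight, i.e. the first left singular vector of X'Y (which maximizes the sum over responses of Cov(y_j, Xw)^2), and to summarize the covariances by their Euclidean norm. In this base-R sketch, pls2_stat() is a hypothetical helper, and the permutations shuffle whole rows of Y so that the responses stay paired with each other.

```r
# Hypothetical sketch (assumed statistic, not the selwik() source):
# permutation test for the first PLS2 score with a multivariate Y.
set.seed(2)
n <- 40
X <- matrix(rnorm(n * 6), n, 6)
Y <- cbind(X[, 1] + rnorm(n), X[, 1] - X[, 2] + rnorm(n))
Xc <- scale(X, scale = FALSE)                # centre columns of X
Yc <- scale(Y, scale = FALSE)                # centre columns of Y

pls2_stat <- function(Xc, Yc) {
  # w = first left singular vector of X'Y: maximizes sum_j Cov(X w, y_j)^2
  w <- svd(crossprod(Xc, Yc), nu = 1, nv = 0)$u[, 1]
  score <- drop(Xc %*% w)
  sqrt(sum(cov(Yc, score)^2))                # norm of the covariance vector
}

obs <- pls2_stat(Xc, Yc)
# Y-scrambling permutes whole rows of Y, keeping its columns paired
h0 <- replicate(100, pls2_stat(Xc, Yc[sample(n), , drop = FALSE]))
pval <- mean(h0 > obs)                       # one-sided p-value
pval
```

The norm is insensitive to the arbitrary sign of the singular vector, so no sign convention is needed for w.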
The function returns the p-value of the one-sided test, i.e. the proportion of distribution H0 that is higher than the observed covariance.
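As a worked example with made-up numbers (not output of selwik()), suppose H0 contains five simulated covariances and the observed covariance is 0.50:

```r
# Toy numbers (illustrative only): five simulated covariances under H0
h0  <- c(0.12, 0.45, 0.71, 0.20, 0.93)
obs <- 0.50                       # observed Cov(Y, t)
pval <- mean(h0 > obs)            # proportion of H0 above the observed value
pval                              # 0.4: two of the five values exceed 0.50
```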
selwik(
X, Y, ncomp,
algo = NULL, weights = NULL,
nperm = 50, seed = NULL,
print = TRUE,
...
)
| Argument | Description |
| --- | --- |
| X | The X-data (predictors). |
| Y | The Y-data (responses). |
| ncomp | The maximal number of scores (i.e. components = latent variables) to be calculated. |
| algo | A function implementing a PLS. Defaults to NULL. |
| weights | A vector of length … |
| nperm | Number of random permutations. |
| seed | An integer defining the seed for the random simulation, or NULL. |
| print | Logical. If … |
| ... | Optional arguments to pass to the function defined in algo. |
A list of outputs, including ncomp and pval (see the examples).
Faber, N.M., Rajko, R., 2007. How to avoid over-fitting in multivariate calibration – The conventional validation approach and an alternative. Analytica Chimica Acta 595 (Papers presented at the 10th International Conference on Chemometrics in Analytical Chemistry), 98–106. https://doi.org/10.1016/j.aca.2007.05.030
Wiklund, S., Nilsson, D., Eriksson, L., Sjöström, M., Wold, S., Faber, K., 2007. A randomization test for PLS component selection. Journal of Chemometrics 21, 427–439. https://doi.org/10.1002/cem.1086
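data(datcass)
Xr <- datcass$Xr
yr <- datcass$yr

# Randomization test for up to 20 components, 30 permutations per component
z <- selwik(Xr, yr, ncomp = 20, nperm = 30)
names(z)

# p-values against the number of components
plot(z$ncomp, z$pval,
     type = "b", pch = 16, col = "#045a8d",
     xlab = "Nb components", ylab = "p-value",
     main = "Wiklund et al. test")
alpha <- .10
abline(h = alpha, col = "grey")

# Selected dimension: number of components preceding the first
# non-significant one (p-value >= alpha); all components if none is
u <- which(z$pval >= alpha)
opt <- if (length(u) > 0) min(u) - 1 else max(z$ncomp)
opt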