| selwik | R Documentation |
The function helps select the dimension (i.e. the number of components) of PLSR models.
The method was proposed by Wiklund et al. 2007 and Faber et al. 2007. For a given PLS score t, the principle is to compare the observed covariance Cov(Y, t) (where Y is the response) to the distribution H0 of Cov(Y, t) values simulated on randomly permuted data (in which the relation between Y and X is assumed to have been removed). An observed covariance that is significant with respect to distribution H0 is expected to indicate a meaningful dimension.
The method can be time-consuming, especially for large datasets, since permutations are conditional on each component taken successively (successive one-dimension PLSR). A one-dimension PLSR is first fitted, the data Y are randomly permuted (referred to as "Y-scrambling"), and distribution H0 is computed. Then the information contained in the first dimension is removed from the data by deflation, the next dimension is studied by a new one-dimension PLSR, and so on.
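The successive steps described above (one-dimension PLSR, Y-scrambling, deflation) can be sketched in base R for a univariate Y. This is only an illustrative sketch, not the code of selwik: the helper names perm_cov_h0 and cov_first_score are hypothetical, the score is computed with a NIPALS-style weight vector proportional to X'y, and the data are simulated.

```r
# Hypothetical helper (not part of the package): for successive one-dimension
# PLS1 fits, returns the observed Cov(y, t) and the simulated H0 values.
perm_cov_h0 <- function(X, y, ncomp, nperm = 50, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column-centre X
  y <- as.numeric(y) - mean(y)                            # centre y

  # Covariance between y and the first PLS score of (X, y)
  cov_first_score <- function(X, y) {
    w <- crossprod(X, y)          # weight vector proportional to X'y
    w <- w / sqrt(sum(w^2))       # normalised to unit length
    sc <- X %*% w                 # score vector
    list(sc = sc, cov = drop(cov(y, sc)))
  }

  obs <- numeric(ncomp)
  h0 <- matrix(NA_real_, nrow = nperm, ncol = ncomp)
  for (a in seq_len(ncomp)) {
    fit <- cov_first_score(X, y)
    obs[a] <- fit$cov
    # Y-scrambling: recompute the covariance on randomly permuted y
    h0[, a] <- replicate(nperm, cov_first_score(X, sample(y))$cov)
    # Deflation: remove the information carried by the current score
    sc <- fit$sc
    ss <- sum(sc^2)
    X <- X - sc %*% (crossprod(sc, X) / ss)
    y <- y - drop(sc) * sum(sc * y) / ss
  }
  list(cov = obs, h0 = h0)
}

# Simulated data (assumption: y depends on the first two columns of X)
set.seed(1)
n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - X[, 2] + rnorm(n, sd = 0.5)
res <- perm_cov_h0(X, y, ncomp = 4, nperm = 100, seed = 1)
str(res)
```

Each column of res$h0 holds the simulated covariances for one dimension, conditional on the previous dimensions having been removed by deflation, which is why the cost grows with both ncomp and nperm.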
Wiklund et al. 2007 and Faber et al. 2007 presented the method for PLSR1 models only (univariate Y). The function extends the method to PLSR2 (multivariate Y).
The function returns the p-value of the one-sided test, i.e. the proportion of distribution H0 higher than the observed covariance.
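As a minimal illustration of this definition, the p-value is the share of simulated covariances that exceed the observed one. The numbers below are made up for the example and are not results from any dataset.

```r
# Illustrative values only (not real results)
obs <- 0.4                          # observed Cov(Y, t)
h0 <- c(0.1, 0.5, 0.3, 0.9, 0.2)    # covariances simulated under H0
pval <- mean(h0 > obs)              # proportion of H0 values above obs
pval                                # 2 of the 5 values exceed 0.4, so 0.4
```

With nperm permutations, the smallest non-zero p-value that can be obtained this way is 1 / nperm, so a small nperm gives a coarse p-value.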
selwik(
X, Y, ncomp,
algo = NULL, weights = NULL,
nperm = 50, seed = NULL,
print = TRUE,
...
)
X |
A |
Y |
A |
ncomp |
The maximal number of scores (i.e. components = latent variables) to be calculated. |
algo |
A function implementing a PLS. Default to |
weights |
A vector of length |
nperm |
Number of random permutations. |
seed |
An integer defining the seed for the random simulation, or |
print |
Logical. If |
... |
Optional arguments to pass to the function defined in |
A list with outputs, see the examples.
Faber, N.M., Rajko, R., 2007. How to avoid over-fitting in multivariate calibration - The conventional validation approach and an alternative. Analytica Chimica Acta, Papers presented at the 10th International Conference on Chemometrics in Analytical Chemistry 595, 98-106. https://doi.org/10.1016/j.aca.2007.05.030
Wiklund, S., Nilsson, D., Eriksson, L., Sjöström, M., Wold, S., Faber, K., 2007. A randomization test for PLS component selection. Journal of Chemometrics 21, 427-439. https://doi.org/10.1002/cem.1086
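## Load the example data
data(datcass)
Xr <- datcass$Xr
yr <- datcass$yr
## Permutation test for dimensions 1 to 20, with 30 permutations each
z <- selwik(Xr, yr, ncomp = 20, nperm = 30)
names(z)
## p-value against the number of components
plot(z$ncomp, z$pval,
     type = "b", pch = 16, col = "#045a8d",
     xlab = "Nb components", ylab = "p-value",
     main = "Wiklund et al. test")
## Significance level, drawn as a horizontal line
alpha <- .10
abline(h = alpha, col = "grey")
## Selected dimension: the last one before the first non-significant p-value
u <- which(z$pval >= alpha)
opt <- min(u) - 1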
opt
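The selection rule used at the end of the example can be wrapped in a small function. The sketch below uses a hypothetical helper, select_ncomp, which is not part of the package, and made-up p-values. It also handles the case where no p-value reaches alpha, for which min(u) would return Inf with a warning.

```r
# Hypothetical helper (not part of the package): number of components kept
# before the first p-value that is not significant at level alpha.
select_ncomp <- function(pval, alpha = 0.10) {
  u <- which(pval >= alpha)
  if (length(u) == 0) {
    # All tested dimensions are significant: return the largest one tested
    return(length(pval))
  }
  min(u) - 1
}

# Illustrative p-values only (not real results)
opt1 <- select_ncomp(c(0, 0, 0.03, 0.27, 0.5))   # first p-value >= 0.10 is the 4th, so 3
opt2 <- select_ncomp(c(0, 0.01, 0.02))           # none >= 0.10, so 3
opt1
opt2
```

When every tested dimension is significant, the returned value is only a lower bound, and it may be worth rerunning the test with a larger ncomp.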