View source: R/rirls.spls.tune.R
The function rirls.spls.tune tunes the hyper-parameter values used in the
rirls.spls
procedure, by minimizing the prediction error rate over a grid of hyper-parameter
values, using the RIRLS-SPLS algorithm of Durif et al. (2015).
rirls.spls.tune(X, Y, lambda.ridge.range, lambda.l1.range, ncomp.range,
                adapt=TRUE, maxIter=100, svd.decompose=TRUE, return.grid=FALSE,
                ncores=1, nfolds=10)
X: a (n x p) data matrix of predictors.

Y: a (n) vector of responses.

lambda.ridge.range: a vector of positive real values.

lambda.l1.range: a vector of real values in [0,1].

ncomp.range: a vector of positive integers.

adapt: a boolean value, indicating whether the sparse PLS selection step should be adaptive or not.

maxIter: a positive integer.

svd.decompose: a boolean parameter.

return.grid: a boolean value indicating whether the grid of hyper-parameter values, with the corresponding mean prediction error rate over the folds, should be returned or not.

ncores: a positive integer, indicating whether the cross-validation procedure should be parallelized over the folds (ncores > nfolds would lead to the generation of unused child processes). If ncores > 1, the procedure generates ncores child processes over the corresponding number of CPU cores (see details).

nfolds: a positive integer indicating the number of folds in the K-fold cross-validation procedure; nfolds=n corresponds to leave-one-out cross-validation.
The columns of the data matrix X
need not be standardized,
since standardization is performed by the function rirls.spls
as a preliminary step
before the algorithm is run.
The procedure is described in Durif et al. (2015). The K-fold cross-validation can be summarized as follows: the training set is partitioned into K folds; for each combination of hyper-parameter values, the model is fit K times, each time fitting on the observations in K-1 folds and using the remaining fold to compute the prediction error rate. The cross-validation procedure returns the optimal hyper-parameter values, i.e. the combination that minimizes the prediction error rate averaged over all the folds.
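The cross-validation loop described above can be sketched generically as follows. This is an illustrative sketch, not the package's implementation; `fit_and_predict` is a hypothetical placeholder standing in for fitting rirls.spls on the training folds and predicting the left-out fold.

```r
## Illustrative K-fold cross-validation sketch (not the package's code).
## `fit_and_predict(Xtrain, Ytrain, Xtest)` is a hypothetical placeholder
## returning predicted responses for Xtest.
kfold_error_rate <- function(X, Y, K, fit_and_predict) {
  n <- nrow(X)
  folds <- sample(rep(1:K, length.out = n))  # random partition into K folds
  errors <- sapply(1:K, function(k) {
    test <- which(folds == k)
    pred <- fit_and_predict(X[-test, , drop = FALSE], Y[-test],
                            X[test, , drop = FALSE])
    mean(pred != Y[test])  # misclassification rate on the held-out fold
  })
  mean(errors)  # prediction error rate averaged over the K folds
}
```

In the tuning procedure, this averaged error rate is computed for every point of the hyper-parameter grid, and the grid point minimizing it is returned.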
This procedure uses the mclapply function
from the parallel
package, which is available on
GNU/Linux and MacOS. Users of Microsoft Windows can refer to the README file in the package source
for a way to use an mclapply-type function.
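A minimal sketch of parallelizing over folds with mclapply is given below. The per-fold worker here is a hypothetical placeholder, not the package's code; note that mc.cores must be left at 1 on Microsoft Windows, where mclapply cannot fork child processes.

```r
library(parallel)

## Sketch: evaluate each fold's error rate in a separate child process.
## `compute_fold_error` is a hypothetical per-fold worker; in the real
## procedure it would fit the model on K-1 folds and score the k-th fold.
compute_fold_error <- function(k) {
  k / 100  # placeholder error rate for fold k
}

nfolds <- 10
ncores <- 2  # ncores > nfolds would leave child processes idle
fold_errors <- mclapply(1:nfolds, compute_fold_error,
                        mc.cores = min(ncores, nfolds))
mean(unlist(fold_errors))  # error rate averaged over the folds
```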
A list with the following components:
lambda.ridge.opt: the optimal value in lambda.ridge.range.

lambda.l1.opt: the optimal value in lambda.l1.range.

ncomp.opt: the optimal value in ncomp.range.

conv.per: the overall percentage of models that converged during the cross-validation procedure.

cv.grid: the grid of hyper-parameters and the corresponding prediction error rate averaged over the nfolds folds.
Ghislain Durif (http://lbbe.univ-lyon1.fr/-Durif-Ghislain-.html).
G. Durif, F. Picard, S. Lambert-Lacroix (2015). Adaptive sparse PLS for logistic regression (in prep), available at http://arxiv.org/abs/1502.05933.
### load plsgenomics library
library(plsgenomics)

### generating data
n <- 50
p <- 100
sample1 <- sample.bin(n=n, p=p, kstar=20, lstar=2, beta.min=0.25, beta.max=0.75,
                      mean.H=0.2, sigma.H=10, sigma.F=5)
X <- sample1$X
Y <- sample1$Y

### hyper-parameter values to test
lambda.l1.range <- seq(0.05, 0.95, by=0.3) # between 0 and 1
ncomp.range <- 1:2
# log-linear range between 0.01 and 1000 for lambda.ridge.range
logspace <- function(d1, d2, n) exp(log(10)*seq(d1, d2, length.out=n))
lambda.ridge.range <- signif(logspace(d1=-2, d2=3, n=6), digits=3)

### tuning the hyper-parameters
cv1 <- rirls.spls.tune(X=X, Y=Y, lambda.ridge.range=lambda.ridge.range,
                       lambda.l1.range=lambda.l1.range, ncomp.range=ncomp.range,
                       adapt=TRUE, maxIter=100, svd.decompose=TRUE,
                       return.grid=TRUE, ncores=1, nfolds=10)
str(cv1)