Tuning parameters (ncomp, lambda.l1, lambda.ridge) for Ridge Iteratively Reweighted Least Squares followed by Adaptive Sparse PLS regression for binary response, by K-fold cross-validation
Description
The function rirls.spls.tune tunes the hyperparameter values used in the rirls.spls procedure. It minimizes the prediction error rate over the hyperparameter grid, using the RIRLS-SPLS algorithm of Durif et al. (2015).
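The size of the search can be checked before running the tuning. This sketch assumes the grid is the full Cartesian product of the three ranges; the documentation only says "the hyperparameter grid". Under that assumption, the number of candidate settings is the product of the range lengths. Each setting is then fitted nfolds times. The ranges below reproduce those in the Examples section.

```r
# Candidate values, as in the Examples section
lambda.ridge.range <- signif(10^seq(-2, 3, length.out = 6), digits = 3)  # 0.01 ... 1000
lambda.l1.range <- seq(0.05, 0.95, by = 0.3)                             # 4 values in [0, 1]
ncomp.range <- 1:2

# Assumed full Cartesian grid: one row per hyperparameter combination
grid <- expand.grid(lambda.ridge = lambda.ridge.range,
                    lambda.l1 = lambda.l1.range,
                    ncomp = ncomp.range)
nrow(grid)  # 6 * 4 * 2 combinations
```

With nfolds=10, this grid would require 10 model fits per row.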
Usage
rirls.spls.tune(X, Y, lambda.ridge.range, lambda.l1.range, ncomp.range,
adapt=TRUE, maxIter=100, svd.decompose=TRUE, return.grid=FALSE,
ncores=1, nfolds=10)

Arguments
X 
a (n x p) data matrix of predictors. 
Y 
a vector of length n containing the binary response of each observation (one per row of X). 
lambda.ridge.range 
a vector of positive real values: the candidate values for the ridge parameter lambda.ridge. 
lambda.l1.range 
a vector of positive real values in [0,1]: the candidate values for the sparsity parameter lambda.l1. 
ncomp.range 
a vector of positive integers: the candidate numbers of PLS components ncomp. 
adapt 
a boolean value indicating whether the sparse PLS selection step should be adaptive or not. 
maxIter 
a positive integer, the maximum number of iterations of the RIRLS algorithm. 
svd.decompose 
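a boolean parameter indicating whether the design matrix X should be decomposed by singular value decomposition (SVD) for the RIRLS step. 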
return.grid 
a boolean value indicating whether the grid of hyperparameter values, with the corresponding mean prediction error rate over the folds, should be returned. 
ncores 
a positive integer indicating whether the cross-validation procedure should be parallelized over the folds. If ncores > 1, the procedure generates ncores child processes over the corresponding number of CPU cores (see Details). Setting ncores > nfolds would generate unused child processes. 
nfolds 
a positive integer indicating the number of folds in the K-fold cross-validation procedure; nfolds=n corresponds to leave-one-out cross-validation. 
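The relationship between nfolds and leave-one-out can be made concrete with a fold assignment. The helper make_folds below is hypothetical, and the package may assign observations to folds differently. It labels each of the n observations with a fold index so that fold sizes differ by at most one. With nfolds = n, every fold contains exactly one observation.

```r
# Hypothetical helper: randomly assign n observations to nfolds balanced folds
make_folds <- function(n, nfolds) {
  stopifnot(nfolds >= 2, nfolds <= n)
  # rep_len gives fold labels 1..nfolds recycled to length n; sample() shuffles them
  sample(rep_len(seq_len(nfolds), n))
}

set.seed(1)
folds10 <- make_folds(50, 10)    # 10 folds of 5 observations each
folds.loo <- make_folds(50, 50)  # nfolds = n: leave-one-out, one observation per fold
table(folds10)
```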
Details
The columns of the data matrix X need not be standardized, since standardization is performed by the function rirls.spls as a preliminary step before the algorithm is run.
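Column-wise standardization centres each predictor to mean 0 and scales it to standard deviation 1, as illustrated below with base R's scale(). That rirls.spls uses exactly this scaling is an assumption; this page only states that standardization is done internally.

```r
# Simulated predictors with non-zero mean and non-unit spread
set.seed(2)
X <- matrix(rnorm(20 * 5, mean = 3, sd = 2), nrow = 20, ncol = 5)

# Centre and scale every column
Xs <- scale(X, center = TRUE, scale = TRUE)
colMeans(Xs)      # numerically zero for every column
apply(Xs, 2, sd)  # one for every column, up to rounding
```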
The procedure is described in Durif et al. (2015). The K-fold cross-validation can be summarized as follows:

- The training set is partitioned into K folds.
- For each value of the hyperparameters, the model is fit K times. Each time, one fold is held out to compute the prediction error rate, and the model is fitted on the remaining observations.
- The procedure returns the optimal hyperparameter values, i.e. those that minimize the prediction error rate averaged over all the folds.
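The cross-validation loop described above can be sketched in base R. This is not the package's implementation. So that the code runs without plsgenomics, rirls.spls is replaced by a hypothetical toy classifier: a nearest centroid restricted to the k predictors with the largest difference in class means. Its single hyperparameter k stands in for the (ncomp, lambda.l1, lambda.ridge) grid. The names fit_predict and cv_tune are hypothetical. All candidate values share the same fold partition, and the optimum is the value with the smallest fold-averaged error rate.

```r
# Hypothetical stand-in for rirls.spls: nearest centroid on the k predictors
# with the largest absolute difference between the two class means.
fit_predict <- function(Xtrain, Ytrain, Xtest, k) {
  mu1 <- colMeans(Xtrain[Ytrain == 1, , drop = FALSE])
  mu0 <- colMeans(Xtrain[Ytrain == 0, , drop = FALSE])
  keep <- order(abs(mu1 - mu0), decreasing = TRUE)[seq_len(k)]
  d1 <- rowSums(sweep(Xtest[, keep, drop = FALSE], 2, mu1[keep])^2)
  d0 <- rowSums(sweep(Xtest[, keep, drop = FALSE], 2, mu0[keep])^2)
  as.integer(d1 < d0)  # predict class 1 when closer to the class-1 centroid
}

# K-fold CV: for each candidate k, fit on K-1 folds, compute the error rate on
# the held-out fold, then average the K error rates.
cv_tune <- function(X, Y, k.range, nfolds = 10) {
  folds <- sample(rep_len(seq_len(nfolds), nrow(X)))  # one partition for all k
  cv.error <- sapply(k.range, function(k) {
    fold.err <- sapply(seq_len(nfolds), function(f) {
      test <- folds == f
      pred <- fit_predict(X[!test, , drop = FALSE], Y[!test],
                          X[test, , drop = FALSE], k)
      mean(pred != Y[test])
    })
    mean(fold.err)
  })
  list(k.opt = k.range[which.min(cv.error)],
       cv.grid = data.frame(k = k.range, error = cv.error))
}

# Simulated data: only the first 3 of 20 predictors carry class information
set.seed(3)
n <- 60; p <- 20
Y <- rep(0:1, each = n / 2)
X <- matrix(rnorm(n * p), n, p)
X[Y == 1, 1:3] <- X[Y == 1, 1:3] + 2
res <- cv_tune(X, Y, k.range = c(1, 3, 10, 20), nfolds = 5)
res$cv.grid
res$k.opt
```

Reusing a single partition for every candidate makes their averaged error rates directly comparable.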
This procedure uses the function mclapply from the parallel package, which is available on GNU/Linux and macOS. Users of Microsoft Windows can refer to the README file in the package source to use an mclapply-type function.
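The sketch below illustrates parallelizing over folds with parallel::mclapply. The per-fold task fold_stat is hypothetical: it returns the mean of the held-out values, standing in for "fit on the other folds, compute the error on fold f". mclapply relies on forking, which Windows does not support, so the sketch falls back to mc.cores = 1 there; the call then runs sequentially. mclapply returns results in input order, so the parallel and sequential runs agree.

```r
library(parallel)

# One core on Windows (no forking); otherwise up to 2 cores
ncores <- if (.Platform$OS.type == "windows") 1L else
  max(1L, min(2L, detectCores(), na.rm = TRUE))

nfolds <- 10
set.seed(4)
x <- rnorm(100)
folds <- sample(rep_len(seq_len(nfolds), length(x)))

# Hypothetical per-fold task (placeholder for fitting and scoring one fold)
fold_stat <- function(f) mean(x[folds == f])

res.par <- unlist(mclapply(seq_len(nfolds), fold_stat, mc.cores = ncores))
res.seq <- vapply(seq_len(nfolds), fold_stat, numeric(1))
all.equal(res.par, res.seq)  # same values; only the scheduling differs
```

As the ncores argument notes, requesting more cores than folds brings no benefit, since each fold is one unit of work.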
Value
A list with the following components:
lambda.ridge.opt 
the optimal value in lambda.ridge.range. 
lambda.l1.opt 
the optimal value in lambda.l1.range. 
ncomp.opt 
the optimal value in ncomp.range. 
conv.per 
the overall percentage of models that converged during the cross-validation procedure. 
cv.grid 
the grid of hyperparameter values with the corresponding prediction error rate averaged over the nfolds folds (returned if return.grid=TRUE).

Author(s)
Ghislain Durif (http://lbbe.univlyon1.fr/DurifGhislain.html).
References
G. Durif, F. Picard, S. Lambert-Lacroix (2015). Adaptive sparse PLS for logistic regression (in prep.), available at http://arxiv.org/abs/1502.05933.
See Also
rirls.spls.
Examples
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 50
p <- 100
sample1 <- sample.bin(n=n, p=p, kstar=20, lstar=2, beta.min=0.25, beta.max=0.75, mean.H=0.2,
                      sigma.H=10, sigma.F=5)
X <- sample1$X
Y <- sample1$Y
### hyperparameter values to test
lambda.l1.range <- seq(0.05, 0.95, by=0.3) # between 0 and 1
ncomp.range <- 1:2
# log-linear range between 0.01 and 1000 for lambda.ridge.range
logspace <- function(d1, d2, n) exp(log(10)*seq(d1, d2, length.out=n))
lambda.ridge.range <- signif(logspace(d1=-2, d2=3, n=6), digits=3)
### tuning the hyperparameters
cv1 <- rirls.spls.tune(X=X, Y=Y, lambda.ridge.range=lambda.ridge.range,
                       lambda.l1.range=lambda.l1.range, ncomp.range=ncomp.range,
                       adapt=TRUE, maxIter=100, svd.decompose=TRUE,
                       return.grid=TRUE, ncores=1, nfolds=10)
str(cv1)
