spls.cv | R Documentation |
The function spls.cv chooses the optimal values for the
hyper-parameters of the spls procedure by minimizing the mean
squared error of prediction over the hyper-parameter grid,
using the adaptive SPLS algorithm of Durif et al. (2017).
spls.cv(X, Y, lambda.l1.range, ncomp.range, weight.mat = NULL, adapt = TRUE,
center.X = TRUE, center.Y = TRUE, scale.X = TRUE, scale.Y = TRUE,
weighted.center = FALSE, return.grid = FALSE, ncores = 1, nfolds = 10,
nrun = 1, verbose = FALSE)
X |
a (n x p) data matrix of predictors. |
Y |
a (n) vector of (continuous) responses. |
lambda.l1.range |
a vector of positive real values, in [0,1].
|
ncomp.range |
a vector of positive integers. |
weight.mat |
a (ntrain x ntrain) matrix used to weight the l2 metric
in the observation space; it can be the inverse covariance of the Ytrain
observations in a heteroskedastic context. If NULL, the l2 metric is the
standard one, corresponding to a homoskedastic model ( |
adapt |
a boolean value indicating whether the sparse PLS selection step should be adaptive or not (see details). |
center.X |
a boolean value indicating whether the data matrices
|
center.Y |
a boolean value indicating whether the response values
|
scale.Y |
a boolean value indicating whether the response values
|
weighted.center |
a boolean value indicating whether the centering should take into account the weighted l2 metric or not (if TRUE, it requires that weight.mat is non-NULL). |
return.grid |
a boolean value indicating whether the grid of hyper-parameter values with the corresponding mean prediction error over the folds should be returned or not. |
ncores |
a positive integer indicating the number of cores that the cross-validation is allowed to use for parallel computation (see details). |
nfolds |
a positive integer indicating the number of folds in the
K-fold cross-validation procedure, default is 10. |
nrun |
a positive integer indicating how many times the K-fold cross-validation procedure should be repeated, default is 1. |
verbose |
a boolean value indicating verbosity. |
scale.X |
a boolean value indicating whether the data matrices Xtrain and Xtest (if provided) should be scaled or not (scale.X=TRUE implies center.X=TRUE). |
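The role of weight.mat can be illustrated with a small base-R sketch. It assumes that the weighted squared l2 norm of a vector r under a weight matrix V is r' V r (the usual definition); the helper weighted_sq_norm is hypothetical and is not part of plsgenomics. With V = NULL it falls back to the identity, which gives the standard l2 metric of the homoskedastic case.

```r
# Hypothetical helper (not part of plsgenomics): squared l2 norm of r
# under the metric defined by a symmetric weight matrix V, i.e. r' V r.
weighted_sq_norm <- function(r, V = NULL) {
  if (is.null(V)) V <- diag(length(r))  # NULL means the standard l2 metric
  as.numeric(crossprod(r, V %*% r))
}

r <- c(1, -2, 3)

# Standard metric: plain sum of squares
standard <- weighted_sq_norm(r)

# Heteroskedastic illustration: weight each observation by its inverse variance
obs_var <- c(1, 4, 9)
V <- diag(1 / obs_var)
weighted <- weighted_sq_norm(r, V)
```

Observations with a large variance contribute less to the weighted norm, which is the motivation for using an inverse covariance as weight.mat.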
The columns of the data matrices Xtrain and Xtest need not
be standardized beforehand, since standardizing can be performed by the
function spls.cv as a preliminary step.
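As an illustration of what such a preliminary standardization amounts to, here is a minimal base-R sketch using scale(). It is only analogous in spirit to center.X=TRUE and scale.X=TRUE; the exact computation inside spls.cv may differ (for instance in the variance denominator), and the simulated matrices are assumptions made for the example.

```r
set.seed(1)
X <- matrix(rnorm(20 * 3, mean = 5, sd = 2), nrow = 20, ncol = 3)

# Center and scale each column of the training matrix
X_std <- scale(X, center = TRUE, scale = TRUE)

col_means <- colMeans(X_std)      # numerically zero after centering
col_sds <- apply(X_std, 2, sd)    # equal to one after scaling

# New observations should reuse the training means and standard deviations
X_new <- matrix(rnorm(5 * 3, mean = 5, sd = 2), nrow = 5, ncol = 3)
X_new_std <- scale(X_new,
                   center = attr(X_std, "scaled:center"),
                   scale = attr(X_std, "scaled:scale"))
```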
The procedure is described in Durif et al. (2017). The K-fold cross-validation can be summarized as follows: the training set is partitioned into K folds; for each value of the hyper-parameters the model is fit K times, each time holding out one fold to compute the prediction error and fitting the model on the remaining observations. The cross-validation procedure returns the optimal hyper-parameter values, meaning the ones that minimize the mean squared error of prediction averaged over all the folds.
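The K-fold scheme described above can be sketched in base R as follows. This is not the spls.cv implementation: the model is a hypothetical stand-in (ridge regression with a single penalty lambda instead of the SPLS fit with lambda.l1 and ncomp), and the names ridge_predict, kfold_cv and cv_sketch are assumptions made for the example. Only the structure matters here: partition into folds, fit on K-1 folds, measure the squared prediction error on the held-out fold, average over folds, and keep the grid value with the smallest average.

```r
# Hypothetical stand-in model (NOT the SPLS fit): ridge regression on centered data.
ridge_predict <- function(X_train, Y_train, X_test, lambda) {
  x_mean <- colMeans(X_train)
  y_mean <- mean(Y_train)
  Xc <- sweep(X_train, 2, x_mean)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(Xc)),
                crossprod(Xc, Y_train - y_mean))
  as.numeric(sweep(X_test, 2, x_mean) %*% beta + y_mean)
}

# K-fold cross-validation over a grid of hyper-parameter values.
kfold_cv <- function(X, Y, lambda_grid, nfolds = 10) {
  n <- nrow(X)
  # Randomly assign each observation to one of the K folds
  fold_id <- sample(rep(seq_len(nfolds), length.out = n))
  mse <- sapply(lambda_grid, function(lambda) {
    fold_err <- sapply(seq_len(nfolds), function(k) {
      test <- fold_id == k
      pred <- ridge_predict(X[!test, , drop = FALSE], Y[!test],
                            X[test, , drop = FALSE], lambda)
      mean((Y[test] - pred)^2)
    })
    mean(fold_err)  # prediction error averaged over the folds
  })
  list(lambda.opt = lambda_grid[which.min(mse)],
       cv.grid = data.frame(lambda = lambda_grid, mse = mse))
}

set.seed(42)
n <- 60
p <- 5
X <- matrix(rnorm(n * p), n, p)
Y <- as.numeric(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n, sd = 0.5))

cv_sketch <- kfold_cv(X, Y, lambda_grid = c(0.01, 0.1, 1, 10, 100), nfolds = 5)
cv_sketch$lambda.opt
cv_sketch$cv.grid
```

The same fold assignment is reused for every grid value so that the candidates are compared on identical splits. Repeating the whole procedure with fresh fold assignments and averaging the errors would correspond to the idea behind nrun > 1.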
This procedure uses the mclapply function from the parallel package,
available on GNU/Linux and MacOS. Users of Microsoft Windows can refer to
the README file in the source to be able to use a mclapply type function.
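For readers unfamiliar with mclapply, the following sketch shows the calling pattern on a toy task. The function fold_task is a hypothetical placeholder for one cross-validation fit, and the core count of 2 is an arbitrary choice for the example. Because mclapply relies on forking, mc.cores must be 1 on Windows, which is why the sketch falls back to a single core there.

```r
library(parallel)

# mclapply relies on forking, which is unavailable on Windows:
# there, mc.cores must be 1 and the call runs sequentially.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical per-fold task standing in for one cross-validation fit
fold_task <- function(k) {
  k^2
}

# Apply the task to each fold index, spreading the work over ncores processes
res <- mclapply(1:4, fold_task, mc.cores = ncores)
unlist(res)
```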
An object with the following attributes
lambda.l1.opt |
the optimal value in |
ncomp.opt |
the optimal value in |
cv.grid |
the grid of hyper-parameters and corresponding prediction
error rate over the folds.
|
Ghislain Durif (http://thoth.inrialpes.fr/people/gdurif/).
Durif G., Modolo L., Michaelsson J., Mold J. E., Lambert-Lacroix S., Picard F. (2017). High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression, (in prep), available on (http://arxiv.org/abs/1502.05933).
spls
## Not run:
### load plsgenomics library
library(plsgenomics)
### generating data
n <- 100
p <- 100
sample1 <- sample.cont(n=n, p=p, kstar=10, lstar=2,
                       beta.min=0.25, beta.max=0.75, mean.H=0.2,
                       sigma.H=10, sigma.F=5, sigma.E=5)
X <- sample1$X
Y <- sample1$Y
### hyper-parameters values to test
lambda.l1.range <- seq(0.05,0.95,by=0.1) # between 0 and 1
ncomp.range <- 1:10
### tuning the hyper-parameters
cv1 <- spls.cv(X=X, Y=Y, lambda.l1.range=lambda.l1.range,
               ncomp.range=ncomp.range, weight.mat=NULL, adapt=TRUE,
               center.X=TRUE, center.Y=TRUE,
               scale.X=TRUE, scale.Y=TRUE, weighted.center=FALSE,
               return.grid=TRUE, ncores=1, nfolds=10, nrun=1)
str(cv1)
### optimal values
cv1$lambda.l1.opt
cv1$ncomp.opt
## End(Not run)