cv.lspls.glm: Cross-validation for LS-PLS model for logistic regression


Description

Finds the optimal number of components for one of the three extensions of LS-PLS for logistic regression. For the R-LS-PLS method, it also finds the optimal ridge parameter lambda.

Usage

cv.lspls.glm(Y, X, D, ncompmax, folds = 5, proportion = 0.9,
             method = c("LS-PLS-IRLS", "R-LS-PLS", "IR-LS-PLS"),
             lambda.grid = NULL, penalized = NULL,
             nbrIterMax = NULL, threshold = NULL)

Arguments

Y

a vector of length n giving the classes of the n observations. The classes must be coded as 1 or 0.

X

a data matrix (n x p) of gene expression values. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.

D

a data matrix (n x q) of clinical data. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a clinical variable.

ncompmax

a positive integer. ncompmax is the maximal number of selected components.

folds

a positive integer indicating the number of folds in the K-fold cross-validation procedure.

proportion

the proportion of the dataset used as the learning sample. proportion must be between 0 and 1.

method

one of the three extensions of LS-PLS for logistic regression models ("LS-PLS-IRLS", "R-LS-PLS", "IR-LS-PLS").

lambda.grid

a vector of positive real numbers giving the grid for the ridge parameter. Used only when method is "R-LS-PLS". By default lambda.grid = exp(log(10^seq(-3,2,0.7))); the resulting values are written out in the sketch after this argument list.

penalized

if TRUE, the parameters associated with D are ridge penalized. Used only when method is "R-LS-PLS".

nbrIterMax

the maximal number of iterations. Used only when method is "R-LS-PLS" or "IR-LS-PLS".

threshold

threshold used for the stopping rule. Used only when method is "R-LS-PLS" or "IR-LS-PLS".
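
For reference only (this snippet is not run by the package), the default lambda.grid can be reproduced in plain R; since exp(log(x)) returns x, the default is simply 10^seq(-3, 2, by = 0.7):

# Default ridge grid, written out for reference (not part of cv.lspls.glm itself)
lambda.default <- exp(log(10^seq(-3, 2, by = 0.7)))
round(lambda.default, 3)
# 0.001  0.005  0.025  0.126  0.631  3.162  15.849  79.433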

Details

This function finds the optimal number of components, and the optimal lambda, for an LS-PLS regression. At each cross-validation run, X, D and Y are split into one training set and one test set (of proportions proportion and 1-proportion). Then, for each number of components between 1 and ncompmax (and for each value of lambda.grid if method is "R-LS-PLS"), the classification error rate is computed. Finally, the values of ncomp and lambda that minimise the classification error rate are selected. The function also returns p.cvg, a vector of length ncompmax containing the convergence proportion for each number of components between 1 and ncompmax. For the R-LS-PLS method, p.cvg is a matrix of size ncompmax x length(lambda.grid).
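
As an illustration of the selection rule only (this is not the package's internal code), the following hypothetical sketch picks ncomp and lambda by minimising a matrix of cross-validated error rates; err stands in for the ncompmax x length(lambda.grid) error matrix described above:

# Hypothetical illustration of the selection rule (not the package's internal code).
# err is assumed to be a ncompmax x length(lambda.grid) matrix of cross-validated
# classification error rates, as described for method "R-LS-PLS".
set.seed(1)
ncompmax    <- 5
lambda.grid <- 10^seq(-3, 2, by = 0.7)
err <- matrix(runif(ncompmax * length(lambda.grid)),
              nrow = ncompmax, ncol = length(lambda.grid))
best      <- which(err == min(err), arr.ind = TRUE)[1, ]
ncompopt  <- unname(best["row"])        # optimal number of components
lambdaopt <- lambda.grid[best["col"]]   # optimal ridge parameter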

Value

ncompopt

the optimal number of components.

lambdaopt

the optimal value of lambda (for the "R-LS-PLS" method).

p.cvg

the convergence proportions: a vector of length ncompmax, or a matrix of size ncompmax x length(lambda.grid) for the "R-LS-PLS" method (see Details). A short example of inspecting it follows this list.
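
As a hypothetical usage note, assuming cv is the list returned by cv.lspls.glm with method = "R-LS-PLS" (as in the Examples below), p.cvg can be inspected to check how often the iterative algorithm converged for each candidate model:

# cv is assumed to be the result of cv.lspls.glm(..., method = "R-LS-PLS")
# rows index the number of components, columns the values of lambda.grid
cv$p.cvg                  # convergence proportions
apply(cv$p.cvg, 1, min)   # worst-case convergence proportion per number of components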

Author(s)

Caroline Bazzoli, Thomas Bouleau, Sophie Lambert-Lacroix

See Also

fit.lspls.glm.

Examples

# Data
data(BreastCancer)
# Vector of responses
Y <- BreastCancer$Y
# Genetic data
X <- BreastCancer$X
# Clinical data
D <- BreastCancer$D
# Scaling and SIS selection
X <- scale(X)
X <- SIS.selection(X = X, Y = Y, pred = 50)

# Cross-validation: 90% of the dataset is used as the learning sample

# Method LS-PLS-IRLS
ncompopt.lsplsirls <- cv.lspls.glm(Y = Y, X = X, D = D, folds = 5, ncompmax = 5,
                                   proportion = 0.9, method = "LS-PLS-IRLS")$ncompopt

# Method R-LS-PLS
cv <- cv.lspls.glm(Y = Y, X = X, D = D, ncompmax = 5, proportion = 0.9,
                   method = "R-LS-PLS",
                   lambda.grid = exp(log(10^seq(-3, 2, 0.7))),
                   penalized = TRUE, nbrIterMax = 15, threshold = 10^(-12))
ncompopt.rlspls <- cv$ncompopt
lambdaopt.rlspls <- cv$lambdaopt

# Method IR-LS-PLS
ncompopt.irlspls <- cv.lspls.glm(Y = Y, X = X, D = D, ncompmax = 5, proportion = 0.9,
                                 method = "IR-LS-PLS",
                                 nbrIterMax = 15, threshold = 10^(-12))$ncompopt
