cv.lspls.glm: Cross-validation for LS-PLS model for logistic regression


Description

Finds the optimal number of components for one of the three extensions of LS-PLS for logistic regression. For the R-LS-PLS method, it also finds the optimal ridge parameter lambda.

Usage

cv.lspls.glm(Y, X, D, ncompmax, folds = 5, proportion = 0.9,
             method = c("LS-PLS-IRLS", "R-LS-PLS", "IR-LS-PLS"),
             lambda.grid = NULL, penalized = NULL,
             nbrIterMax = NULL, threshold = NULL)

Arguments

Y

a vector of length n giving the classes of the n observations. The classes must be coded as 1 or 0.

X

a data matrix (n x p) of gene expression values. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.

D

a data matrix (n x q) of clinical data. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a clinical variable.

ncompmax

a positive integer. ncompmax is the maximal number of selected components.

folds

a positive integer indicating the number of folds in the K-fold cross-validation procedure.

proportion

the proportion of the dataset used as the learning sample. proportion must be between 0 and 1.

method

one of the three extensions of LS-PLS for logistic regression models ("LS-PLS-IRLS", "R-LS-PLS", "IR-LS-PLS").

lambda.grid

a vector of positive real numbers giving the grid for the ridge parameter. Used only when method is "R-LS-PLS". By default lambda.grid = exp(log(10^seq(-3,2,0.7))); the resulting values are written out in the sketch after this argument list.

penalized

if TRUE, the parameters associated with D are ridge penalized. Used only when method is "R-LS-PLS".

nbrIterMax

the maximal number of iterations. Used only when method is "R-LS-PLS" or "IR-LS-PLS".

threshold

threshold used for the stopping rule. Used only when method is "R-LS-PLS" or "IR-LS-PLS".
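
For reference only (this snippet is not run by the package), the default lambda.grid can be reproduced in plain R; since exp(log(x)) returns x, the default is simply 10^seq(-3, 2, by = 0.7):

# Default ridge grid, written out for reference (not part of cv.lspls.glm itself)
lambda.default <- exp(log(10^seq(-3, 2, by = 0.7)))
round(lambda.default, 3)
# 0.001  0.005  0.025  0.126  0.631  3.162  15.849  79.433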

Details

This function finds the optimal number of components, and the optimal lambda, for an LS-PLS regression. At each cross-validation run, X, D and Y are split into one training set and one test set (of proportions proportion and 1-proportion). Then, for each number of components between 1 and ncompmax (and for each value of lambda.grid if method is "R-LS-PLS"), the classification error rate is computed. Finally, the values of ncomp and lambda that minimise the classification error rate are selected. The function also returns p.cvg, a vector of length ncompmax containing the convergence proportion for each number of components between 1 and ncompmax. For the R-LS-PLS method, p.cvg is a matrix of size ncompmax x length(lambda.grid).
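
As an illustration of the selection rule only (this is not the package's internal code), the following hypothetical sketch picks ncomp and lambda by minimising a matrix of cross-validated error rates; err stands in for the ncompmax x length(lambda.grid) error matrix described above:

# Hypothetical illustration of the selection rule (not the package's internal code).
# err is assumed to be a ncompmax x length(lambda.grid) matrix of cross-validated
# classification error rates, as described for method "R-LS-PLS".
set.seed(1)
ncompmax    <- 5
lambda.grid <- 10^seq(-3, 2, by = 0.7)
err <- matrix(runif(ncompmax * length(lambda.grid)),
              nrow = ncompmax, ncol = length(lambda.grid))
best      <- which(err == min(err), arr.ind = TRUE)[1, ]
ncompopt  <- unname(best["row"])        # optimal number of components
lambdaopt <- lambda.grid[best["col"]]   # optimal ridge parameter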

Value

ncompopt

the optimal number of components.

lambdaopt

the optimal value of lambda (for the "R-LS-PLS" method).

p.cvg

the convergence proportions: a vector of length ncompmax, or a matrix of size ncompmax x length(lambda.grid) for the "R-LS-PLS" method (see Details). A short example of inspecting it follows this list.
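
As a hypothetical usage note, assuming cv is the list returned by cv.lspls.glm with method = "R-LS-PLS" (as in the Examples below), p.cvg can be inspected to check how often the iterative algorithm converged for each candidate model:

# cv is assumed to be the result of cv.lspls.glm(..., method = "R-LS-PLS")
# rows index the number of components, columns the values of lambda.grid
cv$p.cvg                  # convergence proportions
apply(cv$p.cvg, 1, min)   # worst-case convergence proportion per number of components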

Author(s)

Caroline Bazzoli, Thomas Bouleau, Sophie Lambert-Lacroix

See Also

fit.lspls.glm.

Examples

# Data
data(BreastCancer)
# Vector of responses
Y <- BreastCancer$Y
# Genetic data
X <- BreastCancer$X
# Clinical data
D <- BreastCancer$D
# Scaling and SIS selection
X <- scale(X)
X <- SIS.selection(X = X, Y = Y, pred = 50)

# Cross-validation: 90% of the dataset is used as the learning sample

# Method LS-PLS-IRLS
ncompopt.lsplsirls <- cv.lspls.glm(Y = Y, X = X, D = D, folds = 5, ncompmax = 5,
                                   proportion = 0.9, method = "LS-PLS-IRLS")$ncompopt

# Method R-LS-PLS
cv <- cv.lspls.glm(Y = Y, X = X, D = D, ncompmax = 5, proportion = 0.9,
                   method = "R-LS-PLS",
                   lambda.grid = exp(log(10^seq(-3, 2, 0.7))),
                   penalized = TRUE, nbrIterMax = 15, threshold = 10^(-12))
ncompopt.rlspls <- cv$ncompopt
lambdaopt.rlspls <- cv$lambdaopt

# Method IR-LS-PLS
ncompopt.irlspls <- cv.lspls.glm(Y = Y, X = X, D = D, ncompmax = 5, proportion = 0.9,
                                 method = "IR-LS-PLS",
                                 nbrIterMax = 15, threshold = 10^(-12))$ncompopt
