cv.lspcr.glm: Cross-validation for LS-PCR model for logistic regression


Description

Finds the optimal number of components for an LS-PCR model for logistic regression.

Usage

cv.lspcr.glm(Y, X, D, ncompmax, folds = 5, proportion = 0.9)

Arguments

Y

a vector of length n giving the classes of the n observations. The classes must be coded as 1 or 0.

X

a data matrix (nxp) of genes. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.

D

a data matrix (nxq) of clinical data. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a clinical variable.

ncompmax

a positive integer. ncompmax is the maximal number of selected components.

folds

a positive integer indicating the number of folds in the K-fold cross-validation procedure.

proportion

proportion of the dataset in the learning sample. proportion has to be between 0 and 1.

Details

This function finds the optimal number of components for an LS-PCR model. At each cross-validation run, X, D and Y are split into a training set and a test set (in proportions proportion and 1-proportion, respectively). The classification error rate is then computed for each value of ncomp between 1 and ncompmax. The number of components with the minimal classification error rate is selected. The function also returns p.cvg, a vector of length ncompmax containing the convergence proportion of the logistic regression for each number of components between 1 and ncompmax.
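
The selection loop described above can be sketched in base R. This is an illustration only, not the package's implementation: cv_ncomp_sketch is a hypothetical helper, it assumes each run draws an independent random split, and it swaps in an ordinary logistic regression on D plus the leading principal components of X in place of the LS-PCR fit. It also assumes ncompmax does not exceed the number of available principal components.

```r
# Hypothetical helper, not part of lsplsGlm: repeated random train/test
# splits used to pick the number of components with the lowest error rate.
cv_ncomp_sketch <- function(Y, X, D, ncompmax, folds = 5, proportion = 0.9) {
  n <- length(Y)
  D <- as.data.frame(D)
  names(D) <- paste0("d", seq_len(ncol(D)))
  err <- matrix(NA_real_, nrow = folds, ncol = ncompmax)  # test error per run and ncomp
  cvg <- matrix(FALSE, nrow = folds, ncol = ncompmax)     # glm convergence flags
  for (f in seq_len(folds)) {
    train <- sample(n, size = floor(proportion * n))
    test <- setdiff(seq_len(n), train)
    # Principal components are estimated on the training rows only
    pca <- prcomp(X[train, , drop = FALSE], center = TRUE, scale. = FALSE)
    scores_train <- pca$x
    scores_test <- predict(pca, X[test, , drop = FALSE])
    for (k in seq_len(ncompmax)) {
      df_train <- data.frame(y = Y[train], D[train, , drop = FALSE],
                             scores_train[, seq_len(k), drop = FALSE])
      df_test <- data.frame(D[test, , drop = FALSE],
                            scores_test[, seq_len(k), drop = FALSE])
      # Stand-in model: plain logistic regression (assumption, not LS-PCR)
      fit <- suppressWarnings(glm(y ~ ., data = df_train, family = binomial()))
      prob <- suppressWarnings(predict(fit, newdata = df_test, type = "response"))
      err[f, k] <- mean((prob > 0.5) != Y[test])
      cvg[f, k] <- fit$converged
    }
  }
  # Mean error per ncomp picks the optimum; mean convergence mimics p.cvg
  list(ncompopt = which.min(colMeans(err)), p.cvg = colMeans(cvg))
}

# Usage on simulated data (all values below are made up for illustration)
set.seed(1)
n <- 60
X <- matrix(rnorm(n * 10), n, 10)
D <- matrix(rnorm(n * 2), n, 2)
Y <- rbinom(n, 1, plogis(X[, 1] + D[, 1]))
res <- cv_ncomp_sketch(Y, X, D, ncompmax = 3, folds = 4, proportion = 0.8)
```

In this sketch, the returned list mirrors the two values documented below: ncompopt is the index of the smallest mean error rate, and p.cvg is the share of runs in which the fit converged for each number of components.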

Value

ncompopt

the optimal number of components.

p.cvg

a vector of length ncompmax giving the convergence proportion of the logistic regression for each number of components between 1 and ncompmax.

Author(s)

Caroline Bazzoli, Thomas Bouleau, Sophie Lambert-Lacroix

See Also

fit.lspcr.glm.

Examples

# Load the package
library(lsplsGlm)
# Data
data(BreastCancer)
# Vector of responses
Y <- BreastCancer$Y
# Genetic data
X <- BreastCancer$X
# Clinical data
D <- BreastCancer$D
# Standardize the genes, then apply SIS selection
X <- scale(X)
X <- SIS.selection(X = X, Y = Y, pred = 50)
# Cross-validation to find the optimal number of components
cv <- cv.lspcr.glm(Y = Y, X = X, D = D, folds = 5, ncompmax = 5, proportion = 0.9)
ncompopt <- cv$ncompopt

lsplsGlm documentation built on May 2, 2019, 12:36 p.m.