GSIM for binary data

Description

The function gsim performs prediction using Lambert-Lacroix and Peyre's GSIM algorithm.

Usage

gsim(Xtrain, Ytrain, Xtest=NULL, Lambda, hA, hB=NULL, NbIterMax=50)

Arguments

Xtrain

a (ntrain x p) data matrix of predictors. Xtrain must be a matrix. Each row corresponds to an observation and each column to a predictor variable.

Ytrain

an ntrain vector of responses. Ytrain must be a vector. Ytrain is a {1,2}-valued vector and contains the response variable for each observation.

Xtest

a (ntest x p) matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to only one test observation). If Xtest is not equal to NULL, then the prediction step is made for these new predictor variables.

Lambda

a positive real value. Lambda is the ridge regularization parameter.

hA

a strictly positive real value. hA is the bandwidth for GSIM step A.

hB

a strictly positive real value. hB is the bandwidth for GSIM step B. If hB is equal to NULL, then the value of hB is chosen using a plug-in method.

NbIterMax

a positive integer. NbIterMax is the maximal number of iterations in the Newton-Raphson parts.
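
The requirements on Xtrain and Ytrain above can be checked before calling gsim. The following is a minimal sketch in base R, not part of the package: the objects y_raw, Xtrain_demo and Ytrain_demo are hypothetical, and the 0/1 coding of the raw labels is an assumption made for illustration.

```r
# Hypothetical raw labels coded as 0/1 (an assumption for illustration)
y_raw <- c(0, 1, 1, 0, 1)

# Recode to the {1,2} coding expected for Ytrain:
# the smaller raw value becomes 1, the larger becomes 2
Ytrain_demo <- as.integer(factor(y_raw))

# Hypothetical predictor matrix with 5 observations and 3 predictors
Xtrain_demo <- matrix(seq_len(15), nrow = 5, ncol = 3)

# Basic checks mirroring the argument descriptions
stopifnot(is.matrix(Xtrain_demo),
          is.vector(Ytrain_demo),
          length(Ytrain_demo) == nrow(Xtrain_demo),
          all(Ytrain_demo %in% c(1, 2)))
```

If the raw labels are already a factor with two levels, as.integer() alone gives the same {1,2} coding.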

Details

The columns of the data matrices Xtrain and Xtest need not be standardized, since standardization is performed by the function gsim as a preliminary step before the algorithm is run.
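
For readers unfamiliar with column standardization, the sketch below shows one common form in base R: centering each column to mean 0 and scaling it to unit standard deviation. Whether gsim uses exactly this form internally is an assumption; the matrix X_raw is hypothetical.

```r
# Hypothetical data matrix: 4 observations, 2 predictors on different scales
X_raw <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40), nrow = 4)

# Center each column to mean 0 and scale it to unit standard deviation
X_std <- scale(X_raw, center = TRUE, scale = TRUE)

# Column means and standard deviations after standardization
col_means <- colMeans(X_std)
col_sds   <- apply(X_std, 2, sd)
```

After this transformation both columns are on the same scale, so no predictor dominates only because of its units.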

The procedure described in Lambert-Lacroix and Peyre (2006) is used to estimate the projection direction beta. When Xtest is not equal to NULL, the procedure also predicts the labels for the test observations.

Value

A list with the following components:

Ytest

the ntest vector containing the predicted labels for the observations from Xtest.

beta

the p vector giving the estimated projection direction.

hB

the value of hB used in step B of GSIM (the value given by the user, or the value estimated by the plug-in method if the argument was equal to NULL).

DeletedCol

the vector containing the indices of the columns of Xtrain whose predictor variable has zero variance. If there is no such column, DeletedCol=NULL.

Cvg

the 0-1 value indicating convergence of the algorithm (1 for convergence, 0 otherwise).
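
To see in advance which columns are likely to be reported in DeletedCol, zero-variance predictors can be located in base R. This is only a sketch on a hypothetical matrix X_demo; the exact test used inside gsim (for example, strict equality with zero) is an assumption.

```r
# Hypothetical training matrix whose second column is constant
X_demo <- cbind(c(1.2, 0.7, 3.1, 2.4),
                c(5, 5, 5, 5),
                c(0.3, 1.9, 0.8, 2.2))

# Indices of the columns whose sample variance is exactly zero
zero_var_cols <- which(apply(X_demo, 2, var) == 0)
zero_var_cols
```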

Author(s)

Sophie Lambert-Lacroix (http://membres-timc.imag.fr/Sophie.Lambert/) and Julie Peyre (http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/).

References

S. Lambert-Lacroix, J. Peyre (2006). Local likelihood regression in generalized linear single-index models with applications to microarray data. Computational Statistics and Data Analysis, vol 51, n 3, 2091-2113.

See Also

gsim.cv, mgsim, mgsim.cv.

Examples

# load plsgenomics library
library(plsgenomics)

# load Colon data
data(Colon)
# draw a random training set: 12 observations from class 2 and 8 from class 1
IndexLearn <- c(sample(which(Colon$Y==2),12),sample(which(Colon$Y==1),8))

Xtrain <- Colon$X[IndexLearn,]
Ytrain <- Colon$Y[IndexLearn]
Xtest <- Colon$X[-IndexLearn,]

# preprocess data
resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000),
                   Filtering=c(5,500), log10.scale=TRUE, row.stand=TRUE)

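# perform prediction by GSIM (hB=NULL: hB is chosen by the plug-in method)
res <- gsim(Xtrain=resP$pXtrain, Ytrain=Ytrain, Xtest=resP$pXtest,
            Lambda=10, hA=50, hB=NULL)

# convergence indicator (1 for convergence, 0 otherwise)
res$Cvg
# number of misclassified test observations
sum(res$Ytest != Colon$Y[-IndexLearn])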
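
Beyond the raw count of errors, predictions can be summarized as a confusion table and a misclassification rate. The sketch below uses hypothetical label vectors pred and truth so that it runs without the package; with the objects from the example above, one would presumably substitute res$Ytest and Colon$Y[-IndexLearn].

```r
# Hypothetical predicted and true labels, coded in {1,2}
pred  <- c(1, 2, 2, 1, 2, 2)
truth <- c(1, 2, 1, 1, 2, 1)

# Cross-tabulate predictions against the true labels
conf_mat <- table(Predicted = pred, True = truth)

# Proportion of misclassified observations
err_rate <- mean(pred != truth)
```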
