The function `gsim.cv` determines the best ridge regularization parameter and bandwidth to be
used for classification with GSIM as described in Lambert-Lacroix and Peyre (2005).


| Argument | Description |
|---|---|
| `Xtrain` | a (ntrain x p) data matrix of predictors. |
| `Ytrain` | an ntrain vector of responses. |
| `LambdaRange` | a vector of positive real values from which the best ridge regularization parameter is chosen by cross-validation. |
| `hARange` | a vector of strictly positive real values from which the best bandwidth for GSIM step A is chosen by cross-validation. |
| `hB` | a strictly positive real value: the bandwidth for GSIM step B. If `hB` is `NULL`, its value is chosen by a plug-in method. |
| `NbIterMax` | a positive integer: the maximal number of iterations in the Newton-Raphson parts of the algorithm. |

The leave-one-out cross-validation procedure described in Lambert-Lacroix and Peyre (2005)
is used to determine the best ridge regularization parameter and bandwidth to be
used for classification with GSIM for binary data (for categorical data, see `mgsim` and `mgsim.cv`).
At each cross-validation run, `Xtrain` is split into a pseudo training set (ntrain - 1 samples)
and a pseudo test set (1 sample), so that each sample is held out exactly once. The classification
error is computed for each combination of a ridge regularization parameter from `LambdaRange`
and a bandwidth from `hARange`. Finally, `gsim.cv` returns the ridge regularization parameter and
bandwidth for which the mean classification error rate is minimal.
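The leave-one-out grid search described above can be sketched in base R as follows. This is a hypothetical illustration, not the plsgenomics implementation: `loo_grid_search` and `toy_fit_predict` are names introduced here, and `toy_fit_predict` is a simple stand-in classifier (a ridge-estimated single index followed by a Gaussian-kernel vote with bandwidth `hA`), not GSIM itself. It assumes responses coded 1/2, as in the Colon example, and breaks ties by taking the first minimum, which may differ from what `gsim.cv` does.

```r
# Hypothetical sketch of a leave-one-out (LOO) grid search over
# (Lambda, hA); it does not call or reproduce plsgenomics::gsim.cv.
loo_grid_search <- function(X, Y, LambdaRange, hARange, fit_predict) {
  grid <- expand.grid(Lambda = LambdaRange, hA = hARange)
  n <- nrow(X)
  err <- numeric(nrow(grid))
  for (g in seq_len(nrow(grid))) {
    n_wrong <- 0
    for (i in seq_len(n)) {
      # pseudo training set: all samples except i; pseudo test set: sample i
      pred <- fit_predict(X[-i, , drop = FALSE], Y[-i], X[i, , drop = FALSE],
                          Lambda = grid$Lambda[g], hA = grid$hA[g])
      n_wrong <- n_wrong + (pred != Y[i])
    }
    err[g] <- n_wrong / n  # mean LOO classification error rate
  }
  best <- which.min(err)  # first minimum wins on ties (an assumption)
  list(Lambda = grid$Lambda[best], hA = grid$hA[best], error = err)
}

# Hypothetical stand-in classifier (NOT GSIM): a ridge direction estimated on
# the 0/1-coded response, then a Gaussian-kernel vote along that single index.
# Assumes class labels are coded 1 and 2.
toy_fit_predict <- function(Xtr, Ytr, Xte, Lambda, hA) {
  y01 <- as.numeric(Ytr == 2)
  mu <- colMeans(Xtr)
  Xc <- sweep(Xtr, 2, mu)
  # ridge estimate of the index direction, penalised by Lambda
  beta <- solve(crossprod(Xc) + Lambda * diag(ncol(Xc)),
                crossprod(Xc, y01 - mean(y01)))
  z_tr <- drop(Xc %*% beta)
  z_te <- drop(sweep(Xte, 2, mu) %*% beta)
  # kernel estimate of P(Y = 2 | index) with bandwidth hA
  w <- dnorm((z_tr - z_te) / hA)
  p2 <- sum(w * y01) / sum(w)
  if (!is.finite(p2)) p2 <- mean(y01)  # guard against all-zero weights
  if (p2 > 0.5) 2 else 1
}

# simulated two-class data (labels 1/2) driven by the first predictor
set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)
Y <- ifelse(X[, 1] + rnorm(30, sd = 0.5) > 0, 2, 1)
res <- loo_grid_search(X, Y, LambdaRange = c(0.1, 1, 10),
                       hARange = c(0.1, 0.5), fit_predict = toy_fit_predict)
res$Lambda
res$hA
```

Passing the classifier as an argument keeps the grid-search loop independent of the model, so the same loop could wrap any fit/predict pair that accepts `Lambda` and `hA`.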

A list with the following components:

| Component | Description |
|---|---|
| `Lambda` | the optimal ridge regularization parameter. |
| `hA` | the optimal bandwidth for GSIM step A. |

Sophie Lambert-Lacroix (http://membres-timc.imag.fr/Sophie.Lambert/) and Julie Peyre (http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/).

S. Lambert-Lacroix, J. Peyre (2006). Local likelihood regression in generalized linear single-index models with applications to microarray data. Computational Statistics and Data Analysis, vol. 51, no. 3, 2091-2113.

`mgsim`, `gsim`, `gsim.cv`.

```
# load plsgenomics library
library(plsgenomics)
# load Colon data
data(Colon)
# draw a random learning set: 12 samples from class 2 and 8 from class 1
IndexLearn <- c(sample(which(Colon$Y==2),12),sample(which(Colon$Y==1),8))
Xtrain <- Colon$X[IndexLearn,]
Ytrain <- Colon$Y[IndexLearn]
# the remaining samples form the test set
Xtest <- Colon$X[-IndexLearn,]
# preprocess data
resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000), Filtering=c(5,500),
                   log10.scale=TRUE, row.stand=TRUE)
# determine the optimal hA and Lambda by leave-one-out cross-validation
hl <- gsim.cv(Xtrain=resP$pXtrain, Ytrain=Ytrain, hARange=c(7,20), LambdaRange=c(0.1,1), hB=NULL)
# perform prediction by GSIM with the selected parameters
res <- gsim(Xtrain=resP$pXtrain, Ytrain=Ytrain, Xtest=resP$pXtest, Lambda=hl$Lambda, hA=hl$hA, hB=NULL)
# convergence indicator of the algorithm
res$Cvg
# number of misclassified test samples
sum(res$Ytest!=Colon$Y[-IndexLearn])
```
