d.spls.cv | R Documentation |
The function d.spls.cv
uses the cross-validation approach described in Boulesteix and Strimmer (2005) (see References) to
choose the most adequate number of latent components for a Dual-SPLS regression.
d.spls.cv(X,Y,ncomp,dspls="lasso",ppnu,nu2,nrepcv=30,pctcv=70,indG,gamma)
X |
a numeric matrix of predictor values of dimension |
Y |
a numeric vector or a one column matrix of responses. It represents the response variable for each observation. |
ncomp |
a positive integer or a numeric vector of the number of Dual-SPLS components to choose from. |
dspls |
the norm type of the Dual-SPLS regression applied. Default value is "lasso". |
ppnu |
a positive real value, in |
nu2 |
a positive real value. |
nrepcv |
a positive integer indicating the number of cross-validation iterations to be performed. Default value is 30. |
pctcv |
a positive real value in |
indG |
a numeric vector giving the group index of each observation. It is used with the group lasso norms. |
gamma |
a numeric vector of the norm |
The procedure is described in Boulesteix and Strimmer (2005). At each cross-validation iteration, pctcv%
of the observations are randomly selected for calibration and the Dual-SPLS regression is fitted on them.
The remaining observations are used for validation, and the prediction errors are computed for each
number of components. nrepcv
iterations are performed, and for each number of components the mean of the squared errors over the nrepcv
iterations is computed. The number of latent components with the smallest mean value is selected as the best.
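The resampling scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: principal component regression (via svd and lm.fit) stands in for the Dual-SPLS fit, the helper name cv_choose_ncomp is hypothetical, and treating a single integer ncomp as the candidates 1:ncomp is an assumption.

```r
# Monte Carlo cross-validation sketch for choosing the number of components.
# ASSUMPTIONS: cv_choose_ncomp() is a hypothetical helper; principal component
# regression is used as a stand-in for the Dual-SPLS regression.
cv_choose_ncomp <- function(X, y, ncomp, nrepcv = 30, pctcv = 70) {
  # Assumption: a single integer means "try 1, 2, ..., ncomp components"
  candidates <- if (length(ncomp) == 1) seq_len(ncomp) else ncomp
  n <- nrow(X)
  ncal <- floor(n * pctcv / 100)          # size of the calibration set
  errs <- matrix(NA_real_, nrow = nrepcv, ncol = length(candidates))
  for (r in seq_len(nrepcv)) {
    ical <- sample.int(n, ncal)           # random calibration indices
    mu <- colMeans(X[ical, , drop = FALSE])
    Xc <- sweep(X[ical, , drop = FALSE], 2, mu)    # centred calibration set
    Xv <- sweep(X[-ical, , drop = FALSE], 2, mu)   # validation set, same centring
    sv <- svd(Xc)
    for (j in seq_along(candidates)) {
      V <- sv$v[, seq_len(candidates[j]), drop = FALSE]
      fit <- lm.fit(cbind(1, Xc %*% V), y[ical])   # regress y on the scores
      pred <- drop(cbind(1, Xv %*% V) %*% fit$coefficients)
      errs[r, j] <- mean((y[-ical] - pred)^2)      # validation error
    }
  }
  # Average over the nrepcv iterations and keep the smallest mean error
  candidates[which.min(colMeans(errs))]
}

# Small simulated check
set.seed(1)
n <- 60; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X %*% c(2, -1, rep(0, p - 2)) + rnorm(n, sd = 0.1))
best <- cv_choose_ncomp(X, y, ncomp = 5, nrepcv = 10, pctcv = 70)
best
```

Because the calibration sets are drawn at random, the selected value can change from run to run unless a seed is set; increasing nrepcv makes the averaged error, and hence the choice, more stable.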
An integer
representing the best number of latent components to be used in the Dual-SPLS regression, based on the cross-validation procedure.
Louna Alsouki, François Wahl
A. L. Boulesteix and K. Strimmer (2005). Predicting Transcription Factor Activities from Combined Analysis of Microarray and ChIP Data: A Partial Least Squares Approach.
H. Wold. Path Models with Latent Variables: The NIPALS Approach. In H.M. Blalock et al., editor, Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building, pages 307–357. Academic Press, 1975.
### load dual.spls library
library(dual.spls)
### constructing the simulated example
oldpar <- par(no.readonly = TRUE)
n <- 100
p <- 50
nondes <- 20
sigmaondes <- 0.5
data=d.spls.simulate(n=n,p=p,nondes=nondes,sigmaondes=sigmaondes)
X <- data$X
y <- data$y
#fitting the PLS model
ncomp_PLS <- d.spls.cv(X=X,Y=y,ncomp=10,dspls="pls",nrepcv=20,pctcv=75)
mod.dspls.pls <- d.spls.pls(X=X,y=y,ncp=ncomp_PLS,verbose=TRUE)
str(mod.dspls.pls)
### plotting the observed values VS predicted values for ncomp components
plot(y,mod.dspls.pls$fitted.values[,ncomp_PLS], xlab="Observed values", ylab="Predicted values",
main=paste("Observed VS Predicted for ", ncomp_PLS," components"))
points(-1000:1000,-1000:1000,type='l')
### plotting the regression coefficients
par(mfrow=c(3,1))
i=ncomp_PLS
plot(1:dim(X)[2],mod.dspls.pls$Bhat[,i],type='l',
main=paste(" Dual-SPLS (PLS), ncp =", i),
ylab='',xlab='' )
#fitting the Dual-SPLS lasso model
ncomplasso <- d.spls.cv(X=X,Y=y,ncomp=10,dspls="lasso",ppnu=0.9,nrepcv=20,pctcv=75)
mod.dspls.lasso <- d.spls.lasso(X=X,y=y,ncp=ncomplasso,ppnu=0.9,verbose=TRUE)
str(mod.dspls.lasso)
### plotting the observed values VS predicted values for ncomp components
plot(y,mod.dspls.lasso$fitted.values[,ncomplasso], xlab="Observed values", ylab="Predicted values",
main=paste("Observed VS Predicted for ", ncomplasso," components"))
points(-1000:1000,-1000:1000,type='l')
### plotting the regression coefficients
par(mfrow=c(3,1))
i=ncomplasso
nz=mod.dspls.lasso$zerovar[i]
plot(1:dim(X)[2],mod.dspls.lasso$Bhat[,i],type='l',
main=paste(" Dual-SPLS (lasso), ncp =", i, " #0coef =", nz, "/", dim(X)[2]),
ylab='',xlab='' )
inonz=which(mod.dspls.lasso$Bhat[,i]!=0)
points(inonz,mod.dspls.lasso$Bhat[inonz,i],col='red',pch=19,cex=0.5)
legend("topright", legend ="non null values", bty = "n", cex = 0.8, col = "red",pch=19)
par(oldpar)