lspls: Weighted LS-PLS Gaussian regression
In lsplsGlm: Classification using LS-PLS for Logistic Regression


Performs a weighted Least Squares-Partial Least Squares (LS-PLS) Gaussian regression on combined clinical and genetic data.

lspls(Y, D, X, W=diag(rep(1,nrow(D))), ncomp)

`Y`	a vector of length `n` giving the responses of the `n` observations. `Y` contains continuous values.
`X`	a data matrix (`nxp`) of genes. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.
`D`	a data matrix (`nxq`) of clinical data. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a clinical variable.
`W`	weight matrix (`nxn`). If `W` is the identity matrix (the default), the function performs a standard LS-PLS regression.
`ncomp`	a positive integer. `ncomp` is the number of selected components.
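
How `lspls` applies `W` internally is not described on this page. As a generic illustration of what a diagonal weight matrix does in a least-squares step, the base R sketch below compares weighted and unweighted fits. The helper `wls`, the simulated data, and the added intercept column are assumptions made for the illustration, not part of the package.

```r
set.seed(1)
n <- 10
D <- matrix(rnorm(n * 2), n, 2)
Y <- rnorm(n)
D1 <- cbind(1, D)  # intercept column: an assumption of this sketch

# Hypothetical helper: weighted least squares with a weight matrix W,
# i.e. the solution of (D1' W D1) beta = D1' W Y
wls <- function(D1, Y, W) drop(solve(t(D1) %*% W %*% D1, t(D1) %*% W %*% Y))

# Identity weights (the default in the usage line) reduce to ordinary least squares
W_id <- diag(rep(1, n))
beta_id <- wls(D1, Y, W_id)
beta_ols <- unname(lm.fit(D1, Y)$coefficients)

# Illustrative per-observation weights placed on the diagonal of W
w <- runif(n, 0.5, 2)
beta_w <- wls(D1, Y, diag(w))
beta_lm_w <- unname(lm.wfit(D1, Y, w)$coefficients)
```

With identity weights the two coefficient vectors agree, which is consistent with the statement that an identity `W` gives the standard, unweighted regression.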

This function combines Least Squares (LS) and Partial Least Squares (PLS) [1]. LS-PLS is an iterative procedure: the first step uses OLS on D to predict Y; new estimates of the residuals of Y on D are then computed from this regression, and the algorithm is repeated until convergence. Here we use the orthogonalised variant: a new matrix is built by projecting X onto the space orthogonal to the one spanned by the design variables of D, and standard PLS regression is then applied to this new matrix instead of X [2].
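
The orthogonalisation step described above can be sketched in base R. This is a minimal, unweighted illustration (it corresponds to `W` being the identity), not the package's implementation; the simulated data, the intercept column in `D1`, and all variable names are assumptions of the sketch. The first PLS weight vector is taken as the normalised cross-product of the orthogonalised matrix with the LS residuals, as in standard single-response PLS.

```r
set.seed(1)
n <- 20; p <- 6; q <- 2
X <- matrix(rnorm(n * p), n, p)   # stand-in for the gene matrix
D <- matrix(rnorm(n * q), n, q)   # stand-in for the clinical matrix
Y <- rnorm(n)

# Design matrix with an intercept column (assumption of this sketch)
D1 <- cbind(1, D)
qr_D1 <- qr(D1)

# LS step: residuals of Y after OLS on D
res_Y <- qr.resid(qr_D1, Y)

# Orthogonalisation: regress every column of X on D1 and keep the residuals
orth_coef <- qr.coef(qr_D1, X)      # (q + 1) x p coefficient matrix
X_orth <- X - D1 %*% orth_coef      # part of X orthogonal to span(D1)

# Every column of X_orth is orthogonal to every column of D1
max_cross <- max(abs(crossprod(D1, X_orth)))

# First PLS component on the orthogonalised matrix
w1 <- crossprod(X_orth, res_Y)
w1 <- w1 / sqrt(sum(w1^2))          # unit-norm weight vector
t1 <- X_orth %*% w1                 # first score vector
```

Because `X_orth` is orthogonal to `D1`, the PLS scores built from it are also orthogonal to the design variables, so the LS part and the PLS part of the model do not compete for the same variation.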

`predictors`	a matrix combining `D` and the scores from the PLS regression.
`projection`	the projection matrix used to convert `X` to scores.
`orthCoef`	the coefficient matrix of size `pxq` used to compute new predictors.
`coefficients`	an array of PLS regression coefficients (`(p+1)xncomp`).
`intercept`	the constant of the model.

Caroline Bazzoli, Thomas Bouleau, Sophie Lambert-Lacroix

[1] Jørgensen, K., Segtnan, V., Thyholt, K., and Næs, T. (2004). A comparison of methods for analysing regression models with both spectral and designed variables. Journal of Chemometrics, 18(10), 451-464.

[2] Bazzoli, C. and Lambert-Lacroix, S. (2017). Classification using LS-PLS with logistic regression based on both clinical and gene expression variables. <hal-01405101>

library(lsplsGlm)

#X simulation
meanX<-sample(1:300,50)
sdeX<-sample(50:150,50)
X<-matrix(nrow=60,ncol=50)
for (i in 1:50){
  X[,i]<-rnorm(60,meanX[i],sdeX[i])
}

#D simulation
meanD<-sample(1:30,5)
sdeD<-sample(1:15,5)
D<-matrix(nrow=60,ncol=5)
for (i in 1:5){
  D[,i]<-rnorm(60,meanD[i],sdeD[i])
}

#Y simulation
Y<-rnorm(60,30,10)

# Learning sample
index<-sample(1:length(Y),round(2*length(Y)/3))
XL<-X[index,]
DL<-D[index,]
YL<-Y[index]

#fit the model
fit<-lspls(YL,X=XL,D=DL,ncomp=3,W=diag(rep(1,length(YL))))

#Testing sample
newX<-X[-index,]
newD<-D[-index,]

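#coefficients including the constant of the model
a.coefficients<-c(fit$intercept,fit$coefficients)

#predictions: orthogonalise newX with respect to newD, compute the PLS scores,
#then apply the coefficients to the constant, newD and the scores
newZ<-(newX-cbind(rep(1,dim(newD)[1]),newD)%*%fit$orthCoef)%*%fit$projection
newY<-cbind(rep(1,dim(newD)[1]),newD,newZ)%*%a.coefficients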
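
The two prediction steps above can be wrapped in a small helper. The function `predict_lspls_sketch` is hypothetical and not part of lsplsGlm. To keep the sketch runnable without the package, it is exercised on a stand-in list that only mimics the component names of an `lspls()` fit; the dimensions of that stand-in are inferred from the matrix products in the example and are an assumption.

```r
# Hypothetical helper (not part of lsplsGlm): wraps the two prediction steps
predict_lspls_sketch <- function(fit, newX, newD) {
  D1 <- cbind(1, newD)                                     # constant plus clinical variables
  newZ <- (newX - D1 %*% fit$orthCoef) %*% fit$projection  # scores of the orthogonalised newX
  drop(cbind(D1, newZ) %*% c(fit$intercept, fit$coefficients))
}

# Stand-in object with the same component names as an lspls() fit;
# the values are random and the dimensions are assumed for illustration
set.seed(1)
n <- 5; p <- 4; q <- 2; ncomp <- 3
mock_fit <- list(
  orthCoef = matrix(rnorm((q + 1) * p), q + 1, p),
  projection = matrix(rnorm(p * ncomp), p, ncomp),
  intercept = 0.5,
  coefficients = rnorm(q + ncomp)
)
mockX <- matrix(rnorm(n * p), n, p)
mockD <- matrix(rnorm(n * q), n, q)

pred <- predict_lspls_sketch(mock_fit, mockX, mockD)
```

With a real fit, the call would be `predict_lspls_sketch(fit, newX, newD)`, assuming the fitted components have the shapes the example relies on.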