lspls: Weighted LS-PLS Gaussian regression


View source: R/lspls.R

Description

Performs a weighted Least Squares-Partial Least Squares (LS-PLS) Gaussian regression using both clinical and genetic data.

Usage

  lspls(Y, D, X, W=diag(rep(1,nrow(D))), ncomp)

Arguments

Y

a numeric vector of length n giving the continuous response values of the n observations.

X

a data matrix (nxp) of genes. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a gene.

D

a data matrix (nxq) of clinical data. NAs and Inf are not allowed. Each row corresponds to an observation and each column to a clinical variable.

W

an nxn weight matrix (by default the identity matrix). If W is the identity matrix, the function performs a standard (unweighted) LS-PLS regression.

ncomp

a positive integer giving the number of PLS components to retain.
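The role of W can be illustrated with ordinary weighted least squares. The following is a minimal base-R sketch, assuming W enters the least squares steps through the usual weighted normal equations; the helper `wls` is hypothetical and not part of lsplsGlm. It checks that the identity weight matrix reproduces an unweighted `lm()` fit and that a diagonal W matches `lm()` with case weights.

```r
# Hypothetical helper (not part of lsplsGlm): weighted least squares of y on
# an intercept plus the columns of D, solving (D1' W D1) beta = D1' W y.
wls <- function(y, D, W = diag(rep(1, nrow(D)))) {
  D1 <- cbind(1, D)                  # n x (q+1): intercept column plus D
  A <- t(D1) %*% W %*% D1            # (q+1) x (q+1) weighted cross-product
  b <- t(D1) %*% W %*% y             # (q+1) x 1 weighted right-hand side
  drop(solve(A, b))                  # solve the weighted normal equations
}

set.seed(1)
n <- 40
D <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
y <- drop(2 + D %*% c(1, -0.5, 0.25) + rnorm(n, sd = 0.1))

# Identity weights reproduce an ordinary (unweighted) least squares fit
beta_id <- wls(y, D)
beta_lm <- unname(coef(lm(y ~ D)))
all.equal(beta_id, beta_lm)          # TRUE

# A diagonal W matches lm() with case weights w
w <- runif(n, 0.5, 2)
beta_w <- wls(y, D, diag(w))
all.equal(beta_w, unname(coef(lm(y ~ D, weights = w))))   # TRUE
```

A full nxn matrix is used here only to mirror the form of the W argument; for diagonal weights, a weight vector is cheaper in practice.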

Details

This function combines Least Squares (LS) and Partial Least Squares (PLS) regression [1]. LS-PLS can be computed iteratively: ordinary least squares (OLS) is first used to regress Y on D, PLS on X is then fitted to the residuals of this regression, and the two steps are repeated until convergence. Here the orthogonalised variant is used instead: a new matrix is built by projecting X onto the space orthogonal to the space spanned by the design variables of D, and standard PLS regression is then applied to this new matrix instead of X [2].
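The orthogonalised variant can be sketched in base R for the unweighted case (W equal to the identity). The helper `lspls_sketch` is hypothetical and is not the lsplsGlm source. It orthogonalises X by least squares against an intercept plus the columns of D (the intercept column is an assumption read off the prediction formula in the Examples), runs a plain NIPALS PLS1 on the residuals of Y, and finally regresses Y on D and the PLS scores. Its output names mirror the Value section only for readability.

```r
# Hypothetical sketch of orthogonalised LS-PLS for W = identity;
# not the lsplsGlm source code.
lspls_sketch <- function(Y, D, X, ncomp) {
  D1 <- cbind(1, D)                                   # intercept plus design variables
  # Step 1: remove from X the part explained by D1 (least squares projection)
  orthCoef <- solve(crossprod(D1), crossprod(D1, X))  # (q+1) x p
  Xorth <- X - D1 %*% orthCoef                        # columns orthogonal to D1
  # Residuals of Y on D1: the part of Y that PLS on Xorth should explain
  f <- Y - D1 %*% solve(crossprod(D1), crossprod(D1, Y))
  # Step 2: NIPALS PLS1 of the residuals on Xorth
  E <- Xorth
  Wpls <- P <- matrix(0, ncol(X), ncomp)
  Tsc <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(E, f)
    w <- w / sqrt(sum(w^2))                           # unit-length weight vector
    tt <- E %*% w                                     # score vector
    pa <- crossprod(E, tt) / sum(tt^2)                # X loading
    E <- E - tt %*% t(pa)                             # deflate X
    f <- f - tt * (sum(f * tt) / sum(tt^2))           # deflate the residuals
    Wpls[, a] <- w; P[, a] <- pa; Tsc[, a] <- tt
  }
  # Maps Xorth directly to the scores: Tsc = Xorth %*% projection
  projection <- Wpls %*% solve(crossprod(P, Wpls))
  # Step 3: least squares of Y on an intercept, D and the PLS scores
  predictors <- cbind(D, Tsc)
  cf <- lm.fit(cbind(1, predictors), Y)$coefficients
  list(predictors = predictors, projection = projection, orthCoef = orthCoef,
       coefficients = unname(cf[-1]), intercept = unname(cf[1]), Xorth = Xorth)
}

set.seed(42)
n <- 40; p <- 20; q <- 3
X <- matrix(rnorm(n * p), n, p)
D <- matrix(rnorm(n * q), n, q)
Y <- drop(1 + D %*% c(0.5, -1, 2) + X[, 1:2] %*% c(1, 1) + rnorm(n, sd = 0.3))
fit <- lspls_sketch(Y, D, X, ncomp = 2)

# Scores recomputed the same way as in the package example
Z <- (X - cbind(1, D) %*% fit$orthCoef) %*% fit$projection
max(abs(Z - fit$predictors[, -(1:q)]))   # numerically zero
```

Orthogonalising first means the PLS scores are, by construction, uncorrelated with the clinical variables, so the effect of D is not absorbed into the gene-expression components.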

Value

predictors

the nx(q+ncomp) matrix combining D and the scores from the PLS regression.

projection

the pxncomp projection matrix used to convert the orthogonalised X into PLS scores.

orthCoef

the coefficient matrix of size (q+1)xp (an intercept row plus one row per clinical variable) used to orthogonalise new X with respect to D when computing new predictors.

coefficients

the vector of q+ncomp regression coefficients associated with the columns of predictors (the intercept is returned separately).

intercept

the constant of the model.
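Reading the prediction code in the Examples, the returned components combine as follows, writing C for orthCoef, P for projection, b_0 for intercept, and (b_D, b_Z) for the entries of coefficients that multiply D and the scores. The dimensions are inferred from that code rather than stated separately by the package.

```latex
\begin{aligned}
Z_{\mathrm{new}} &= \bigl(X_{\mathrm{new}} - [\mathbf{1},\, D_{\mathrm{new}}]\, C\bigr)\, P,
  & C &\in \mathbb{R}^{(q+1)\times p},\; P \in \mathbb{R}^{p\times \mathrm{ncomp}},\\
\hat{Y}_{\mathrm{new}} &= b_0\,\mathbf{1} + D_{\mathrm{new}}\, b_D + Z_{\mathrm{new}}\, b_Z,
  & b_D &\in \mathbb{R}^{q},\; b_Z \in \mathbb{R}^{\mathrm{ncomp}}.
\end{aligned}
```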

Author(s)

Caroline Bazzoli, Thomas Bouleau, Sophie Lambert-Lacroix

References

[1] Jørgensen, K., Segtnan, V., Thyholt, K., and Næs, T. (2004). A comparison of methods for analysing regression models with both spectral and designed variables. Journal of Chemometrics, 18(10), 451-464.

[2] Bazzoli, C., and Lambert-Lacroix, S. (2017). Classification using LS-PLS with logistic regression based on both clinical and gene expression variables. <hal-01405101>

Examples

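library(lsplsGlm)
set.seed(1)  # for reproducibility

# X simulation: 60 observations of 50 genes
meanX <- sample(1:300, 50)
sdeX <- sample(50:150, 50)
X <- matrix(nrow = 60, ncol = 50)
for (i in 1:50) {
  X[, i] <- rnorm(60, meanX[i], sdeX[i])
}

# D simulation: 60 observations of 5 clinical variables
meanD <- sample(1:30, 5)
sdeD <- sample(1:15, 5)
D <- matrix(nrow = 60, ncol = 5)
for (i in 1:5) {
  D[, i] <- rnorm(60, meanD[i], sdeD[i])
}

# Y simulation: continuous response
Y <- rnorm(60, 30, 10)

# Learning sample: two thirds of the observations
index <- sample(1:length(Y), round(2 * length(Y) / 3))
XL <- X[index, ]
DL <- D[index, ]
YL <- Y[index]

# Fit the model (identity weights, i.e. standard LS-PLS)
fit <- lspls(YL, X = XL, D = DL, ncomp = 3, W = diag(rep(1, length(YL))))

# Testing sample: the remaining observations
newX <- X[-index, ]
newD <- D[-index, ]

# Regression coefficients, including the constant of the model
a.coefficients <- c(fit$intercept, fit$coefficients)

# Predictions: orthogonalise newX with respect to (1, newD), compute the PLS scores,
# then apply the regression coefficients
newZ <- (newX - cbind(rep(1, nrow(newD)), newD) %*% fit$orthCoef) %*% fit$projection
newY <- cbind(rep(1, nrow(newD)), newD, newZ) %*% a.coefficients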

lsplsGlm documentation built on July 27, 2017, 5:01 p.m.