Multivariate Partial Least Squares Regression


Description

The function pls.regression performs multivariate PLS regression (with several response variables and several predictor variables) using de Jong's SIMPLS algorithm. This function is an adaptation of R. Wehrens' code from the package pls.pcr.

Usage

pls.regression(Xtrain, Ytrain, Xtest=NULL, ncomp=NULL,  unit.weights=TRUE)

Arguments

Xtrain

a (ntrain x p) data matrix of predictors. Xtrain may be a matrix or a data frame. Each row corresponds to an observation and each column to a predictor variable.

Ytrain

a (ntrain x m) data matrix of responses. Ytrain may be a vector (if m=1), a matrix or a data frame. If Ytrain is a matrix or a data frame, each row corresponds to an observation and each column to a response variable. If Ytrain is a vector, it contains the unique response variable for each observation.

Xtest

a (ntest x p) matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to only one test observation).

ncomp

the number of latent components to be used for regression. If ncomp is a vector of integers, the regression model is built successively with each number of components. If ncomp=NULL, the maximal number of components min(ntrain,p) is chosen.

unit.weights

if TRUE then the latent components will be constructed from weight vectors that are standardized to length 1, otherwise the weight vectors do not have length 1 but the latent components have norm 1.

Details

The columns of the data matrices Xtrain and Ytrain do not need to be centered to have mean zero beforehand, since centering is performed by the function pls.regression as a preliminary step before the SIMPLS algorithm is run.
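The preliminary centering step can be sketched in base R as follows. This is only an illustration on toy data, not the package's internal code; the objects Xc, Yc and meanY are names introduced here for the example.

```r
# Toy data (random values, for illustration only)
set.seed(1)
Xtrain <- matrix(rnorm(20), nrow = 5, ncol = 4)
Ytrain <- matrix(rnorm(10), nrow = 5, ncol = 2)

# Column means; meanX plays the role of the meanX component described under Value
meanX <- colMeans(Xtrain)
meanY <- colMeans(Ytrain)   # hypothetical name, not a documented return value

# Subtract each column's mean; scale = FALSE leaves the variances untouched
Xc <- scale(Xtrain, center = meanX, scale = FALSE)
Yc <- scale(Ytrain, center = meanY, scale = FALSE)

# After centering, every column mean is zero up to rounding error
max(abs(colMeans(Xc)))
max(abs(colMeans(Yc)))
```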

In the original definition of SIMPLS by de Jong (1993), the weight vectors have length 1. If the weight vectors are standardized to have length 1, they satisfy a simple optimality criterion (de Jong, 1993). However, it is also usual (and computationally efficient) to standardize the latent components to have length 1.
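The two standardization conventions can be illustrated for the first latent component only. The sketch below assumes centered toy data and takes the first weight vector as the dominant left singular vector of the cross-product matrix; later components require a deflation step that is not shown, and all object names are introduced here for illustration.

```r
# Centered toy data (random values, for illustration only)
set.seed(1)
X <- scale(matrix(rnorm(40), nrow = 10, ncol = 4), center = TRUE, scale = FALSE)
Y <- scale(matrix(rnorm(20), nrow = 10, ncol = 2), center = TRUE, scale = FALSE)

# Cross-product matrix between predictors and responses (p x m)
S <- crossprod(X, Y)

# Convention 1: weight vector of length 1
# (singular vectors returned by svd() have unit length)
r_unit <- svd(S)$u[, 1]
t_unit <- X %*% r_unit

# Convention 2: rescale the weight vector so that the latent component has length 1
r_scaled <- r_unit / sqrt(sum(t_unit^2))
t_scaled <- X %*% r_scaled

sqrt(sum(r_unit^2))     # length of the weight vector under convention 1
sqrt(sum(t_scaled^2))   # length of the latent component under convention 2
```

Both conventions give latent components that point in the same direction; only the scaling of the weights and scores differs.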

In contrast to the original version found in the package pls.pcr, the prediction for the observations from Xtest is performed after centering the columns of Xtest by subtracting the column means calculated from Xtrain.
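A minimal sketch of this prediction step in base R is given below. It uses ordinary least squares coefficients as a stand-in for the PLS coefficient matrix B, and it assumes that the training response means are added back to the predictions; both are assumptions made for illustration, and meanY, Xtest_c and the toy data are hypothetical.

```r
# Toy data (random values, for illustration only)
set.seed(2)
Xtrain <- matrix(rnorm(60), nrow = 15, ncol = 4)
Ytrain <- matrix(rnorm(30), nrow = 15, ncol = 2)
Xtest  <- matrix(rnorm(12), nrow = 3,  ncol = 4)

meanX <- colMeans(Xtrain)
meanY <- colMeans(Ytrain)

# Center the training data (sweep subtracts by default)
Xc <- sweep(Xtrain, 2, meanX)
Yc <- sweep(Ytrain, 2, meanY)

# Stand-in coefficients: least squares on the centered data.
# (Assumption for illustration; pls.regression would supply its own B.)
B <- qr.solve(Xc, Yc)           # p x m

# Center the test predictors with the TRAINING means, not their own means
Xtest_c <- sweep(Xtest, 2, meanX)

# Apply the coefficients and add the training response means back
Ypred <- sweep(Xtest_c %*% B, 2, meanY, "+")
dim(Ypred)
```

Centering Xtest with its own column means instead would shift the predictions whenever the test and training means differ.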

Value

A list with the following components:

B

the (p x m x length(ncomp)) array containing the regression coefficients. Each row corresponds to a predictor variable and each column to a response variable. The third dimension of the array B corresponds to the number of PLS components used to compute the regression coefficients. If ncomp has length 1, B is just a (p x m) matrix.

Ypred

the (ntest x m x length(ncomp)) array containing the predicted values of the response variables for the observations from Xtest. The third dimension of the array Ypred corresponds to the number of PLS components used to compute the regression coefficients.
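The layout of B and Ypred along the third dimension can be illustrated with placeholder arrays. The values below are zeros rather than real coefficients, and the sizes p, m and ntest are hypothetical; the sketch only shows how to index the documented shapes.

```r
p <- 4; m <- 2; ntest <- 3
ncomp <- 1:3

# Placeholder array with the documented shape (values are NOT real coefficients)
B <- array(0, dim = c(p, m, length(ncomp)))

# Coefficients for the model built with ncomp[2] components: a p x m matrix
B_k <- B[, , 2]
dim(B_k)

# Hypothetical centered test matrix; one prediction slice per entry of ncomp
Xtest_c <- matrix(0, nrow = ntest, ncol = p)
Ypred <- array(apply(B, 3, function(Bk) Xtest_c %*% Bk),
               dim = c(ntest, m, length(ncomp)))
dim(Ypred)
```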

P

the (p x max(ncomp)) matrix containing the X-loadings.

Q

the (m x max(ncomp)) matrix containing the Y-loadings.

T

the (ntrain x max(ncomp)) matrix containing the X-scores (latent components).

R

the (p x max(ncomp)) matrix containing the weights used to construct the latent components.

meanX

the p-vector containing the means of the columns of Xtrain.

Author(s)

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html) and Korbinian Strimmer (http://strimmerlab.org/).

Adapted in part from pls.pcr code by R. Wehrens (in a former version of the 'pls' package http://cran.r-project.org/web/packages/pls/index.html).

References

S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.

C. J. F. ter Braak and S. de Jong (1998). The objective function of partial least squares regression, Journal of Chemometrics 12, 41–54.

See Also

pls.lda, TFA.estimate, pls.regression.cv.

Examples

# load plsgenomics library
library(plsgenomics)

# load the Ecoli data
data(Ecoli)

# perform pls regression
# with unit latent components
pls.regression(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, Xtest=Ecoli$CONNECdata,
               ncomp=1:3, unit.weights=FALSE)

# with unit weight vectors
pls.regression(Xtrain=Ecoli$CONNECdata, Ytrain=Ecoli$GEdata, Xtest=Ecoli$CONNECdata,
               ncomp=1:3, unit.weights=TRUE)
