pls: Partial Least Squares regression

pls R Documentation

Description

pls is used to calibrate, validate and apply a partial least squares (PLS) regression model.

Usage

pls(
  x,
  y,
  ncomp = min(nrow(x) - 1, ncol(x), 20),
  center = TRUE,
  scale = FALSE,
  cv = NULL,
  exclcols = NULL,
  exclrows = NULL,
  x.test = NULL,
  y.test = NULL,
  method = "simpls",
  info = "",
  ncomp.selcrit = "min",
  lim.type = "ddmoments",
  alpha = 0.05,
  gamma = 0.01,
  cv.scope = "local"
)

Arguments

x

matrix with predictors.

y

matrix with responses.

ncomp

maximum number of components to calculate.

center

logical, whether to center the predictors and response values.

scale

logical, whether to scale (standardize) the predictors and response values.

cv

cross-validation settings (see details).

exclcols

columns of x to be excluded from calculations (numbers, names or vector with logical values)

exclrows

rows to be excluded from calculations (numbers, names or vector with logical values)

x.test

matrix with predictors for test set.

y.test

matrix with responses for test set.

method

algorithm for computing PLS model (only 'simpls' is supported so far)

info

short text with information about the model.

ncomp.selcrit

criterion for selecting the optimal number of components ('min' for the first local minimum of RMSECV, 'wold' for Wold's rule).

lim.type

which method to use for calculation of critical limits for residual distances (see details)

alpha

significance level for extreme limits for T2 and Q distances.

gamma

significance level for outlier limits for T2 and Q distances.

cv.scope

scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and std, 'local' recomputes them for each local calibration set.
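The difference between the two cv.scope options can be illustrated in base R. This is only a sketch of the idea, not the package's internal code: the data, the segment assignment and all variable names are made up for the illustration.

```r
# Illustration only: centering a validation segment with 'global' vs 'local' statistics
set.seed(42)
x <- matrix(rnorm(12 * 2, mean = 5), nrow = 12)

# Hypothetical split into 3 segments; segment 1 is held out for validation
seg <- rep_len(1:3, nrow(x))
val <- seg == 1

# 'global': mean computed once from all rows
global_center <- colMeans(x)

# 'local': mean recomputed from the current calibration rows only
local_center <- colMeans(x[!val, , drop = FALSE])

# The held-out rows are centered differently in the two cases
x_val_global <- sweep(x[val, , drop = FALSE], 2, global_center)
x_val_local <- sweep(x[val, , drop = FALSE], 2, local_center)
```

With the local scope the held-out rows never contribute to the statistics used to preprocess them, which is why the two results differ by a constant shift per column.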

Details

So far only the SIMPLS method [1] is available. The implementation works with both one and multiple response variables.
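For readers unfamiliar with SIMPLS, the following base R function is a minimal sketch of the algorithm as described in [1]. It is not the package's pls.simpls implementation; the function name simpls_sketch, its return structure and the simulated data are assumptions made for this illustration, and it always centers (but never scales) the data.

```r
# Minimal SIMPLS sketch (hypothetical helper, not the mdatools code)
simpls_sketch <- function(x, y, ncomp) {
  x <- scale(as.matrix(x), center = TRUE, scale = FALSE)
  y <- scale(as.matrix(y), center = TRUE, scale = FALSE)
  nvar <- ncol(x)

  R <- matrix(0, nvar, ncomp)       # weights
  P <- matrix(0, nvar, ncomp)       # x-loadings
  Q <- matrix(0, ncol(y), ncomp)    # y-loadings
  Tm <- matrix(0, nrow(x), ncomp)   # x-scores (orthonormal)
  V <- matrix(0, nvar, ncomp)       # orthonormal basis for the x-loadings

  S <- crossprod(x, y)              # cross-product matrix X'Y
  for (a in seq_len(ncomp)) {
    r <- svd(S, nu = 1, nv = 0)$u   # dominant left singular vector of S
    ta <- x %*% r
    tnorm <- sqrt(sum(ta^2))
    ta <- ta / tnorm                # normalize the scores ...
    r <- r / tnorm                  # ... and rescale the weights accordingly
    p <- crossprod(x, ta)
    q <- crossprod(y, ta)

    # orthogonalize the new loading against the previous ones
    v <- p
    if (a > 1) {
      Vprev <- V[, seq_len(a - 1), drop = FALSE]
      v <- v - Vprev %*% crossprod(Vprev, p)
    }
    v <- v / sqrt(sum(v^2))
    S <- S - v %*% crossprod(v, S)  # deflate the cross-product matrix

    R[, a] <- r; Tm[, a] <- ta; P[, a] <- p; Q[, a] <- q; V[, a] <- v
  }

  # coefficients for the centered data using all ncomp components
  list(weights = R, scores = Tm, xloadings = P, yloadings = Q,
       coeffs = R %*% t(Q))
}

# Simulated data: 30 objects, 3 predictors, 1 response
set.seed(1)
x <- matrix(rnorm(30 * 3), 30, 3)
y <- x %*% c(1, -2, 0.5) + rnorm(30, sd = 0.1)
fit <- simpls_sketch(x, y, ncomp = 3)
```

When the number of components equals the number of (linearly independent) predictors, the PLS coefficients coincide with the ordinary least squares solution, which is a convenient sanity check for a sketch like this.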

As in pca, the number of components actually used by pls is the minimum of three values: the number of objects minus 1, the number of x variables, and the default or user-provided ncomp. Regression coefficients, predictions and other results are calculated for each set of components from 1 to ncomp: 1, 1:2, 1:3, etc. The optimal number of components (ncomp.selected) is found by default as the first local minimum of RMSECV (see ncomp.selcrit), but can also be set to a user-defined value with the function selectCompNum.pls. The selected number of components is used for all default operations: predictions, plots, etc.
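The two rules in this paragraph can be sketched in base R. The first expression is the default from the Usage section. The helper first_local_min and the RMSECV values are hypothetical and only illustrate what "first local minimum" means; the package's own selection code may handle ties and edge cases differently.

```r
# Upper bound on the number of components for a 10 x 50 predictor matrix
x <- matrix(0, nrow = 10, ncol = 50)
ncomp <- min(nrow(x) - 1, ncol(x), 20)   # limited by the number of objects - 1

# Hypothetical helper: index of the first value followed by an increase
# (or the last index if the curve never goes up)
first_local_min <- function(rmse) {
  rising <- which(diff(rmse) > 0)
  if (length(rising) == 0) length(rmse) else rising[1]
}

# Made-up RMSECV values for 1, 2, ..., 9 components
rmsecv <- c(0.90, 0.41, 0.23, 0.25, 0.21, 0.22, 0.24, 0.26, 0.27)
ncomp_selected <- first_local_min(rmsecv)
```

Note that the first local minimum (here at 3 components) is not necessarily the global one (here at 5), which is one reason to inspect the RMSE plot and override the choice with selectCompNum when needed.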

Cross-validation settings, cv, can be a number or a list. If cv is a number, it is used as the number of segments for random cross-validation (if cv = 1, full cross-validation is performed). If it is a list, the following syntax can be used: cv = list("rand", nseg, nrep) for random repeated cross-validation with nseg segments and nrep repetitions, or cv = list("ven", nseg) for systematic splits into nseg segments ('venetian blinds').
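To make the three splitting schemes concrete, the sketch below assigns 10 objects to segments in base R. It shows the general idea only: the variable names are made up, and the package may, for example, reorder objects before splitting, so the actual segments it produces can differ.

```r
n <- 10      # number of objects (rows of x)
nseg <- 4    # number of segments

# Venetian blinds: every nseg-th object goes to the same segment,
# so objects 1, 5, 9 form segment 1, objects 2, 6, 10 form segment 2, etc.
seg_ven <- rep_len(seq_len(nseg), n)

# Random split: same segment sizes, but objects are assigned at random
set.seed(1)
seg_rand <- sample(seg_ven)

# Full cross-validation: each object is its own segment
seg_full <- seq_len(n)

# Rows held out in each venetian blinds segment
split(seq_len(n), seg_ven)
```

Repeated random cross-validation, as requested with list("rand", nseg, nrep), corresponds to drawing the random assignment nrep times.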

Confidence intervals and p-values for regression coefficients can be calculated using Jack-Knife resampling. This is done automatically if cross-validation is used. However, it is recommended to use at least 10 segments for a stable JK result. See the help for regcoeffs objects for more details.
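The Jack-Knife idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: an ordinary least squares fit stands in for the PLS model, the data are simulated, the segments are systematic, and the standard error uses the usual delete-a-group Jack-Knife formula. See the regcoeffs help for what the package actually computes.

```r
set.seed(7)
n <- 40
x <- matrix(rnorm(n * 2), n, 2)
y <- drop(x %*% c(2, -1)) + rnorm(n, sd = 0.3)

nseg <- 10
seg <- rep_len(seq_len(nseg), n)   # systematic segments for the illustration

# Stand-in for a PLS fit: returns slope coefficients only
fit_coeffs <- function(x, y) coef(lm(y ~ x))[-1]

# Coefficients from the full data and from each reduced (one segment removed) set
b_full <- fit_coeffs(x, y)
b_jk <- sapply(seq_len(nseg), function(k) {
  fit_coeffs(x[seg != k, , drop = FALSE], y[seg != k])
})

# Delete-a-group Jack-Knife standard error (rows of b_jk are coefficients)
b_mean <- rowMeans(b_jk)
se <- sqrt((nseg - 1) / nseg * rowSums((b_jk - b_mean)^2))

# t-based confidence intervals and p-values with nseg - 1 degrees of freedom
alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = nseg - 1)
ci <- cbind(lower = b_full - tcrit * se, upper = b_full + tcrit * se)
p_values <- 2 * pt(-abs(b_full / se), df = nseg - 1)
```

With few segments the degrees of freedom are small and the spread of the reduced-set coefficients is poorly estimated, which is consistent with the recommendation to use at least 10 segments.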

Value

Returns an object of class pls with the following fields:

ncomp

number of components included in the model.

ncomp.selected

selected (optimal) number of components.

xcenter

vector with values used to center the predictors (x).

ycenter

vector with values used to center the responses (y).

xscale

vector with values used to scale the predictors (x).

yscale

vector with values used to scale the responses (y).

xloadings

matrix with loading values for x decomposition.

yloadings

matrix with loading values for y decomposition.

xeigenvals

vector with eigenvalues of components (variance of x-scores).

yeigenvals

vector with eigenvalues of components (variance of y-scores).

weights

matrix with PLS weights.

coeffs

object of class regcoeffs with regression coefficients calculated for each component.

info

information about the model, provided by the user when building the model.

cv

information about the cross-validation method used (if any).

res

a list with result objects (e.g. calibration, cv, etc.)

Author(s)

Sergey Kucheryavskiy (svkucheryavski@gmail.com)

References

1. S. de Jong, Chemometrics and Intelligent Laboratory Systems 18 (1993) 251-263.
2. Tarja Rajalahti et al., Chemometrics and Intelligent Laboratory Systems 95 (2009) 35-48.
3. Il-Gyo Chong, Chi-Hyuck Jun, Chemometrics and Intelligent Laboratory Systems 78 (2005) 103-112.

See Also

Main methods for pls objects:

print prints information about a pls object.
summary.pls shows performance statistics for the model.
plot.pls shows plot overview of the model.
pls.simpls implementation of SIMPLS algorithm.
predict.pls applies a PLS model to new data.
selectCompNum.pls sets the number of optimal components in the model.
setDistanceLimits.pls allows changing the parameters for critical limits.
categorize.pls categorizes data rows, similar to categorize.pca.
selratio computes matrix with selectivity ratio values.
vipscores computes matrix with VIP scores values.

Plotting methods for pls objects:

plotXScores.pls shows scores plot for x decomposition.
plotXYScores.pls shows scores plot for x and y decomposition.
plotXLoadings.pls shows loadings plot for x decomposition.
plotXYLoadings.pls shows loadings plot for x and y decomposition.
plotXVariance.pls shows explained variance plot for x decomposition.
plotYVariance.pls shows explained variance plot for y decomposition.
plotXCumVariance.pls shows cumulative explained variance plot for x decomposition.
plotYCumVariance.pls shows cumulative explained variance plot for y decomposition.
plotXResiduals.pls shows distance/residuals plot for x decomposition.
plotXYResiduals.pls shows joint distance plot for x and y decomposition.
plotWeights.pls shows plot with weights.
plotSelectivityRatio.pls shows plot with selectivity ratio values.
plotVIPScores.pls shows plot with VIP scores values.

Methods inherited from regmodel object (parent class for pls):

plotPredictions.regmodel shows predicted vs. measured plot.
plotRMSE.regmodel shows RMSE plot.
plotRMSERatio.regmodel shows plot for ratio RMSECV/RMSEC values.
plotYResiduals.regmodel shows residuals plot for y values.
getRegcoeffs.regmodel returns matrix with regression coefficients.

Most of the methods for plotting data (except loadings and regression coefficients) are also available for PLS results (plsres) objects. There is also a randomization test for PLS regression (randtest) and an implementation of the interval PLS algorithm for variable selection (ipls).

Examples

### Examples of using PLS model class
library(mdatools)

## 1. Make a PLS model for the concentration of the first component
## using full cross-validation and automatic detection of the
## optimal number of components, and show an overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]

model = pls(x, y, ncomp = 8, cv = 1)
summary(model)
plot(model)

## 2. Make a PLS model for concentration of first component
## using test set and 10 segment cross-validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 8, cv = 10, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 3. Make a PLS model for concentration of first component
## using only test set validation and show overview

data(simdata)
x = simdata$spectra.c
y = simdata$conc.c[, 1]
x.t = simdata$spectra.t
y.t = simdata$conc.t[, 1]

model = pls(x, y, ncomp = 6, x.test = x.t, y.test = y.t)
model = selectCompNum(model, 2)
summary(model)
plot(model)

## 4. Show variance and error plots for a PLS model
par(mfrow = c(2, 2))
plotXCumVariance(model, type = 'h')
plotYCumVariance(model, type = 'b', show.labels = TRUE, legend.position = 'bottomright')
plotRMSE(model)
plotRMSE(model, type = 'h', show.labels = TRUE)
par(mfrow = c(1, 1))

## 5. Show scores plots for a PLS model
par(mfrow = c(2, 2))
plotXScores(model)
plotXScores(model, comp = c(1, 3), show.labels = TRUE)
plotXYScores(model)
plotXYScores(model, comp = 2, show.labels = TRUE)
par(mfrow = c(1, 1))

## 6. Show loadings and coefficients plots for a PLS model
par(mfrow = c(2, 2))
plotXLoadings(model)
plotXLoadings(model, comp = c(1, 2), type = 'l')
plotXYLoadings(model, comp = c(1, 2), legend.position = 'topleft')
plotRegcoeffs(model)
par(mfrow = c(1, 1))

## 7. Show predictions and residuals plots for a PLS model
par(mfrow = c(2, 2))
plotXResiduals(model, show.labels = TRUE)
plotYResiduals(model, show.labels = TRUE)
plotPredictions(model)
plotPredictions(model, ncomp = 4, xlab = 'C, reference', ylab = 'C, predictions')
par(mfrow = c(1, 1))

## 8. Selectivity ratio and VIP scores plots
par(mfrow = c(2, 2))
plotSelectivityRatio(model)
plotSelectivityRatio(model, ncomp = 1)
par(mfrow = c(1, 1))

## 9. Variable selection with selectivity ratio
selratio = getSelectivityRatio(model)
selvar = !(selratio < 8)

xsel = x[, selvar]
modelsel = pls(xsel, y, ncomp = 6, cv = 1)
modelsel = selectCompNum(modelsel, 3)

summary(model)
summary(modelsel)

## 10. Calculate average spectrum and show the selected variables
i = 1:ncol(x)
ms = apply(x, 2, mean)

par(mfrow = c(2, 2))

plot(i, ms, type = 'p', pch = 16, col = 'red', main = 'Original variables')
plotPredictions(model)

plot(i, ms, type = 'p', pch = 16, col = 'lightgray', main = 'Selected variables')
points(i[selvar], ms[selvar], col = 'red', pch = 16)
plotPredictions(modelsel)

par(mfrow = c(1, 1))


mdatools documentation built on Sept. 11, 2024, 7:59 p.m.