pls: Partial Least Squares

pls {matter}	R Documentation

Partial Least Squares

Description

Partial least squares (PLS), also called projection to latent structures, performs multivariate regression between a data matrix and a response matrix by decomposing both matrices in a way that explains the maximum amount of covariation between them. It is especially useful when the number of predictors is greater than the number of observations, or when the predictors are highly correlated. Orthogonal partial least squares (OPLS) is also provided.

Usage

# NIPALS algorithm
pls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE,
	transpose = FALSE, niter = 100L, tol = 1e-5,
	verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)

# SIMPLS algorithm
pls_simpls(x, y, k = 3L, center = TRUE, scale. = FALSE,
	transpose = FALSE, method = 1L, retscores = TRUE,
	verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)

# Kernel algorithm
pls_kernel(x, y, k = 3L, center = TRUE, scale. = FALSE,
	transpose = FALSE, method = 1L, retscores = TRUE,
	verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)

## S3 method for class 'pls'
fitted(object, type = c("response", "class"), ...)

## S3 method for class 'pls'
predict(object, newdata, k,
	type = c("response", "class"), simplify = TRUE, ...)

# O-PLS algorithm
opls_nipals(x, y, k = 3L, center = TRUE, scale. = FALSE,
	transpose = FALSE, niter = 100L, tol = 1e-9, regression = TRUE,
	verbose = NA, nchunks = NA, BPPARAM = bpparam(), ...)

## S3 method for class 'opls'
coef(object, ...)

## S3 method for class 'opls'
residuals(object, ...)

## S3 method for class 'opls'
fitted(object, type = c("response", "class", "x"), ...)

## S3 method for class 'opls'
predict(object, newdata, k,
	type = c("response", "class", "x"), simplify = TRUE, ...)

# Variable importance in projection
vip(object, type = c("projection", "weights"))

Arguments

x

The data matrix of predictors.

y

The response matrix. (Can also be a factor.)

k

The number of PLS components to use. (Can be a vector for the predict method.)

center

A logical value indicating whether the variables should be shifted to be zero-centered, or a centering vector of length equal to the number of columns of x. The centering is performed implicitly and does not change the out-of-memory data in x.

scale.

A logical value indicating whether the variables should be scaled to have unit variance, or a scaling vector of length equal to the number of columns of x. The scaling is performed implicitly and does not change the out-of-memory data in x.

transpose

A logical value indicating whether x should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for matter_mat and sparse_mat objects, but can be useful for large in-memory (P x N) matrices.

niter

The maximum number of iterations (per component).

tol

The tolerance for convergence (per component).

verbose

Should progress be printed for each iteration?

nchunks

The number of chunks to use (for centering and scaling only).

method

The kernel algorithm to use, where 1 and 2 correspond to the two kernel algorithms described by Dayal and MacGregor (1997). For method 1, only the covariance matrix t(X) %*% Y is computed. For method 2, the variance matrix t(X) %*% X is also computed. Typically, method 1 will be faster if the number of predictors is large; for a smaller number of predictors, method 2 will be more efficient.

retscores

Should the scores be computed and returned? This also computes the amount of explained covariance for each component. This is done automatically for NIPALS, but requires additional computation for the kernel algorithms.

regression

For O-PLS, should a 1-component PLS regression be fit to the processed data (for each orthogonal component removed)?

...

Not currently used.

BPPARAM

An optional instance of BiocParallelParam. See documentation for bplapply. Currently only used for centering and scaling. Use options(matter.matmul.bpparam=TRUE) to enable parallel matrix multiplication for matter_mat and sparse_mat matrices.

object

An object inheriting from pls or opls.

newdata

An optional data matrix to use for the prediction.

type

The type of prediction, where "response" means the fitted response matrix, "class" means the vector of class predictions (only valid for discriminant analyses), and "x" (OPLS only) means the processed data matrix with orthogonal variation removed.

simplify

Should the predictions be simplified (from a list) to an array (type="response") or data frame (type="class") when k is a vector?

Details

These functions implement partial least squares (PLS) using the original NIPALS algorithm by Wold et al. (1983), the SIMPLS algorithm by de Jong (1993), or the kernel algorithms by Dayal and MacGregor (1997). A function for orthogonal partial least squares (OPLS) processing using the NIPALS algorithm of Trygg and Wold (2002) is also provided.

Both regression and classification can be performed. If passed a factor, then partial least squares discriminant analysis (PLS-DA) will be performed as described by M. Barker and W. Rayens (2003).
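When y is a factor, the discriminant analysis amounts to regressing on an indicator (dummy) matrix of the class memberships, as in Barker and Rayens (2003). The snippet below is a base-R sketch of that encoding step only, for illustration; it is an assumption about the general approach, not the package's internal code.

```r
# Sketch: expand a factor response into a 0/1 indicator matrix,
# one column per class, as used for PLS-DA (Barker & Rayens, 2003)
y <- factor(c("a", "a", "b", "b"))

# model.matrix() with the intercept dropped gives one indicator column per level
ymat <- model.matrix(~ y - 1)
colnames(ymat) <- levels(y)

# each row has exactly one 1, marking that observation's class;
# PLS-DA then regresses the predictors against this matrix
```

The class prediction for a new observation is then taken as the column with the largest fitted response.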

The SIMPLS algorithm (pls_simpls()) is relatively fast as it does not require the deflation of the data matrix. However, the results will differ slightly from the NIPALS and kernel algorithms for multivariate responses. In these cases, only the first component will be identical. The differences are not meaningful in most cases, but it is worth noting.

The kernel algorithms (pls_kernel()) tend to be faster than NIPALS for larger data matrices. The original NIPALS algorithm (pls_nipals()) is the reference implementation. The results from these algorithms are proven to be equivalent for both univariate and multivariate responses.

Note that the NIPALS algorithms cannot handle out-of-memory matter_mat and sparse_mat matrices due to the need to deflate the data matrix for each component. x will be coerced to an in-memory matrix.
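The per-component deflation that forces NIPALS to work in memory can be sketched in a few lines of base R. The following is a simplified, in-memory PLS1 sketch for a single response, written for illustration only; the function name and return structure are assumptions and do not reproduce the package's implementation.

```r
# Minimal in-memory NIPALS PLS1 sketch (single response), illustrating
# the per-component deflation of the data matrix described above.
# This is an illustrative sketch, not the package's pls_nipals().
pls1_nipals_sketch <- function(x, y, k = 2L) {
	x <- scale(x, center=TRUE, scale=FALSE)
	y <- scale(y, center=TRUE, scale=FALSE)
	W <- P <- matrix(0, ncol(x), k)   # x-weights and x-loadings
	TT <- matrix(0, nrow(x), k)       # x-scores
	b <- numeric(k)                   # y-loadings
	for ( i in seq_len(k) ) {
		w <- crossprod(x, y)              # weights from x-y covariance
		w <- w / sqrt(sum(w^2))
		ti <- x %*% w                     # scores
		p <- crossprod(x, ti) / sum(ti^2) # loadings
		b[i] <- crossprod(y, ti) / sum(ti^2)
		x <- x - tcrossprod(ti, p)        # deflate the data matrix
		W[,i] <- w; P[,i] <- p; TT[,i] <- ti
	}
	# regression coefficients from the weights, loadings, and y-loadings
	beta <- W %*% solve(crossprod(P, W)) %*% b
	list(coefficients=beta, weights=W, loadings=P, scores=TT)
}
```

Because x is overwritten at every component, the full data matrix must be mutable and in memory, which is why out-of-memory matter_mat and sparse_mat inputs are coerced first.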

Variable importance in projection (VIP) scores, proposed by Wold et al. (1993), measure the influence each variable has on the PLS model. They can be calculated with vip(). Note that non-NIPALS models must be fit with retscores = TRUE for VIP to be calculated. In practice, a VIP score greater than ~1 is a useful criterion for variable selection, although there is no statistical basis for this rule.
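The standard VIP formula weights each variable's squared x-weights by the response variance explained per component. The sketch below computes it from a weight matrix and a vector of per-component explained variance (mirroring the weights and cvar components returned by the fits); the function name and interface are illustrative assumptions, not the package's vip().

```r
# Sketch of the VIP formula of Wold et al. (1993): for p variables,
#   VIP_j = sqrt( p * sum_k(w_jk^2 * cvar_k) / sum_k(cvar_k) )
# where the squared weights of each component are normalized to sum to 1.
# Illustrative only; not the package's vip() implementation.
vip_sketch <- function(weights, cvar) {
	p <- nrow(weights)
	# normalize each component's squared weights to sum to 1
	w2 <- sweep(weights^2, 2, colSums(weights^2), "/")
	sqrt(p * as.vector(w2 %*% cvar) / sum(cvar))
}
```

A useful sanity check on this formula is that the squared VIP scores always sum to the number of variables, so their mean is 1; this is what motivates the ~1 threshold.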

Value

An object of class pls, with the following components:

  • coefficients: The regression coefficients.

  • projection: The projection weights of the regression used to calculate the coefficients from the y-loadings or to project the data to the scores.

  • residuals: The residuals from regression.

  • fitted.values: The fitted y matrix.

  • weights: (Optional) The x-weights of the regression.

  • loadings: The x-loadings of the latent variables.

  • scores: (Optional) The x-scores of the latent variables.

  • y.loadings: The y-loadings of the latent variables.

  • y.scores: (Optional) The y-scores of the latent variables.

  • cvar: (Optional) The covariance explained by each component.

Or, an object of class opls, with the following components:

  • weights: The orthogonal x-weights.

  • loadings: The orthogonal x-loadings.

  • scores: The orthogonal x-scores.

  • ratio: The ratio of the orthogonal weights to the PLS loadings for each component. This provides a measure of how much orthogonal variation is being removed by each component and can be interpreted as a scree plot similar to PCA.

  • x: The processed data matrix with orthogonal variation removed.

  • regressions: (Optional) The PLS 1-component regressions on the processed data.

Author(s)

Kylie A. Bemis

References

S. Wold, H. Martens, and H. Wold. “The multivariate calibration method in chemistry solved by the PLS method.” Proceedings on the Conference on Matrix Pencils, Lecture Notes in Mathematics, Heidelberg, Springer-Verlag, pp. 286 - 293, 1983.

S. de Jong. “SIMPLS: An alternative approach to partial least squares regression.” Chemometrics and Intelligent Laboratory Systems, vol. 18, issue 3, pp. 251 - 263, 1993.

B. S. Dayal and J. F. MacGregor. “Improved PLS algorithms.” Journal of Chemometrics, vol. 11, pp. 73 - 85, 1997.

M. Barker and W. Rayens. “Partial least squares for discrimination.” Journal of Chemometrics, vol. 17, pp. 166-173, 2003.

J. Trygg and S. Wold. “Orthogonal projections to latent structures.” Journal of Chemometrics, vol. 16, issue 3, pp. 119 - 128, 2002.

S. Wold, A. Johansson, and M. Cocchi. “PLS: Partial least squares projections to latent structures.” 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM Science Publishers: Leiden, pp. 523 - 550, 1993.

See Also

prcomp

Examples

register(SerialParam())

x <- cbind(
		c(-2.18, 1.84, -0.48, 0.83),
		c(-2.18, -0.16, 1.52, 0.83))
y <- as.matrix(c(2, 2, 0, -4))

pls_nipals(x, y, k=2)

kuwisdelu/matter documentation built on May 1, 2024, 5:17 a.m.