lspls: Fit LS-PLS Models
In bhmevik/lspls: LS-PLS Models

Description Usage Arguments Details Value Note Author(s) References See Also Examples

A function to fit LS-PLS (least squares–partial least squares) models.

1	lspls(formula, ncomp, data, subset, na.action, model = TRUE, ...)

`formula`	model formula. See Details.
`ncomp`	list or vector of positive integers, giving the number of components to use for each ‘pls-matrix’. See Details.
`data`	an optional data frame with the data to fit the model from.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain missing values.
`model`	logical. If `TRUE`, the model frame is returned.
`...`	additional arguments, passed to the underlying PLSR fit function.

lspls fits LS-PLS models, in which matrices are added successively to the model. The first matrix is fit with ordinary least squares (LS) regression. The rest of the matrices are fit with partial least squares regression (PLSR), using the residuals from the preceeding model as response. See lspls-package or the references for more details, and lspls-package for typical usage.

The model formula is specified as resp ~ term1 + term2 + .... If resp is a matrix (with more than one coloumn), a multi-response model is fitted. term1 specifies the first matrix to be fitted, using LS. Each of the remaining terms will be added sequentially in the order specified in the formula (from left to right). Each term can either be a single matrix, which will be added by itself, or several matrices separated with :, e.g., Z:V:W, which will be added simultaneously (these will be denoted parallell matrices).

The first matrix, term1, is called the LS matrix, and the rest of the predictor matrices (whether parallell or not) are called PLS matrices.

Note that an intercept is not automatically added to the model. It should be included as a constant coloumn in the LS matrix, if desired. (If no intercept is included, the PLS matrices should be centered. This happens automatically if the LS matrix includes the intercept.)

The number of components to use in each of the PLSR models is specified with the ncomp argument, which should be a list. Each element of the list gives the number of components to use for the corresponding term in the formula. If the term specifies parallell matrices (separated with :), the list element should be a vector with one integer for each matrix. Otherwise, it should be a number.

To simplify the specification of ncomp, the following conversions are made: if ncomp is a vector, it will be converted to a list. ncomp will also be recycled as neccessary to get one element for each term. Finally, for a parallell term, the list element will be recycled as needed. Thus, ncomp = 4 will result in 4 components being fit for every PLS matrix.

Currently, the function lspls itself handles the formula and the data, and calls the underlying fit function orthlspls.fit to do the actual fitting. This implements the orthogonalized version of the LS-PLS algorithm, and without splitting of parallell matrices into common and unique components (see the references). Extensions to non-orthogonalized algorithms, and splitting of parallell matrices are planned.

An object of class "lspls". The object contains all components returned by the underlying fit function (currently orthlspls.fit). In addition, it contains the following components:

`fitted.values`	matrix with fitted values, one coloumn per response
`na.action`	if observations with missing values were removed, `na.action` contains a vector with their indices.
`ncomp`	the list of number of components used in the model.
`call`	the function call.
`terms`	the model terms.
`model`	if `model = TRUE`, the model frame.

The user interface (e.g. the model handling) is experimental, and might well change in later versions.

The handling of formula (especially :) is non-standard. Note that the order of the terms is significant; terms are added from left to right.

Bjørn-Helge Mevik

Jørgensen, K., Segtnan, V. H., Thyholt, K., Næs, T. (2004) A Comparison of Methods for Analysing Regression Models with Both Spectral and Designed Variables. Journal of Chemometrics, 18(10), 451–464.

Jørgensen, K., Mevik, B.-H., Næs, T. Combining Designed Experiments with Several Blocks of Spectroscopic Data. (Submitted)

Mevik, B.-H., Jørgensen, K., Måge, I., Næs, T. LS-PLS: Combining Categorical Design Variables with Blocks of Spectroscopic Measurements. (Submitted)

lspls-package, lsplsCv, plot.lspls