Partial Least Squares Regression

Description

Functions to perform partial least squares regression with a formula interface. Bootstrapping can be used for validation. Prediction, residuals, model extraction, plot, print and summary methods are also implemented.

Usage

plsFit(formula, ncomp, data, subset, na.action, contr = "contr.niets",
        method = "bidiagpls", scale = TRUE, n_cores = 2, 
        validation = c("none", "oob", "loo"), boots = 1000, model = TRUE, 
        x = FALSE, y = FALSE, ...)
  
## S3 method for class 'mvdareg'
summary(object, ncomp = object$ncomp, digits = 3, ...)

Arguments

formula

a model formula (see below).

ncomp

the number of components to include in the model (see below).

data

an optional data frame containing the variables in the model.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset; the "factory-fresh" default is na.omit. Another possible value is NULL (no action). Value na.exclude can be useful.

contr

the contrasts to use for factor predictors (default "contr.niets"). See the contrasts.arg argument of model.matrix.default.

method

the multivariate regression algorithm to be used.

scale

a logical. If TRUE, the variables are scaled before fitting.

n_cores

the number of cores to use for parallel processing. The default is 2 and the maximum is 4.

validation

character. What kind of (internal) validation to use. See below.

boots

the number of bootstrap samples to draw when validation = "oob".

model

a logical. If TRUE, the model frame is returned.

x

a logical. If TRUE, the model matrix is returned.

y

a logical. If TRUE, the response is returned.

object

an object of class "mvdareg", i.e., a fitted model.

digits

the number of decimal places to print in summary.mvdareg.

...

additional arguments passed to the underlying fit functions and to mvdareg. Currently not in use.
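The na.action argument follows the usual R model-fitting conventions. The sketch below illustrates the practical difference between na.omit and na.exclude using base R's lm() on a small made-up data set; it does not call plsFit(), and it is an assumption, not shown here, that plsFit() pads its residuals in the same way.

```r
# Made-up data with one missing response value (row 3)
d <- data.frame(x = c(1, 2, 3, 4, 5),
                y = c(2.1, 3.9, NA, 8.2, 9.8))

# na.omit drops the incomplete row; residuals cover only the rows used
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude also drops the row for fitting, but residuals() and fitted()
# are padded with NA so they line up with the rows of d
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))  # 4
length(residuals(fit_excl))  # 5, with NA in position 3
```

This padding is why na.exclude is handy when residuals or predictions are bound back onto the original data frame.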

Details

The function fits a partial least squares (PLS) model with 1, ..., ncomp latent variables. Multi-response models are not supported.

The type of model to fit is specified with the method argument. Currently one PLS algorithm is available: the bidiag2 algorithm ("bidiagpls").
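For readers unfamiliar with the bidiag2 approach, the following base R sketch computes single-response PLS regression coefficients through Golub-Kahan bidiagonalization. It builds orthonormal weights W and orthonormal scores T such that X W = T B for an upper bidiagonal matrix B, so the coefficients are W B^-1 T'y. This is an illustration under stated assumptions (mean-centring only, no unit-variance scaling, no reorthogonalization), not the bidiagpls.fit code; the function name bidiag2_pls1 is hypothetical.

```r
# Hypothetical helper: PLS1 coefficients via Golub-Kahan bidiagonalization.
# Assumes X is a numeric n x p matrix, y a numeric vector, 1 <= ncomp <= p.
# Predictors and response are mean-centred only (no unit-variance scaling).
bidiag2_pls1 <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)
  y <- y - mean(y)
  W <- matrix(0, ncol(X), ncomp)        # orthonormal weights
  Tm <- matrix(0, nrow(X), ncomp)       # orthonormal scores
  rho <- numeric(ncomp)                 # diagonal of B
  theta <- numeric(max(ncomp - 1, 0))   # superdiagonal of B

  w <- crossprod(X, y)                  # first weight is proportional to X'y
  W[, 1] <- w / sqrt(sum(w^2))
  s <- X %*% W[, 1]
  rho[1] <- sqrt(sum(s^2))
  Tm[, 1] <- s / rho[1]

  if (ncomp > 1) {
    for (i in 2:ncomp) {
      w <- crossprod(X, Tm[, i - 1]) - rho[i - 1] * W[, i - 1]
      theta[i - 1] <- sqrt(sum(w^2))
      W[, i] <- w / theta[i - 1]
      s <- X %*% W[, i] - theta[i - 1] * Tm[, i - 1]
      rho[i] <- sqrt(sum(s^2))
      Tm[, i] <- s / rho[i]
    }
  }

  # Upper bidiagonal B satisfying X %*% W == Tm %*% B
  B <- diag(rho, nrow = ncomp)
  if (ncomp > 1) B[cbind(1:(ncomp - 1), 2:ncomp)] <- theta
  # Coefficients b = W B^-1 T'y (B is upper triangular, so backsolve applies)
  drop(W %*% backsolve(B, crossprod(Tm, y)))
}

# Usage on simulated data
set.seed(1)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(20, sd = 0.1)
b2 <- bidiag2_pls1(X, y, ncomp = 2)
```

With ncomp equal to the number of (linearly independent) predictors, the weights span the whole predictor space and the result coincides with ordinary least squares, which gives a convenient sanity check.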

The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector and terms is the name of one or more predictor matrices, usually separated by +, e.g., y ~ X + Z. See lm for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. The chapter Statistical models in R of the manual An Introduction to R distributed with R is a good reference on formulas in R.
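A predictor matrix can be stored as a single column of a data frame, and a formula such as y ~ X + Z then expands each matrix into one model-matrix column per matrix column. The sketch below shows this with base R's model.matrix() on simulated data; the names d, X and Z are illustrative only, and plsFit() itself is not called.

```r
# Simulated data: response y plus two predictor matrices stored as columns
set.seed(42)
n <- 10
d <- data.frame(y = rnorm(n))
d$X <- matrix(rnorm(n * 3), nrow = n, ncol = 3)   # 3-column predictor matrix
d$Z <- matrix(rnorm(n * 2), nrow = n, ncol = 2)   # 2-column predictor matrix

# Each matrix term contributes one column per matrix column
mm <- model.matrix(y ~ X + Z, data = d)
ncol(mm)  # 6: intercept, 3 columns from X, 2 columns from Z
```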

The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used.

If validation = "oob", bootstrap cross-validation is performed. Bootstrap confidence intervals are provided for coefficients, weights, loadings, and y.loadings. The number of bootstrap samples is specified with the argument boots. See mvdaboot for details. If validation = "loo", leave-one-out cross-validation is performed. If validation = "none", no cross-validation is performed.
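The two resampling schemes can be sketched generically in base R. This is not mvdaboot: ordinary least squares (the hypothetical ols_fit) stands in for the PLS fit to keep the block short, simple percentile intervals stand in for whatever interval type the package reports (the See Also list mentions bca.cis, which suggests BCa intervals are available), and all function names below are hypothetical.

```r
# Stand-in fitting function: ordinary least squares with an intercept.
# Assumption: in practice a PLS fit would be plugged in here.
ols_fit <- function(X, y) {
  xm <- colMeans(X)
  ym <- mean(y)
  b <- qr.solve(sweep(X, 2, xm), y - ym)   # least-squares slopes
  list(a = ym - sum(xm * b), b = b)
}
predict_lin <- function(fit, X) drop(fit$a + X %*% fit$b)

# Out-of-bag bootstrap: refit on each resample, score the rows not drawn,
# and collect coefficients for percentile confidence intervals.
oob_bootstrap <- function(X, y, fit_fun, boots = 1000) {
  n <- nrow(X)
  coefs <- matrix(NA_real_, boots, ncol(X))
  sq_err <- numeric(0)
  for (k in seq_len(boots)) {
    idx <- sample.int(n, n, replace = TRUE)   # in-bag rows
    oob <- setdiff(seq_len(n), idx)           # out-of-bag rows
    fit <- fit_fun(X[idx, , drop = FALSE], y[idx])
    coefs[k, ] <- fit$b
    if (length(oob) > 0) {
      pred <- predict_lin(fit, X[oob, , drop = FALSE])
      sq_err <- c(sq_err, (y[oob] - pred)^2)
    }
  }
  list(oob_rmse = sqrt(mean(sq_err)),
       ci = apply(coefs, 2, quantile, probs = c(0.025, 0.975)))
}

# Leave-one-out CV: predicted residual sum of squares (PRESS).
loo_press <- function(X, y, fit_fun) {
  press <- 0
  for (i in seq_len(nrow(X))) {
    fit <- fit_fun(X[-i, , drop = FALSE], y[-i])
    press <- press + (y[i] - predict_lin(fit, X[i, , drop = FALSE]))^2
  }
  press
}

# Usage on simulated data
set.seed(7)
X <- matrix(rnorm(90), nrow = 30, ncol = 3)
y <- drop(1 + X %*% c(2, 0, -1)) + rnorm(30, sd = 0.5)
boot_res <- oob_bootstrap(X, y, ols_fit, boots = 200)
press <- loo_press(X, y, ols_fit)
```

Passing the fitting function as an argument keeps the resampling logic independent of the model, so the same loops would apply to a PLS fit with a fixed ncomp.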

The argument contr sets the contrasts used for factor predictors. The default is contr.niets; contr.helmert, contr.poly, contr.sum and contr.treatment are also supported.
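To see what a contrast choice does to the design matrix, the sketch below passes base R contrast functions through the contrasts.arg argument of model.matrix() for a three-level factor. contr.niets belongs to this package and is not reproduced here; the last call shows a generic all-levels dummy coding in base R purely for comparison, without claiming that it matches contr.niets. The data are made up.

```r
# Made-up data with a three-level factor
d <- data.frame(y = c(1.2, 0.8, 2.5, 2.9, 4.1, 3.7),
                g = factor(c("a", "a", "b", "b", "c", "c")))

# Base R contrast functions named through contrasts.arg
mm_trt  <- model.matrix(y ~ g, d, contrasts.arg = list(g = "contr.treatment"))
mm_sum  <- model.matrix(y ~ g, d, contrasts.arg = list(g = "contr.sum"))
mm_helm <- model.matrix(y ~ g, d, contrasts.arg = list(g = "contr.helmert"))

# All-levels dummy coding: one indicator column per level (no contrasts)
mm_full <- model.matrix(y ~ g, d,
                        contrasts.arg = list(g = contrasts(d$g, contrasts = FALSE)))

ncol(mm_trt)   # 3: intercept plus 2 contrast columns
ncol(mm_full)  # 4: intercept plus 3 indicator columns
```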

Value

An object of class mvdareg is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following:

loadings

X loadings

weights

weights

D2.values

bidiag2 matrix

iD2

inverse of bidiag2 matrix

Ymean

mean of response variable

Xmeans

mean of predictor variables

coefficients

PLS regression coefficients

y.loadings

y-loadings

scores

X scores

R

orthogonal weights

Y.values

scaled response values

Yactual

actual response values

fitted

fitted values

residuals

residuals

Xdata

X matrix

iPreds

predicted values

y.loadings2

scaled y-loadings

ncomp

number of latent variables

method

PLS algorithm used

scale

scaling used

validation

validation method

call

model call

terms

model terms

model

fitted model

Author(s)

Nelson Lee Afanador (nelson.afanador@mvdalab.com), Thanh Tran (thanh.tran@mvdalab.com)

References

NOTE: This function is adapted from mvr in package pls with extensive modifications by Nelson Lee Afanador and Thanh Tran.

See Also

bidiagpls.fit, mvdaboot, boot.plots, R2s, PE, ap.plot, T2, Xresids, smc, scoresplot, ScoreContrib, sr, loadingsplot, weightsplot, coefsplot, loadingsplot2D, weightsplot2D, vip, bca.cis, coefficients.boots, loadings.boots, weight.boots, coefficients, loadings, weights, BiPlot, jk.after.boot

Examples

###  PLS MODEL FIT WITH validation = 'oob', i.e. bootstrapping ###
library(mvdalab)
data(Penta)
## Number of bootstraps set to 500 to demonstrate flexibility;
## use at least 1000 (the default) for reliable bootstrap results
mod1 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], 
               ncomp = 2, validation = "oob", boots = 500)
summary(mod1) #Model summary

###  PLS MODEL FIT WITH validation = 'loo', i.e. leave-one-out CV ###

mod2 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], 
               ncomp = 2, validation = "loo")
summary(mod2) #Model summary

###  PLS MODEL FIT WITH validation = 'none', i.e. no CV ###

mod3 <- plsFit(log.RAI ~., scale = TRUE, data = Penta[, -1], 
               ncomp = 2, contr = "contr.niets", validation = "none")
summary(mod3) #Model summary
