mvr  R Documentation 
Functions to perform partial least squares regression (PLSR), canonical powered partial least squares (CPPLS) or principal component regression (PCR), with a formula interface. Cross-validation can be used. Prediction, model extraction, plot, print and summary methods exist.
mvr(
formula,
ncomp,
Y.add,
data,
subset,
na.action,
method = pls.options()$mvralg,
scale = FALSE,
center = TRUE,
validation = c("none", "CV", "LOO"),
model = TRUE,
x = FALSE,
y = FALSE,
...
)
plsr(..., method = pls.options()$plsralg)
pcr(..., method = pls.options()$pcralg)
cppls(..., Y.add, weights, method = pls.options()$cpplsalg)
formula 
a model formula. Most of the 
ncomp 
the number of components to include in the model (see below). 
Y.add 
a vector or matrix of additional responses containing relevant
information about the observations. Only used for 
data 
an optional data frame with the data to fit the model from. 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
na.action 
a function which indicates what should happen when the data
contain missing values. The default is set by the 
method 
the multivariate regression method to be used. If

scale 
numeric vector, or logical. If numeric vector, 
center 
logical, determines if the 
validation 
character. What kind of (internal) validation to use. See below. 
model 
a logical. If 
x 
a logical. If 
y 
a logical. If 
... 
additional optional arguments, passed to the underlying fit
functions, and Currently, the fit functions
and
See the functions' documentation for details. 
weights 
a vector of individual weights for the observations. Only
used for 
The functions fit PLSR, CPPLS or PCR models with 1, ..., ncomp number of components. Multi-response models are fully supported.
The type of model to fit is specified with the method argument. Four PLSR algorithms are available: the kernel algorithm ("kernelpls"), the wide kernel algorithm ("widekernelpls"), SIMPLS ("simpls") and the classical orthogonal scores algorithm ("oscorespls"). One CPPLS algorithm is available ("cppls"), providing several extensions to PLS. One PCR algorithm is available: using the singular value decomposition ("svdpc"). If method is "model.frame", the model frame is returned. The functions pcr, plsr and cppls are wrappers for mvr, with different values for method.
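As an illustration of the method argument, here is a minimal sketch, assuming the pls package is installed and using its bundled yarn data. The object names (fit.kernel, fit.simpls, fit.wrapper) are chosen for this example only.

```r
library(pls)
data(yarn)

# Same model, two different PLSR algorithms selected via 'method'.
fit.kernel <- mvr(density ~ NIR, ncomp = 6, data = yarn, method = "kernelpls")
fit.simpls <- mvr(density ~ NIR, ncomp = 6, data = yarn, method = "simpls")

# plsr() is a wrapper for mvr(), so 'method' can be passed to it as well.
fit.wrapper <- plsr(density ~ NIR, ncomp = 6, data = yarn, method = "simpls")

# The fitted object records which method was used.
fit.kernel$method
fit.simpls$method
fit.wrapper$method

# Largest absolute difference between the 6-component coefficients of the
# two algorithms (expected to be small for a single response).
max(abs(coef(fit.kernel, ncomp = 6) - coef(fit.simpls, ncomp = 6)))
```

Passing method explicitly makes the choice of algorithm visible in the call, rather than leaving it to whatever pls.options() currently holds.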
The formula argument should be a symbolic formula of the form response ~ terms, where response is the name of the response vector or matrix (for multi-response models) and terms is the name of one or more predictor matrices, usually separated by +, e.g., water ~ FTIR or y ~ X + Z. See lm for a detailed description. The named variables should exist in the supplied data data frame or in the global environment. Note: Do not use mvr(mydata$y ~ mydata$X, ...); instead use mvr(y ~ X, data = mydata, ...). Otherwise, predict.mvr will not work properly. The chapter ‘Statistical models in R’ of the manual ‘An Introduction to R’ distributed with R is a good reference on formulas in R.
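The recommended calling pattern can be sketched as follows, assuming the pls package and its yarn data, which include a logical train column. The names yarn.train, yarn.test, fit and pred are illustrative.

```r
library(pls)
data(yarn)

# Split on the logical 'train' column that ships with the yarn data.
yarn.train <- yarn[yarn$train, ]
yarn.test  <- yarn[!yarn$train, ]

# Recommended form: bare variable names in the formula plus 'data ='.
fit <- plsr(density ~ NIR, ncomp = 4, data = yarn.train)

# Because the formula refers to NIR by name, predict() can look it up
# in 'newdata'.
pred <- predict(fit, ncomp = 4, newdata = yarn.test)
dim(pred)  # observations x responses x selected numbers of components
```

With a formula such as yarn.train$density ~ yarn.train$NIR, the variable names would not match any column of newdata, which is why the data argument is preferred.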
The number of components to fit is specified with the argument ncomp. If this is not supplied, the maximal number of components is used (taking account of any cross-validation).
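A short sketch of the two ways to call the functions, assuming the pls package and its yarn data; pcr is used here only as a convenient example, and the object names are illustrative.

```r
library(pls)
data(yarn)

# Explicit number of components.
fit6 <- pcr(density ~ NIR, ncomp = 6, data = yarn)
fit6$ncomp

# 'ncomp' omitted: the maximal number of components is fitted.
fit.max <- pcr(density ~ NIR, data = yarn)
fit.max$ncomp
```

Supplying ncomp explicitly keeps the fit small when only the first few components are of interest.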
All implemented algorithms mean-center both predictor and response matrices. This can be turned off by specifying center = FALSE. See Seasholtz and Kowalski for a discussion about centering in PLS regression.
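To see the effect of centering on a concrete fit, one could compare the two settings as in the sketch below. It assumes a version of the pls package that supports the center argument documented here, plus the bundled yarn data; the object names are illustrative.

```r
library(pls)
data(yarn)

# Default: predictors and response are mean-centered.
fit.centered <- plsr(density ~ NIR, ncomp = 4, data = yarn)

# Centering turned off.
fit.uncentered <- plsr(density ~ NIR, ncomp = 4, data = yarn, center = FALSE)

# Training-set RMSEP for each number of components, for comparison.
RMSEP(fit.centered, estimate = "train")
RMSEP(fit.uncentered, estimate = "train")
```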
If validation = "CV", cross-validation is performed. The number and type of cross-validation segments are specified with the arguments segments and segment.type. See mvrCv for details. If validation = "LOO", leave-one-out cross-validation is performed. It is an error to specify the segments when validation = "LOO" is specified.
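The two validation modes might be used as in this sketch, which assumes the pls package and its yarn data. The choice of 7 consecutive segments is arbitrary and only for illustration.

```r
library(pls)
data(yarn)

# Cross-validation with 7 consecutive segments; 'segments' and
# 'segment.type' are passed on to the cross-validation routine.
fit.cv <- plsr(density ~ NIR, ncomp = 4, data = yarn,
               validation = "CV", segments = 7,
               segment.type = "consecutive")

# Leave-one-out cross-validation: 'segments' must not be given here.
fit.loo <- plsr(density ~ NIR, ncomp = 4, data = yarn, validation = "LOO")

# Cross-validated RMSEP for each number of components.
RMSEP(fit.cv)
RMSEP(fit.loo)
```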
By default, the cross-validation will be performed serially. However, it can be done in parallel using functionality in the parallel package by setting the option parallel in pls.options. See pls.options for the different ways to specify the parallelism. See also Examples below.
Note that the cross-validation is optimised for speed, and some generality has been sacrificed. Especially, the model matrix is calculated only once for the complete cross-validation, so models like y ~ msc(X) will not be properly cross-validated. However, scaling requested by scale = TRUE is properly cross-validated. For proper cross-validation of models where the model matrix must be updated/regenerated for each segment, use the separate function crossval.
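For a model with a preprocessing term in the formula, the separate function could be used roughly as follows. This is a sketch assuming the pls package, its msc function and the yarn data; the segment count is arbitrary.

```r
library(pls)
data(yarn)

# Fit without built-in validation; msc() is part of the model formula.
fit.msc <- plsr(density ~ msc(NIR), ncomp = 6, data = yarn)

# crossval() regenerates the model matrix for each segment, so the
# msc() preprocessing is included in the cross-validation.
fit.msc.cv <- crossval(fit.msc, segments = 7)

# Cross-validated RMSEP for each number of components.
RMSEP(fit.msc.cv)
```

This is slower than validation = "CV" because the model matrix is rebuilt for every segment, which is the price of validating the preprocessing step as well.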
If method = "model.frame", the model frame is returned. Otherwise, an object of class mvr is returned. The object contains all components returned by the underlying fit function. In addition, it contains the following components:
validation 
if validation was
requested, the results of the crossvalidation. See 
fit.time 
the elapsed time for the fit. This is used by

na.action 
if observations with missing values were removed,

ncomp 
the number of components of the model. 
method 
the method used to fit the model. See the argument

center 
use of centering in the model 
scale 
if scaling was requested
(with 
call 
the function call. 
terms 
the model terms. 
model 
if 
x 
if 
y 
if

Ron Wehrens and Bjørn-Helge Mevik
Martens, H., Næs, T. (1989) Multivariate calibration. Chichester: Wiley.
Seasholtz, M. B. and Kowalski, B. R. (1992) The effect of mean centering on prediction in multivariate calibration. Journal of Chemometrics, 6(2), 103–111.
kernelpls.fit, widekernelpls.fit, simpls.fit, oscorespls.fit, cppls.fit, svdpc.fit, mvrCv, crossval, loadings, scores, loading.weights, coef.mvr, predict.mvr, R2, MSEP, RMSEP, plot.mvr
library(pls)
data(yarn)
## Default methods:
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.cppls <- cppls(density ~ NIR, 6, data = yarn, validation = "CV")
## Alternative methods:
yarn.oscorespls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
                       method = "oscorespls")
yarn.simpls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
                   method = "simpls")
## Not run:
## Parallelised cross-validation, using transient cluster:
pls.options(parallel = 4) # use mclapply
pls.options(parallel = quote(makeCluster(4, type = "PSOCK"))) # use parLapply
## A new cluster is created and stopped for each cross-validation:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## Parallelised cross-validation, using persistent cluster:
library(parallel)
## This creates the cluster:
pls.options(parallel = makeCluster(4, type = "PSOCK"))
## The cluster can be used several times:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## Parallelised cross-validation, using persistent MPI cluster:
## This requires the packages snow and Rmpi to be installed
library(parallel)
## This creates the cluster:
pls.options(parallel = makeCluster(4, type = "MPI"))
## The cluster can be used several times:
yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
## The cluster should be stopped manually afterwards:
stopCluster(pls.options()$parallel)
## It is good practice to call mpi.exit() or mpi.quit() afterwards:
mpi.exit()
## End(Not run)
## Multi-response models:
data(oliveoil)
sens.pcr <- pcr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
sens.pls <- plsr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
## Classification
# A classification example utilizing additional response information
# (Y.add) is found in the cppls.fit manual ('See also' above).