Nothing
### mvr.R: plsr/pcr modelling functions
###
### The top level user function. Implements a formula interface and calls the
### correct fit function to do the work.
### The function borrows heavily from lm().
#' @name mvr
#' @title Partial Least Squares and Principal Component Regression
#'
#' @description Functions to perform partial least squares regression (PLSR), canonical
#' powered partial least squares (CPPLS) or principal component regression
#' (PCR), with a formula interface. Cross-validation can be used. Prediction,
#' model extraction, plot, print and summary methods exist.
#'
#' @details The functions fit PLSR, CPPLS or PCR models with 1, \eqn{\ldots},
#' \code{ncomp} number of components. Multi-response models are fully
#' supported.
#'
#' The type of model to fit is specified with the \code{method} argument. Four
#' PLSR algorithms are available: the kernel algorithm (\code{"kernelpls"}),
#' the wide kernel algorithm (\code{"widekernelpls"}), SIMPLS (\code{"simpls"})
#' and the classical orthogonal scores algorithm (\code{"oscorespls"}). One
#' CPPLS algorithm is available (\code{"cppls"}) providing several extensions
#' to PLS. One PCR algorithm is available: using the singular value
#' decomposition (\code{"svdpc"}). If \code{method} is \code{"model.frame"},
#' the model frame is returned. The functions \code{pcr}, \code{plsr} and
#' \code{cppls} are wrappers for \code{mvr}, with different values for
#' \code{method}.
#'
#' The \code{formula} argument should be a symbolic formula of the form
#' \code{response ~ terms}, where \code{response} is the name of the response
#' vector or matrix (for multi-response models) and \code{terms} is the name of
#' one or more predictor matrices, usually separated by \code{+}, e.g.,
#' \code{water ~ FTIR} or \code{y ~ X + Z}. See \code{\link{lm}} for a
#' detailed description. The named variables should exist in the supplied
#' \code{data} data frame or in the global environment. Note: Do not use
#' \code{mvr(mydata$y ~ mydata$X, \ldots{})}, instead use \code{mvr(y ~ X, data
#' = mydata, \ldots{})}. Otherwise, \code{\link{predict.mvr}} will not work
#' properly. The chapter \samp{Statistical models in R} of the manual \samp{An
#' Introduction to R} distributed with is a good reference on formulas in .
#'
#' The number of components to fit is specified with the argument \code{ncomp}.
#' It this is not supplied, the maximal number of components is used (taking
#' account of any cross-validation).
#'
#' All implemented algorithms mean-center both predictor and response matrices.
#' This can be turned off by specifying \code{center = FALSE}. See Seasholtz
#' and Kowalski for a discussion about centering in PLS regression.
#'
#' If \code{validation = "CV"}, cross-validation is performed. The number and
#' type of cross-validation segments are specified with the arguments
#' \code{segments} and \code{segment.type}. See \code{\link{mvrCv}} for
#' details. If \code{validation = "LOO"}, leave-one-out cross-validation is
#' performed. It is an error to specify the segments when \code{validation =
#' "LOO"} is specified.
#'
#' By default, the cross-validation will be performed serially. However, it
#' can be done in parallel using functionality in the \code{\link{parallel}}
#' package by setting the option \code{parallel} in \code{\link{pls.options}}.
#' See \code{\link{pls.options}} for the differnt ways to specify the
#' parallelism. See also Examples below.
#'
#' Note that the cross-validation is optimised for speed, and some generality
#' has been sacrificed. Especially, the model matrix is calculated only once
#' for the complete cross-validation, so models like \code{y ~ msc(X)} will not
#' be properly cross-validated. However, scaling requested by \code{scale =
#' TRUE} is properly cross-validated. For proper cross-validation of models
#' where the model matrix must be updated/regenerated for each segment, use the
#' separate function \code{\link{crossval}}.
#'
#' @aliases mvr pcr plsr cppls
#' @param formula a model formula. Most of the \code{\link{lm}} formula
#' constructs are supported. See below.
#' @param ncomp the number of components to include in the model (see below).
#' @param Y.add a vector or matrix of additional responses containing relevant
#' information about the observations. Only used for \code{cppls}.
#' @param data an optional data frame with the data to fit the model from.
#' @param subset an optional vector specifying a subset of observations to be
#' used in the fitting process.
#' @param na.action a function which indicates what should happen when the data
#' contain missing values. The default is set by the \code{na.action} setting
#' of \code{\link{options}}, and is \code{\link{na.fail}} if that is unset.
#' The \sQuote{factory-fresh} default is \code{\link{na.omit}}. Another
#' possible value is \code{NULL}, no action. Value \code{\link{na.exclude}}
#' can be useful. See \code{\link{na.omit}} for other alternatives.
#' @param method the multivariate regression method to be used. If
#' \code{"model.frame"}, the model frame is returned.
#' @param scale numeric vector, or logical. If numeric vector, \eqn{X} is
#' scaled by dividing each variable with the corresponding element of
#' \code{scale}. If \code{scale} is \code{TRUE}, \eqn{X} is scaled by dividing
#' each variable by its sample standard deviation. If cross-validation is
#' selected, scaling by the standard deviation is done for every segment.
#' @param center logical, determines if the \eqn{X} and \eqn{Y} matrices are
#' mean centered or not. Default is to perform mean centering.
#' @param validation character. What kind of (internal) validation to use.
#' See below.
#' @param model a logical. If \code{TRUE}, the model frame is returned.
#' @param x a logical. If \code{TRUE}, the model matrix is returned.
#' @param y a logical. If \code{TRUE}, the response is returned.
#' @param weights a vector of individual weights for the observations. Only
#' used for \code{cppls}. (Optional)
#' @param \dots additional optional arguments, passed to the underlying fit
#' functions, and \code{\link{mvrCv}}.
#'
#' Currently, the fit functions \code{\link{oscorespls.fit}} and
#' \code{\link{widekernelpls.fit}} implement these extra arguments: \describe{
#' \item{tol:}{numeric. Tolerance used for determining convergence.}
#' \item{maxit:}{positive integer. The maximal number of iterations used.} }
#' and \code{\link{cppls.fit}} implements: \describe{ \item{lower:}{a vector of
#' lower limits for power optimisation.} \item{upper:}{a vector of upper limits
#' for power optimisation.} \item{trunc.pow:}{logical. Whether to use an
#' experimental alternative power algorithm.} } \code{\link{mvrCv}} implements
#' several arguments; the following are probably the most useful of them:
#' \describe{ \item{segments:}{the number of segments to use, or a list with
#' segments.} \item{segment.type:}{the type of segments to use.}
#' \item{length.seg:}{Positive integer. The length of the segments to use.}
#' \item{jackknife:}{logical. Whether to perform jackknifing of regression
#' coefficients.} }
#'
#' See the functions' documentation for details.
#' @return If \code{method = "model.frame"}, the model frame is returned.
#' Otherwise, an object of class \code{mvr} is returned. The object contains
#' all components returned by the underlying fit function. In addition, it
#' contains the following components: \item{validation}{if validation was
#' requested, the results of the cross-validation. See \code{\link{mvrCv}} for
#' details.} \item{fit.time}{the elapsed time for the fit. This is used by
#' \code{\link{crossval}} to decide whether to turn on tracing.}
#' \item{na.action}{if observations with missing values were removed,
#' \code{na.action} contains a vector with their indices. The class of this
#' vector is used by functions like \code{fitted} to decide how to treat the
#' observations.} \item{ncomp}{the number of components of the model.}
#' \item{method}{the method used to fit the model. See the argument
#' \code{method} for possible values.} \item{center}{use of centering in the model}
#' \item{scale}{if scaling was requested
#' (with \code{scale}), the scaling used.} \item{call}{the function call.}
#' \item{terms}{the model terms.} \item{model}{if \code{model = TRUE}, the
#' model frame.} \item{x}{if \code{x = TRUE}, the model matrix.} \item{y}{if
#' \code{y = TRUE}, the model response.}
#' @author Ron Wehrens and Bjørn-Helge Mevik
#' @seealso \code{\link{kernelpls.fit}}, \code{\link{widekernelpls.fit}},
#' \code{\link{simpls.fit}}, \code{\link{oscorespls.fit}},
#' \code{\link{cppls.fit}}, \code{\link{svdpc.fit}}, \code{\link{mvrCv}},
#' \code{\link{crossval}}, \code{\link[stats]{loadings}}, \code{\link{scores}},
#' \code{\link{loading.weights}}, \code{\link{coef.mvr}},
#' \code{\link{predict.mvr}}, \code{\link{R2}}, \code{\link{MSEP}},
#' \code{\link{RMSEP}}, \code{\link{plot.mvr}}
#' @references Martens, H., Næs, T. (1989) \emph{Multivariate calibration.}
#' Chichester: Wiley.
#'
#' Seasholtz, M. B. and Kowalski, B. R. (1992) The effect of mean centering on
#' prediction in multivariate calibration. \emph{Journal of Chemometrics},
#' \bold{6}(2), 103--111.
#' @keywords regression multivariate
#' @examples
#'
#' data(yarn)
#' ## Default methods:
#' yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
#' yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
#' yarn.cppls <- cppls(density ~ NIR, 6, data = yarn, validation = "CV")
#'
#' ## Alternative methods:
#' yarn.oscorespls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
#' method = "oscorespls")
#' yarn.simpls <- mvr(density ~ NIR, 6, data = yarn, validation = "CV",
#' method = "simpls")
#'
#' \dontrun{
#' ## Parallelised cross-validation, using transient cluster:
#' pls.options(parallel = 4) # use mclapply
#' pls.options(parallel = quote(makeCluster(4, type = "PSOCK"))) # use parLapply
#' ## A new cluster is created and stopped for each cross-validation:
#' yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
#' yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
#'
#' ## Parallelised cross-validation, using persistent cluster:
#' library(parallel)
#' ## This creates the cluster:
#' pls.options(parallel = makeCluster(4, type = "PSOCK"))
#' ## The cluster can be used several times:
#' yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
#' yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
#' ## The cluster should be stopped manually afterwards:
#' stopCluster(pls.options()$parallel)
#'
#' ## Parallelised cross-validation, using persistent MPI cluster:
#' ## This requires the packages snow and Rmpi to be installed
#' library(parallel)
#' ## This creates the cluster:
#' pls.options(parallel = makeCluster(4, type = "MPI"))
#' ## The cluster can be used several times:
#' yarn.pls <- plsr(density ~ NIR, 6, data = yarn, validation = "CV")
#' yarn.pcr <- pcr(density ~ NIR, 6, data = yarn, validation = "CV")
#' ## The cluster should be stopped manually afterwards:
#' stopCluster(pls.options()$parallel)
#' ## It is good practice to call mpi.exit() or mpi.quit() afterwards:
#' mpi.exit()
#' }
#'
#' ## Multi-response models:
#' data(oliveoil)
#' sens.pcr <- pcr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
#' sens.pls <- plsr(sensory ~ chemical, ncomp = 4, scale = TRUE, data = oliveoil)
#'
#' ## Classification
#' # A classification example utilizing additional response information
#' # (Y.add) is found in the cppls.fit manual ('See also' above).
#'
#' @export
mvr <- function(formula, ncomp, Y.add, data, subset, na.action,
method = pls.options()$mvralg, scale = FALSE,
center = TRUE, validation = c("none", "CV", "LOO"),
model = TRUE, x = FALSE, y = FALSE, ...)
{
ret.x <- isTRUE(x) # More useful names
ret.y <- isTRUE(y)
center <- isTRUE(center) # Make sure it is a single logical
## Get the model frame
mf <- match.call(expand.dots = FALSE)
if (!missing(Y.add)) {
## Temporarily add Y.add to the formula
Y.addname <- as.character(substitute(Y.add))
mf$formula <- update(formula, paste("~ . +", Y.addname))
}
m <- match(c("formula", "data", "subset", "na.action"), names(mf), 0)
mf <- mf[c(1, m)] # Retain only the named arguments
mf[[1]] <- as.name("model.frame")
mf <- eval(mf, parent.frame())
method <- match.arg(method, c("kernelpls", "widekernelpls", "simpls",
"oscorespls", "cppls", "svdpc", "model.frame"))
if (method == "model.frame") return(mf)
## Get the terms
mt <- attr(mf, "terms") # This is to include the `predvars'
# attribute of the terms
## Get the data matrices
Y <- model.response(mf, "numeric")
if (is.matrix(Y)) {
if (is.null(colnames(Y)))
colnames(Y) <- paste("Y", 1:dim(Y)[2], sep = "")
} else {
Y <- as.matrix(Y)
colnames(Y) <- deparse(formula[[2]])
}
if (missing(Y.add)) {
Y.add <- NULL
} else {
Y.add <- mf[,Y.addname]
## Remove Y.add from the formula again
mt <- drop.terms(mt, which(attr(mt, "term.labels") == Y.addname),
keep.response = TRUE)
}
X <- delete.intercept(model.matrix(mt, mf))
nobj <- dim(X)[1]
npred <- dim(X)[2]
## model.matrix prepends the term name to the colnames of matrices.
## If there is only one predictor term, and the corresponding matrix
## has colnames, remove the prepended term name:
if (length(attr(mt, "term.labels")) == 1 &&
!is.null(colnames(mf[[attr(mt, "term.labels")]])))
colnames(X) <- sub(attr(mt, "term.labels"), "", colnames(X))
## Set or check the number of components:
if (missing(ncomp)) {
ncomp <- min(nobj - 1, npred)
ncompWarn <- FALSE # Don't warn about changed `ncomp'
} else {
if (ncomp < 1 || ncomp > min(nobj - 1, npred))
stop("Invalid number of components, ncomp")
ncompWarn <- TRUE
}
## Handle any fixed scaling before the the validation
sdscale <- isTRUE(scale) # Signals scaling by sd
if (is.numeric(scale))
if (length(scale) == npred)
X <- X / rep(scale, each = nobj)
else stop("length of 'scale' must equal the number of x variables")
## Optionally, perform validation:
switch(match.arg(validation),
CV = {
val <- mvrCv(X, Y, ncomp, Y.add = Y.add, method = method,
scale = sdscale, center = center, ...)
},
LOO = {
segments <- as.list(1:nobj)
attr(segments, "type") <- "leave-one-out"
val <- mvrCv(X, Y, ncomp, Y.add = Y.add, method = method,
scale = sdscale, center = center,
segments = segments, ...)
},
none = {
val <- NULL
}
)
## Check and possibly adjust ncomp:
if (identical(TRUE, ncomp > val$ncomp)) {
ncomp <- val$ncomp
if (ncompWarn) warning("`ncomp' reduced to ", ncomp,
" due to cross-validation")
}
## Select fit function:
fitFunc <- switch(method,
kernelpls = kernelpls.fit,
widekernelpls = widekernelpls.fit,
simpls = simpls.fit,
oscorespls = oscorespls.fit,
cppls = cppls.fit,
svdpc = svdpc.fit)
## Perform any scaling by sd:
if (sdscale) {
## This is faster than sd(X), but cannot handle missing values:
scale <- sqrt(colSums((X - rep(colMeans(X), each = nobj))^2) /
(nobj - 1))
if (any(abs(scale) < .Machine$double.eps^0.5))
warning("Scaling with (near) zero standard deviation")
X <- X / rep(scale, each = nobj)
}
## Fit the model:
start.time <- proc.time()[3]
z <- fitFunc(X, Y, ncomp, Y.add = Y.add, center = center, ...)
z$fit.time <- proc.time()[3] - start.time
## Build and return the object:
class(z) <- "mvr"
z$na.action <- attr(mf, "na.action")
z$ncomp <- ncomp
z$method <- method
z$center <- center
if (is.numeric(scale)) z$scale <- scale
z$validation <- val
z$call <- match.call()
z$terms <- mt
if (isTRUE(model)) z$model <- mf
if (ret.x) z$x <- X
if (ret.y) z$y <- Y
z
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.