regress: Switch function for least squares and parsimonious monomvn...


Description

This function fits the specified ordinary least squares or parsimonious regression (plsr, pcr, ridge, and lars methods), depending on the arguments provided, and returns estimates of coefficients and (co-)variances in a monomvn friendly format.

Usage

regress(X, y, method = c("lsr", "plsr", "pcr", "lasso", "lar",
     "forward.stagewise", "stepwise", "ridge", "factor"), p = 0,
     ncomp.max = Inf, validation = c("CV", "LOO", "Cp"),
     verb = 0, quiet = TRUE)

Arguments

X

data.frame, matrix, or vector of inputs X

y

matrix of responses y of row-length equal to the leading dimension (rows) of X, i.e., nrow(y) == nrow(X); if y is a vector, then nrow may be interpreted as length

method

describes the type of parsimonious (or shrinkage) regression, or ordinary least squares. From the pls package we have "plsr" (plsr, the default) for partial least squares and "pcr" (pcr) for standard principal component regression. From the lars package (see the "type" argument to lars) we have "lasso" for L1-constrained regression, "lar" for least angle regression, "forward.stagewise" and "stepwise" for fast implementations of classical forward selection of covariates. From the MASS package we have "ridge" as implemented by the lm.ridge function. The "factor" method treats the first p columns of y as known factors

p

when performing regressions, 0 <= p <= 1 is the proportion of the number of columns to rows in the design matrix before an alternative regression method (except "lsr") is performed as if least-squares regression had “failed”. Least-squares regression is known to fail when the number of columns is greater than or equal to the number of rows. The default setting, p = 0, forces the specified method to be used for every regression, unless method = "lsr" is given, which can be unstable in that case. Intermediate settings of p allow the user to specify that least-squares regressions are preferred only when there are sufficiently more rows than columns in the design matrix (X). When method = "factor", the p argument instead gives the (positive) integer number of initial columns of y to treat as known factors
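The p threshold described above can be sketched as a simple check. This is a hedged illustration of the documented rule, not the package's internal code; use.lsr is a hypothetical helper name:

```r
## Hypothetical helper illustrating the documented p-threshold rule:
## least squares is preferred only when the number of columns is less
## than p times the number of rows; otherwise the alternative
## (shrinkage) method is used, as if least squares had "failed".
use.lsr <- function(X, p) ncol(X) < p * nrow(X)

use.lsr(matrix(0, 10, 4), p = 0.5)  ## TRUE: 4 < 0.5 * 10, keep least squares
use.lsr(matrix(0, 10, 6), p = 0.5)  ## FALSE: fall back to the shrinkage method
use.lsr(matrix(0, 10, 4), p = 0)    ## FALSE: p = 0 never prefers least squares
```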

ncomp.max

maximal number of (principal) components to consider in a method—only meaningful for the "plsr" or "pcr" methods. Large settings can cause the execution to be slow as they drastically increase the cross-validation (CV) time

validation

method for cross validation when applying a parsimonious regression method. The default setting of "CV" (randomized 10-fold cross-validation) is the faster method, but does not yield a deterministic result and does not apply to regressions on fewer than ten responses. "LOO" (leave-one-out cross-validation) is deterministic, always applicable, and applied automatically whenever "CV" cannot be used. When standard least squares is appropriate, the methods implemented in the lars package (e.g. lasso) support model choice via the "Cp" statistic, which defaults to the "CV" method when least squares fails. This argument is ignored for the "ridge" method; see Details below

verb

whether or not to print progress indicators. The default (verb = 0) keeps quiet. This argument is provided for monomvn and is not intended to be set by the user via this interface

quiet

causes warnings about regressions to be silenced when TRUE

Details

All methods (except "lsr") require a scheme for estimating the amount of variability explained by increasing numbers of non-zero coefficients (or principal components) in the model. Towards this end, the pls and lars packages support 10-fold cross validation (CV) or leave-one-out (LOO) CV estimates of root mean squared error. See pls and lars for more details. The regress function uses CV in all cases except when nrow(X) <= 10, in which case CV fails and LOO is used. Whenever nrow(X) <= 3 pcr fails, so plsr is used instead. If quiet = FALSE then a warning is given whenever the first choice for a regression fails.

For pls methods, RMSEs are calculated for a number of components in 1:ncomp.max, where a NULL value of ncomp.max is replaced with

ncomp.max <- min(ncomp.max, ncol(y), nrow(X)-1)

which is the max allowed by the pls package.

Simple heuristics are used to select a small number of components (ncomp for pls), or number of coefficients (for lars) which explains a large amount of the variability (RMSE). The lars methods use a “one-standard error rule” outlined in Section 7.10, page 216 of HTF below. The pls package does not currently support the calculation of standard errors for CV estimates of RMSE, so a simple linear penalty for increasing ncomp is used instead. The ridge constant (lambda) for lm.ridge is set using the optimize function on the GCV output.
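The lars-style “one-standard error rule” mentioned above can be sketched directly: choose the smallest model whose CV RMSE lies within one standard error of the minimum. The rmse and se vectors below are made-up CV summaries for illustration, not output of regress:

```r
## Made-up CV summaries for models with 1..5 non-zero coefficients
rmse <- c(5.0, 3.2, 2.9, 2.85, 2.84)   ## cross-validated RMSE
se   <- c(0.3, 0.2, 0.15, 0.15, 0.15)  ## standard error of each RMSE

## one-standard-error rule: smallest model whose RMSE is within one
## standard error of the minimum RMSE
best  <- which.min(rmse)
ncomp <- min(which(rmse <= rmse[best] + se[best]))
ncomp  ## 3: the 3-coefficient model is within 2.84 + 0.15 of the best
```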

Value

regress returns a list containing the components listed below.

call

a copy of the function call as used

method

a copy of the method input argument

ncomp

depends on the method used: is NA when method = "lsr"; is the number of principal components for method = "pcr" and method = "plsr"; is the number of non-zero components in the coefficient vector ($b, not counting the intercept) for any of the lars methods; and gives the chosen lambda penalty parameter for method = "ridge"

lambda

if method is one of c("lasso", "forward.stagewise", "ridge"), then this field records the lambda penalty parameter used

b

matrix containing the estimated regression coefficients, with ncol(b) = ncol(y) and the intercept in the first row

S

(bias-corrected) maximum likelihood estimate of the residual covariance matrix

Note

The CV in plsr and lars is random in nature, and so can depend on the random seed. Use validation = "LOO" for a deterministic (but slower) result

Be warned that the lars implementation of "forward.stagewise" can sometimes get stuck in (what seems like) an infinite loop. This is not a bug in the regress function; the bug has been reported to the authors of lars

Author(s)

Robert B. Gramacy rbg@vt.edu

References

Bjorn-Helge Mevik and Ron Wehrens (2007). The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software 18(2)

Bradley Efron, Trevor Hastie, Ian Johnstone and Robert Tibshirani (2003). Least Angle Regression (with discussion). Annals of Statistics 32(2); see also
http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf

http://bobby.gramacy.com/r_packages/monomvn

See Also

monomvn, blasso, lars in the lars library, lm.ridge in the MASS library, plsr and pcr in the pls library

Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)

## Ordinary Least Squares regression
reg.ols <- regress(x, y)

## Lasso regression
reg.lasso <- regress(x, y, method="lasso")

## partial least squares regression
reg.plsr <- regress(x, y, method="plsr")

## ridge regression
reg.ridge <- regress(x, y, method="ridge")

## compare the coefs
data.frame(ols=reg.ols$b, lasso=reg.lasso$b,
           plsr=reg.plsr$b, ridge=reg.ridge$b)

detach(diabetes)
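Since "CV" is randomized, repeated fits can differ; fixing the seed or using "LOO" makes them reproducible. A hedged sketch, assuming the monomvn package and the lars diabetes data used above are available:

```r
## Not run without monomvn installed and the lars diabetes data
library(monomvn)
data(diabetes, package = "lars")
attach(diabetes)

set.seed(1)  ## fix the seed so randomized 10-fold CV is repeatable
fit.cv  <- regress(x, y, method = "plsr")                      ## randomized CV
fit.loo <- regress(x, y, method = "plsr", validation = "LOO")  ## deterministic

detach(diabetes)
```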

monomvn documentation built on Dec. 1, 2019, 1:10 a.m.