Regression Modelling

Description

Estimates the parameters of a regression model.

Usage

regFit(formula, data, family = gaussian, 
    use = c("lm", "rlm", "glm","gam", "ppr", "nnet", "polymars"), 
    title = NULL, description = NULL, ...)

Arguments

data

data is the data frame containing the variables in the model. By default the variables are taken from environment(formula), typically the environment from which lm is called.

description

a brief description of the project, of type character.

family

a description of the error distribution and link function to be used in glm and gam models. See glm and family for more details.

formula

a symbolic description of the model to be fit.
A typical glm predictor has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a (linear) predictor for response. For binomial models the response can also be specified as a factor.
A gam formula (see also gam.models) allows smooth terms to be added to the right-hand side of the formula. See gam.side.conditions for details and examples.

use

denotes the regression method used to fit the model, given as a character string. use must be one of the strings in the default argument:
"lm" for linear regression models,
"rlm" for robust linear regression models,
"glm" for generalized linear modelling,
"gam" for generalized additive modelling,
"ppr" for projection pursuit regression,
"polymars" for polychotomous MARS, and
"nnet" for feedforward neural network modelling.

title

a character string which allows for a project title.

...

additional optional arguments to be passed to the underlying functions. For details, see the help pages for lm, rlm, glm, gam, ppr, polymars, and nnet.

Details

The function regFit was created to provide a selection of regression models that work together with Rmetrics' "timeSeries" objects and return a common S4 object. These models include linear modelling, robust linear modelling, generalized linear modelling, generalized additive modelling, projection pursuit regression, neural networks, and polychotomous MARS models.

LM – Linear Modelling:

Univariate linear regression analysis is a statistical methodology that assumes a linear relationship between some predictor variables and a response variable. The goal is to estimate the coefficients and to predict new data from the estimated linear relationship.

R's base function

lm(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)

is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance, although aov may provide a more convenient interface for these.

Rmetrics' function

regFit(formula, data, use = "lm", ...)

calls R's base function lm, with the difference that the data argument may be any rectangular object that the function as.data.frame can convert into a data frame with named columns, e.g. an object of class "timeSeries". The function regFit returns an S4 object of class "fREG" whose slot @fit is the object returned by the function lm. In addition, the S4 methods fitted and residuals retrieve the fitted values and the residuals as objects of the same class as the data argument.

The function plot.lm provides four plots: a plot of residuals against fitted values, a Scale-Location plot of sqrt(| residuals |) against fitted values, a normal QQ plot, and a plot of Cook's distances versus row labels.
[stats:lm]
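
The following is a minimal sketch of the underlying lm workflow described above, using base R only. The simulated data and the variable names Y, X1, and X2 are illustrative assumptions; regFit itself is not called here, and the object fit is only meant to resemble what the text describes as the content of the @fit slot.

```r
# Simulated data; the names Y, X1, X2 are illustrative only.
set.seed(1)
d <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
d$Y <- 1 + 2 * d$X1 - 0.5 * d$X2 + rnorm(100, sd = 0.1)

# Plain stats::lm fit on a data frame.
fit <- lm(Y ~ X1 + X2, data = d)

coef(fit)              # estimated intercept and slopes
head(fitted(fit))      # fitted values
head(residuals(fit))   # residuals (response minus fitted values)

# Predict the response for new predictor values.
new_d <- data.frame(X1 = c(0, 1), X2 = c(0, 0))
pred <- predict(fit, newdata = new_d)
pred
```

Calling plot(fit) on such an object would produce the diagnostic plots mentioned above.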

RLM – Robust Linear Modelling:

To fit a linear model by robust regression using an M estimator R offers the function

rlm(formula, data, weights, ..., subset, na.action,
method = c("M", "MM", "model.frame"),
wt.method = c("inv.var", "case"),
model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)

from package MASS. Again we can use the Rmetrics' wrapper

regFit(formula, data, use = "rlm", ...)

which allows us to use for example S4 timeSeries objects as input and to get the output as an S4 object with the known slots.
[MASS::rlm]
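
As a rough illustration of why an M estimator can be preferable, the sketch below compares lm with MASS::rlm on simulated data that contain one gross outlier. The data, the true slope of 0.5, and the variable names are assumptions made for this example; regFit is not called.

```r
library(MASS)

# Simulated straight line with true slope 0.5 (illustrative values).
set.seed(2)
x <- seq(1, 20)
y <- 3 + 0.5 * x + rnorm(20, sd = 0.2)
y[20] <- y[20] + 30          # contaminate the last observation
d <- data.frame(x = x, y = y)

ols <- lm(y ~ x, data = d)                  # ordinary least squares
rob <- rlm(y ~ x, data = d, method = "M")   # Huber-type M estimator

slope_ols <- coef(ols)[["x"]]
slope_rob <- coef(rob)[["x"]]

# The robust slope should be pulled far less by the outlier.
c(ols = slope_ols, rlm = slope_rob)
```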

GLM – Generalized Linear Models:

Generalized linear modelling extends the linear model in two directions: (i) with a monotonic differentiable link function describing how the expected values are related to the linear predictor, and (ii) with response variables having a probability distribution from an exponential family.

R's package stats comes with the function

glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart, offset,
control = glm.control(...), model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, contrasts = NULL, ...)

Again we can use the Rmetrics' wrapper

regFit(formula, data, use = "glm", ...)

[stats::glm]
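
The two extensions named above, a link function and a non-normal response distribution, can be sketched with a logistic regression fitted by stats::glm. The simulated data and the coefficients -0.5 and 1.5 are assumptions chosen for illustration; regFit is not called.

```r
# Simulated binary response (illustrative values).
set.seed(3)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.5 * x)          # true success probabilities
y <- rbinom(n, size = 1, prob = p)
d <- data.frame(x = x, y = y)

# Binomial family with logit link.
fit <- glm(y ~ x, family = binomial(link = "logit"), data = d)

eta <- predict(fit, type = "link")       # linear predictor
mu  <- predict(fit, type = "response")   # expected values

# The inverse logit link maps the linear predictor to the mean.
all.equal(unname(mu), unname(plogis(eta)))
```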

GAM – Generalized Additive Models:

An additive model generalizes a linear model by smoothing each predictor term individually. A generalized additive model extends the additive model in the same spirit as the generalized linear model extends the linear model, namely by allowing a link function and non-normal distributions from the exponential family.
[mgcv:gam]
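
To make the idea of smoothing a predictor term concrete, the sketch below fits the same simulated data twice with mgcv::gam: once with purely linear terms and once with a smooth term s(x1). The data-generating function and variable names are assumptions for this example; regFit is not called.

```r
library(mgcv)

# Simulated data: nonlinear in x1, linear in x2 (illustrative values).
set.seed(4)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + 2 * x2 + rnorm(n, sd = 0.2)
d  <- data.frame(y = y, x1 = x1, x2 = x2)

lin <- gam(y ~ x1 + x2, data = d)      # all terms linear
smo <- gam(y ~ s(x1) + x2, data = d)   # smooth term for x1

# Residual sums of squares: the smooth term should fit far better.
rss_lin <- sum(residuals(lin)^2)
rss_smo <- sum(residuals(smo)^2)
c(linear = rss_lin, smooth = rss_smo)
```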

PPR – Projection Pursuit Regression:

The basic method is given by Friedman (1984), and is essentially the same code used by S-PLUS's ppreg. It is observed that this code is extremely sensitive to the compiler used. The algorithm first adds up to max.terms, by default ppr.nterms, ridge terms one at a time; it will use fewer if it is unable to find a term to add that makes sufficient difference. The levels of optimization, argument optlevel, by default 2, differ in how thoroughly the models are refitted during this process. At level 0 the existing ridge terms are not refitted. At level 1 the projection directions are not refitted, but the ridge functions and the regression coefficients are. Levels 2 and 3 refit all the terms; level 3 is more careful to re-balance the contributions from each regressor at each step and so is a little less likely to converge to a saddle point of the sum of squares criterion. The plot method plots the ridge functions of the projection pursuit regression fit.
[stats:ppr]
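
A small sketch of a direct call to stats::ppr may help relate the arguments discussed above. The simulated data, in which the response depends on a single projection X1 + X2, and the chosen argument values are assumptions for illustration; regFit is not called.

```r
# Simulated data: Y depends on one projection of the predictors.
set.seed(5)
n  <- 300
X1 <- runif(n, -1, 1)
X2 <- runif(n, -1, 1)
Y  <- (X1 + X2)^2 + rnorm(n, sd = 0.1)
d  <- data.frame(Y = Y, X1 = X1, X2 = X2)

# Add up to 3 ridge terms, keep 1, refit all terms (optlevel = 2).
fit <- ppr(Y ~ X1 + X2, data = d,
           nterms = 1, max.terms = 3, optlevel = 2)

fit$alpha                        # estimated projection direction(s)
pred <- predict(fit, newdata = d)
cor(pred, d$Y)                   # agreement between fit and response
```

Calling plot(fit) would display the estimated ridge function.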

POLYMARS – Polychotomous MARS:

The algorithm employed by polymars is different from the MARS(tm) algorithm of Friedman (1991), though it has many similarities. Also the name polymars has been used for this algorithm well before MARS was trademarked.
[polyclass:polymars]

NNET – Feedforward Neural Network Regression:

If the response in formula is a factor, an appropriate classification network is constructed; this has one output and entropy fit if the number of levels is two, and a number of outputs equal to the number of classes and a softmax output stage for more levels. If the response is not a factor, it is passed on unchanged to nnet.default. A quasi-Newton optimizer is used, written in C.
[nnet:nnet]
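
The distinction between a numeric and a factor response can be sketched with direct calls to nnet::nnet. The simulated data, the network sizes, and the use of linout = TRUE for a regression output are assumptions made for this example; regFit is not called.

```r
library(nnet)

# Simulated regression data (illustrative values).
set.seed(6)
n <- 200
x <- runif(n, -2, 2)
y <- sin(x) + rnorm(n, sd = 0.1)
d <- data.frame(x = x, y = y)

# Numeric response: regression network with a linear output unit.
rfit <- nnet(y ~ x, data = d, size = 4, linout = TRUE,
             maxit = 500, trace = FALSE)
pred <- predict(rfit, newdata = d)   # matrix with one column

# Two-level factor response: classification network,
# one output unit fitted by entropy.
d$cls <- factor(ifelse(d$y > 0, "up", "down"))
cfit <- nnet(cls ~ x, data = d, size = 2, trace = FALSE)
cfit$entropy   # TRUE for a two-level factor
cfit$n         # units per layer: inputs, hidden, outputs
```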

Value

returns an S4 object of class "fREG".

Author(s)

The R core team for the lm functions from R's base package,
B.R. Ripley for the glm functions from R's base package,
S.N. Wood for the gam functions from R's mgcv package,
N.N. for the ppr functions from R's modreg package,
M. O' Connors for the polymars functions from R's ? package,
The R core team for the nnet functions from R's nnet package,
Diethelm Wuertz for the Rmetrics R-port.

References

Belsley D.A., Kuh E., Welsch R.E. (1980); Regression Diagnostics; Wiley, New York.

Dobson, A.J. (1990); An Introduction to Generalized Linear Models; Chapman and Hall, London.

Draper N.R., Smith H. (1981); Applied Regression Analysis; Wiley, New York.

Friedman, J.H. (1991); Multivariate Adaptive Regression Splines (with discussion), The Annals of Statistics 19, 1–141.

Friedman J.H., and Stuetzle W. (1981); Projection Pursuit Regression; Journal of the American Statistical Association 76, 817-823.

Friedman J.H. (1984); SMART User's Guide; Laboratory for Computational Statistics, Stanford University Technical Report No. 1.

Green, Silverman (1994); Nonparametric Regression and Generalized Linear Models; Chapman and Hall.

Gu, Wahba (1991); Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method; SIAM J. Sci. Statist. Comput. 12, 383-398.

Hastie T., Tibshirani R. (1990); Generalized Additive Models; Chapman and Hall, London.

Kooperberg Ch., Bose S., and Stone C.J. (1997); Polychotomous Regression, Journal of the American Statistical Association 92, 117–127.

McCullagh P., Nelder, J.A. (1989); Generalized Linear Models; Chapman and Hall, London.

Myers R.H. (1986); Classical and Modern Regression with Applications; Duxbury, Boston.

Rousseeuw P.J., Leroy, A. (1987); Robust Regression and Outlier Detection; Wiley, New York.

Seber G.A.F. (1977); Linear Regression Analysis; Wiley, New York.

Stone C.J., Hansen M., Kooperberg Ch., and Truong Y.K. (1997); The use of polynomial splines and their tensor products in extended linear modeling (with discussion).

Venables, W.N., Ripley, B.D. (1999); Modern Applied Statistics with S-PLUS; Springer, New York.

Wahba (1990); Spline Models of Observational Data; SIAM.

Weisberg S. (1985); Applied Linear Regression; Wiley, New York.

Wood (2000); Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties; JRSSB 62, 413-428.

Wood (2001); mgcv: GAMs and Generalized Ridge Regression for R. R News 1, 20-25.

Wood (2001); Thin Plate Regression Splines.

There exists a vast literature on regression. The references listed above are just a small sample of what is available. The book by Myers is an introductory textbook that discusses many of the recent advances in regression technology. Seber's book is at a higher mathematical level and covers much of the classical theory of least squares.

Examples

   ## regSim - simulate a data set from the "LM3" model
   library(fRegression)
   x <- regSim(model = "LM3", n = 100)

   # LM
   regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")

   # RLM
   regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm")

   # GAM
   regFit(Y ~ X1 + X2 + X3, data = x, use = "gam")

   # PPR
   regFit(Y ~ X1 + X2 + X3, data = x, use = "ppr")

   # NNET
   regFit(Y ~ X1 + X2 + X3, data = x, use = "nnet")

   # POLYMARS
   regFit(Y ~ X1 + X2 + X3, data = x, use = "polymars")