Estimates the parameters of a regression model.
regFit(formula, data, family = gaussian,
    use = c("lm", "rlm", "glm", "gam", "ppr", "nnet", "polymars"),
    title = NULL, description = NULL, ...)
description
    a brief description of the project of type character.
family
    a description of the error distribution and link function to be
    used in the model; see glm and family for details.
formula
    a symbolic description of the model to be fitted.
use
    denotes the regression method by a character string used to fit
    the model; one of "lm", "rlm", "glm", "gam", "ppr", "nnet", or
    "polymars".
title
    a character string which allows for a project title.
...
    additional optional arguments to be passed to the underlying
    functions; for details inspect the help pages of lm, rlm, glm,
    gam, ppr, nnet, and polymars.
regFit was created to provide a selection of
regression models working together with Rmetrics'
objects and providing a common S4 object as the returned value. These
models include linear modelling, robust linear modelling, generalized
linear modelling, generalized additive modelling, projection pursuit
regression, neural networks, and polychotomous MARS models.
LM – Linear Modelling:
Univariate linear regression analysis is a statistical methodology that assumes a linear relationship between some predictor variables and a response variable. The goal is to estimate the coefficients and to predict new data from the estimated linear relationship.
R's base function
lm(formula, data, subset, weights, na.action, method = "qr",
model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
contrasts = NULL, offset, ...)
is used to fit linear models. It can be used to carry out regression,
single stratum analysis of variance and analysis of covariance, although
aov may provide a more convenient interface for these.
regFit(formula, data, use = "lm", ...)
calls R's base function
lm, with the difference that the
data argument may be any rectangular object which can be
coerced by the function
as.data.frame into a data frame
with named columns, e.g. an object of class timeSeries.
regFit returns an S4 object of class
"fREG", whose slot
@fit holds the object as returned by the underlying function, here
"lm". In addition there are S4 methods
fitted and
residuals which allow us to retrieve the fitted values and the
residuals as objects of the same class as defined by the argument
data.
The plot method plot.lm provides four plots: a plot of residuals
against fitted values, a Scale-Location plot of sqrt(| residuals |)
against fitted values, a normal QQ plot, and a plot of Cook's
distances versus row labels.
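For illustration, a minimal round trip through the wrapper may be sketched as follows; it assumes the fRegression package is attached and uses the data simulator regSim shown in the Examples section below:

    require(fRegression)
    x <- regSim(model = "LM3", n = 100)       # simulated linear model data
    fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
    head(fitted(fit))                         # fitted values via the S4 method
    head(residuals(fit))                      # residuals via the S4 method
    par(mfrow = c(2, 2))                      # arrange the four diagnostics
    plot(slot(fit, "fit"))                    # plot.lm applied to the slot @fit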
RLM – Robust Linear Modelling:
To fit a linear model by robust regression using an M estimator, R offers the function
rlm(formula, data, weights, ..., subset, na.action,
method = c("M", "MM", "model.frame"),
wt.method = c("inv.var", "case"),
model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)
from the package MASS. Again we can use the Rmetrics wrapper
regFit(formula, data, use = "rlm", ...)
which allows us, for example, to use S4
timeSeries objects as
input and to get the output as an S4 object with the known slots.
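As a brief sketch under the same assumptions as above, additional arguments to rlm, e.g. its method argument, are handed through the dots argument of regFit:

    require(fRegression)
    x <- regSim(model = "LM3", n = 100)
    # default M estimation
    fitM <- regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm")
    # MM estimation, selected via rlm's method argument passed through '...'
    fitMM <- regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm", method = "MM")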
GLM – Generalized Linear Models:
Generalized linear modelling extends the linear model in two directions: (i) with a monotonic differentiable link function describing how the expected values are related to the linear predictor, and (ii) with response variables having a probability distribution from an exponential family.
R's package
stats comes with the function
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart, offset,
control = glm.control(...), model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, contrasts = NULL, ...)
Again we can use the Rmetrics wrapper
regFit(formula, data, use = "glm", ...)
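Since the family argument is handed on to glm, a logistic regression can be sketched as follows; the binary response constructed below is purely illustrative:

    require(fRegression)
    set.seed(4711)
    df <- data.frame(X = rnorm(100))
    df$Y <- rbinom(100, size = 1, prob = plogis(df$X))   # inverse logit link
    fit <- regFit(Y ~ X, data = df, use = "glm", family = binomial)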
GAM – Generalized Additive Models:
An additive model generalizes a linear model by smoothing each predictor term individually. A generalized additive model extends the additive model in the same spirit as the generalized linear model extends the linear model, namely by allowing a link function and non-normal distributions from the exponential family. Again we can use the Rmetrics wrapper
regFit(formula, data, use = "gam", ...)
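Smooth terms are specified with the s() notation of the underlying gam function. A minimal sketch, assuming regSim also provides a "GAM3" model (otherwise "LM3" works as well):

    require(fRegression)
    x <- regSim(model = "GAM3", n = 100)      # assumed additive-model simulator
    fit <- regFit(Y ~ s(X1) + s(X2) + s(X3), data = x, use = "gam")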
PPR – Projection Pursuit Regression:
The basic method is given by Friedman (1984), and is essentially the same code used by S-PLUS's
ppreg. It is observed that
this code is extremely sensitive to the compiler used. The algorithm
first adds up to
max.terms (by default equal to nterms)
ridge terms one at a time; it will use fewer if it is unable to find
a term to add that makes sufficient difference. The levels of
optlevel, by default 2, differ in
how thoroughly the models are refitted during this process.
At level 0 the existing ridge terms are not refitted. At level 1
the projection directions are not refitted, but the ridge
functions and the regression coefficients are. Levels 2 and 3 refit
all the terms; level 3 is more careful to re-balance the contributions
from each regressor at each step and so is a little less likely to
converge to a saddle point of the sum of squares criterion. The
plot method plots the ridge functions of the projection pursuit regression fit.
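A hedged sketch of the wrapper call: note that ppr requires the number of ridge terms nterms, which, like the tuning parameters max.terms and optlevel discussed above, is handed through the dots argument:

    require(fRegression)
    x <- regSim(model = "LM3", n = 100)
    fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "ppr",
                  nterms = 2, max.terms = 5, optlevel = 2)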
POLYMARS – Polychotomous MARS:
The algorithm employed by
polymars is different from the
MARS(tm) algorithm of Friedman (1991), though it has many similarities.
Also, the name
polymars has been used for this algorithm well
before MARS was trademarked.
NNET – Feedforward Neural Network Regression:
If the response in
formula is a factor, an appropriate
classification network is constructed; this has one output and
entropy fit if the number of levels is two, and a number of
outputs equal to the number of classes and a softmax output
stage for more levels. If the response is not a factor, it is
passed on unchanged to
nnet.default. A quasi-Newton
optimizer is used, written in C.
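A minimal regression sketch: nnet requires the number of hidden units size, and linout = TRUE requests linear output units as appropriate for regression rather than classification; both are handed through the dots argument:

    require(fRegression)
    set.seed(42)                              # nnet starts from random weights
    x <- regSim(model = "LM3", n = 100)
    fit <- regFit(Y ~ X1 + X2 + X3, data = x, use = "nnet",
                  size = 3, linout = TRUE, trace = FALSE)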
The function regFit returns an S4 object of class "fREG".
The R core team for the lm functions from R's stats package,
B.D. Ripley for the glm functions from R's stats package,
S.N. Wood for the gam functions from R's mgcv package,
N.N. for the ppr functions from R's stats package,
M. O'Connor for the polymars functions from the polspline package,
B.D. Ripley for the nnet functions from the nnet package,
Diethelm Wuertz for the Rmetrics R-port.
Belsley D.A., Kuh E., Welsch R.E. (1980); Regression Diagnostics; Wiley, New York.
Dobson A.J. (1990); An Introduction to Generalized Linear Models; Chapman and Hall, London.
Draper N.R., Smith H. (1981); Applied Regression Analysis; Wiley, New York.
Friedman J.H. (1991); Multivariate Adaptive Regression Splines (with discussion); The Annals of Statistics 19, 1–141.
Friedman J.H., Stuetzle W. (1981); Projection Pursuit Regression; Journal of the American Statistical Association 76, 817–823.
Friedman J.H. (1984); SMART User's Guide; Laboratory for Computational Statistics, Stanford University, Technical Report No. 1.
Green P.J., Silverman B.W. (1994); Nonparametric Regression and Generalized Linear Models; Chapman and Hall, London.
Gu C., Wahba G. (1991); Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method; SIAM Journal on Scientific and Statistical Computing 12, 383–398.
Hastie T., Tibshirani R. (1990); Generalized Additive Models; Chapman and Hall, London.
Kooperberg Ch., Bose S., Stone C.J. (1997); Polychotomous Regression; Journal of the American Statistical Association 92, 117–127.
McCullagh P., Nelder J.A. (1989); Generalized Linear Models; Chapman and Hall, London.
Myers R.H. (1986); Classical and Modern Regression with Applications; Duxbury, Boston.
Rousseeuw P.J., Leroy A. (1987); Robust Regression and Outlier Detection; Wiley, New York.
Seber G.A.F. (1977); Linear Regression Analysis; Wiley, New York.
Stone C.J., Hansen M., Kooperberg Ch., Truong Y.K. (1997); The use of polynomial splines and their tensor products in extended linear modeling (with discussion); The Annals of Statistics 25, 1371–1470.
Venables W.N., Ripley B.D. (1999); Modern Applied Statistics with S-PLUS; Springer, New York.
Wahba G. (1990); Spline Models of Observational Data; SIAM, Philadelphia.
Weisberg S. (1985); Applied Linear Regression; Wiley, New York.
Wood S.N. (2000); Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties; Journal of the Royal Statistical Society B 62, 413–428.
Wood S.N. (2001); mgcv: GAMs and Generalized Ridge Regression for R; R News 1, 20–25.
Wood S.N. (2001); Thin Plate Regression Splines.
There exists a vast literature on regression. The references listed above are just a small sample of what is available. The book by Myers is an introductory textbook that covers discussions of many of the recent advances in regression technology. Seber's book is at a higher mathematical level and covers much of the classical theory of least squares.
## regSim -
   x <- regSim(model = "LM3", n = 100)

# LM
regFit(Y ~ X1 + X2 + X3, data = x, use = "lm")
# RLM
regFit(Y ~ X1 + X2 + X3, data = x, use = "rlm")
# AM
regFit(Y ~ X1 + X2 + X3, data = x, use = "gam")
# PPR
regFit(Y ~ X1 + X2 + X3, data = x, use = "ppr")
# NNET
regFit(Y ~ X1 + X2 + X3, data = x, use = "nnet")
# POLYMARS
regFit(Y ~ X1 + X2 + X3, data = x, use = "polymars")