Generalized additive models with integrated smoothness estimation
Description
Fits a generalized additive model (GAM) to
data, the term ‘GAM’ being taken to include any quadratically penalized GLM and a variety of
other models estimated by a quadratically penalised likelihood type approach (see family.mgcv
).
The degree of smoothness of model terms is estimated as part of
fitting. gam
can also fit any GLM subject to multiple quadratic penalties (including
estimation of degree of penalization). Confidence/credible intervals are readily
available for any quantity predicted using a fitted model.
Smooth terms are represented using penalized regression splines (or similar smoothers)
with smoothing parameters selected by GCV/UBRE/AIC/REML or by regression splines with
fixed degrees of freedom (mixtures of the two are permitted). Multidimensional smooths are
available using penalized thin plate regression splines (isotropic) or tensor product splines
(when an isotropic smooth is inappropriate), and users can add smooths.
Linear functionals of smooths can also be included in models.
For an overview of the smooths available see smooth.terms
.
For more on specifying models see gam.models
, random.effects
and linear.functional.terms
. For more on model selection see gam.selection
. Do read gam.check
and choose.k
.
See gam from package gam
, for GAMs via the original Hastie and Tibshirani approach (see details for differences to this implementation).
For very large datasets see bam
, for mixed GAM see gamm
and random.effects
.
Usage
1 2 3 4 5 6  gam(formula,family=gaussian(),data=list(),weights=NULL,subset=NULL,
na.action,offset=NULL,method="GCV.Cp",
optimizer=c("outer","newton"),control=list(),scale=0,
select=FALSE,knots=NULL,sp=NULL,min.sp=NULL,H=NULL,gamma=1,
fit=TRUE,paraPen=NULL,G=NULL,in.out,drop.unused.levels=TRUE,
drop.intercept=NULL,...)

Arguments
formula 
A GAM formula, or a list of formulae (see 
family 
This is a family object specifying the distribution and link to use in
fitting etc (see 
data 
A data frame or list containing the model response variable and
covariates required by the formula. By default the variables are taken
from 
weights 
prior weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example,
is equivalent to having made exactly the same observation twice. If you want to reweight the contributions
of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights
(e.g. 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
na.action 
a function which indicates what should happen when the data contain ‘NA’s. The default is set by the ‘na.action’ setting of ‘options’, and is ‘na.fail’ if that is unset. The “factoryfresh” default is ‘na.omit’. 
offset 
Can be used to supply a model offset for use in fitting. Note
that this offset will always be completely ignored when predicting, unlike an offset
included in 
control 
A list of fit control parameters to replace defaults returned by

method 
The smoothing parameter estimation method. 
optimizer 
An array specifying the numerical optimization method to use to optimize the smoothing
parameter estimation criterion (given by 
scale 
If this is positive then it is taken as the known scale parameter. Negative signals that the scale parameter is unknown. 0 signals that the scale parameter is 1 for Poisson and binomial and unknown otherwise. Note that (RE)ML methods can only work with scale parameter 1 for the Poisson and binomial cases. 
select 
If this is 
knots 
this is an optional list containing user specified knot values to be used for basis construction.
For most bases the user simply supplies the knots to be used, which must match up with the 
sp 
A vector of smoothing parameters can be provided here.
Smoothing parameters must be supplied in the order that the smooth terms appear in the model
formula. Negative elements indicate that the parameter should be estimated, and hence a mixture
of fixed and estimated parameters is possible. If smooths share smoothing parameters then 
min.sp 
Lower bounds can be supplied for the smoothing parameters. Note
that if this option is used then the smoothing parameters 
H 
A user supplied fixed quadratic penalty on the parameters of the GAM can be supplied, with this as its coefficient matrix. A common use of this term is to add a ridge penalty to the parameters of the GAM in circumstances in which the model is close to unidentifiable on the scale of the linear predictor, but perfectly well defined on the response scale. 
gamma 
It is sometimes useful to inflate the model degrees of freedom in the GCV or UBRE/AIC score by a constant multiplier. This allows such a multiplier to be supplied. 
fit 
If this argument is 
paraPen 
optional list specifying any penalties to be applied to parametric model terms.

G 
Usually 
in.out 
optional list for initializing outer iteration. If supplied
then this must contain two elements: 
drop.unused.levels 
by default unused levels are dropped from factors before fitting. For some smooths involving factor variables you might want to turn this off. Only do so if you know what you are doing. 
drop.intercept 
Set to 
... 
further arguments for
passing on e.g. to 
Details
A generalized additive model (GAM) is a generalized linear model (GLM) in which the linear predictor is given by a user specified sum of smooth functions of the covariates plus a conventional parametric component of the linear predictor. A simple example is:
log(E(y_i))=f_1(x_1i)+f_2(x_2i)
where the (independent) response variables y_i~Poi, and f_1 and f_2 are smooth functions of covariates x_1 and x_2. The log is an example of a link function.
If absolutely any smooth functions were allowed in model fitting then maximum likelihood estimation of such models would invariably result in complex overfitting estimates of f_1 and f_2. For this reason the models are usually fit by penalized likelihood maximization, in which the model (negative log) likelihood is modified by the addition of a penalty for each smooth function, penalizing its ‘wiggliness’. To control the tradeoff between penalizing wiggliness and penalizing badness of fit each penalty is multiplied by an associated smoothing parameter: how to estimate these parameters, and how to practically represent the smooth functions are the main statistical questions introduced by moving from GLMs to GAMs.
The mgcv
implementation of gam
represents the smooth functions using
penalized regression splines, and by default uses basis functions for these splines that
are designed to be optimal, given the number basis functions used. The smooth terms can be
functions of any number of covariates and the user has some control over how smoothness of
the functions is measured.
gam
in mgcv
solves the smoothing parameter estimation problem by using the
Generalized Cross Validation (GCV) criterion
n D/(n  DoF)^2
or an UnBiased Risk Estimator (UBRE )criterion
D/n + 2 s DoF / n s
where D is the deviance, n the number of data, s the scale parameter and DoF the effective degrees of freedom of the model. Notice that UBRE is effectively just AIC rescaled, but is only used when s is known.
Alternatives are GACV, or a Laplace approximation to REML. There
is some evidence that the latter may actually be the most effective choice.
The main computational challenge solved
by the mgcv
package is to optimize the smoothness selection criteria efficiently and reliably. Various
alternative numerical methods are provided which can be set by argument optimizer
.
Broadly gam
works by first constructing basis functions and one or more quadratic penalty
coefficient matrices for each smooth term in the model formula, obtaining a model matrix for
the strictly parametric part of the model formula, and combining these to obtain a
complete model matrix (/design matrix) and a set of penalty matrices for the smooth terms.
Some linear identifiability constraints are also obtained at this point. The model is
fit using gam.fit
, gam.fit3
or varaints, which are modifications
of glm.fit
. The GAM
penalized likelihood maximization problem is solved by Penalized Iteratively
Reweighted Least Squares (PIRLS) (see e.g. Wood 2000).
Smoothing parameter selection is integrated in one of two ways. (i)
‘Performance iteration’ uses the fact that at each PIRLS iteration a penalized
weighted least squares problem is solved, and the smoothing parameters of that problem can
estimated by GCV or UBRE. Eventually, in most cases, both model parameter estimates and smoothing
parameter estimates converge. (ii) Alternatively the PIRLS scheme is iterated to
convergence for each trial set of smoothing parameters, and GCV, UBRE or REML scores
are only evaluated on convergence  optimization is then ‘outer’ to the PIRLS
loop: in this case the PIRLS iteration has to be differentiated, to
facilitate optimization, and gam.fit3
or one of its variants is used in place of
gam.fit
. The default is the second method, outer iteration.
Several alternative basispenalty types are built in for representing model
smooths, but alternatives can easily be added (see smooth.terms
for an overview and smooth.construct
for how to add smooth classes). The choice of the basis dimension
(k
in the s
, te
, ti
and t2
terms) is something that should be considered carefully
(the exact value is not critical, but it is important not to make it restrictively small, nor very large and
computationally costly). The basis should
be chosen to be larger than is believed to be necessary to approximate the smooth function concerned.
The effective degrees of freedom for the smooth will then be controlled by the smoothing penalty on
the term, and (usually) selected automatically (with an upper limit set by
k1
or occasionally k
). Of course
the k
should not be made too large, or computation will be slow (or in
extreme cases there will be more
coefficients to estimate than there are data).
Note that gam
assumes a very inclusive definition of what counts as a GAM:
basically any penalized GLM can be used: to this end gam
allows the non smooth model
components to be penalized via argument paraPen
and allows the linear predictor to depend on
general linear functionals of smooths, via the summation convention mechanism described in
linear.functional.terms
. link{family.mgcv}
details what is available beyond GLMs
and the exponential family.
Details of the default underlying fitting methods are given in Wood (2011 and 2004). Some alternative methods are discussed in Wood (2000 and 2006).
gam()
is not a clone of Trevor Hastie's original (as supplied in SPLUS or package gam). The major
differences are (i) that by default estimation of the
degree of smoothness of model terms is part of model fitting, (ii) a
Bayesian approach to variance estimation is employed that makes for easier
confidence interval calculation (with good coverage probabilities), (iii) that the model
can depend on any (bounded) linear functional of smooth terms, (iv) the parametric part of the model can be penalized,
(v) simple random effects can be incorporated, and
(vi) the facilities for incorporating smooths of more than one variable are
different: specifically there are no lo
smooths, but instead (a) s
terms can have more than one argument, implying an isotropic smooth and (b) te
,
ti
or t2
smooths are
provided as an effective means for modelling smooth interactions of any
number of variables via scale invariant tensor product smooths. Splines on the sphere, Duchon splines
and Gaussian Markov Random Fields are also available. (vii) Models beyond the exponential family are available.
See gam from package gam
, for GAMs via the original Hastie and Tibshirani approach.
Value
If fit=FALSE
the function returns a list G
of items needed to
fit a GAM, but doesn't actually fit it.
Otherwise the function returns an object of class "gam"
as described in gamObject
.
WARNINGS
The default basis dimensions used for smooth terms are essentially arbitrary, and
it should be checked that they are not too small. See choose.k
and
gam.check
.
You must have more unique combinations of covariates than the model has total parameters. (Total parameters is sum of basis dimensions plus sum of nonspline terms less the number of spline terms).
Automatic smoothing parameter selection is not likely to work well when fitting models to very few response data.
For data with many zeroes clustered together in the covariate space it is quite easy to set up GAMs which suffer from identifiability problems, particularly when using Poisson or binomial families. The problem is that with e.g. log or logit links, mean value zero corresponds to an infinite range on the linear predictor scale.
Author(s)
Simon N. Wood simon.wood@rproject.org
Front end design inspired by the S function of the same name based on the work of Hastie and Tibshirani (1990). Underlying methods owe much to the work of Wahba (e.g. 1990) and Gu (e.g. 2002).
References
Key References on this implementation:
Wood, S.N. (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):336
Wood, S.N. (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass. 99:673686. [Default method for additive case by GCV (but no longer for generalized)]
Wood, S.N. (2003) Thin plate regression splines. J.R.Statist.Soc.B 65(1):95114
Wood, S.N. (2006a) Low rank scale invariant tensor product smooths for generalized additive mixed models. Biometrics 62(4):10251036
Wood S.N. (2006b) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press.
Wood S.N., F. Scheipl and J.J. Faraway (2012) Straightforward intermediate rank tensor product smoothing in mixed models. Statistical Computing.
Marra, G and S.N. Wood (2012) Coverage Properties of Confidence Intervals for Generalized Additive Model Components. Scandinavian Journal of Statistics, 39(1), 5374.
Key Reference on GAMs and related models:
Hastie (1993) in Chambers and Hastie (1993) Statistical Models in S. Chapman and Hall.
Hastie and Tibshirani (1990) Generalized Additive Models. Chapman and Hall.
Wahba (1990) Spline Models of Observational Data. SIAM
Wood, S.N. (2000) Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties. J.R.Statist.Soc.B 62(2):413428 [The original mgcv paper, but no longer the default methods.]
Background References:
Green and Silverman (1994) Nonparametric Regression and Generalized Linear Models. Chapman and Hall.
Gu and Wahba (1991) Minimizing GCV/GML scores with multiple smoothing parameters via the Newton method. SIAM J. Sci. Statist. Comput. 12:383398
Gu (2002) Smoothing Spline ANOVA Models, Springer.
McCullagh and Nelder (1989) Generalized Linear Models 2nd ed. Chapman & Hall.
O'Sullivan, Yandall and Raynor (1986) Automatic smoothing of regression functions in generalized linear models. J. Am. Statist.Ass. 81:96103
Wood (2001) mgcv:GAMs and Generalized Ridge Regression for R. R News 1(2):2025
Wood and Augustin (2002) GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157:157177
http://www.maths.bris.ac.uk/~sw15190/
See Also
mgcvpackage
, gamObject
, gam.models
, smooth.terms
,
linear.functional.terms
, s
,
te
predict.gam
,
plot.gam
, summary.gam
, gam.side
,
gam.selection
, gam.control
gam.check
, linear.functional.terms
negbin
, magic
,vis.gam
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195  ## see also examples in ?gam.models (e.g. 'by' variables,
## random effects and tricks for large binary datasets)
library(mgcv)
set.seed(2) ## simulate some data...
dat < gamSim(1,n=400,dist="normal",scale=2)
b < gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
summary(b)
plot(b,pages=1,residuals=TRUE) ## show partial residuals
plot(b,pages=1,seWithMean=TRUE) ## `with intercept' CIs
## run some basic model checks, including checking
## smoothing basis dimensions...
gam.check(b)
## same fit in two parts .....
G < gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat)
b < gam(G=G)
print(b)
## 2 part fit enabling manipulation of smoothing parameters...
G < gam(y~s(x0)+s(x1)+s(x2)+s(x3),fit=FALSE,data=dat,sp=b$sp)
G$lsp0 < log(b$sp*10) ## provide log of required sp vec
gam(G=G) ## it's smoother
## change the smoothness selection method to REML
b0 < gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,method="REML")
## use alternative plotting scheme, and way intervals include
## smoothing parameter uncertainty...
plot(b0,pages=1,scheme=1,unconditional=TRUE)
## Would a smooth interaction of x0 and x1 be better?
## Use tensor product smooth of x0 and x1, basis
## dimension 49 (see ?te for details, also ?t2).
bt < gam(y~te(x0,x1,k=7)+s(x2)+s(x3),data=dat,
method="REML")
plot(bt,pages=1)
plot(bt,pages=1,scheme=2) ## alternative visualization
AIC(b0,bt) ## interaction worse than additive
## Alternative: test for interaction with a smooth ANOVA
## decomposition (this time between x2 and x1)
bt < gam(y~s(x0)+s(x1)+s(x2)+s(x3)+ti(x1,x2,k=6),
data=dat,method="REML")
summary(bt)
## If it is believed that x0 and x1 are naturally on
## the same scale, and should be treated isotropically
## then could try...
bs < gam(y~s(x0,x1,k=40)+s(x2)+s(x3),data=dat,
method="REML")
plot(bs,pages=1)
AIC(b0,bt,bs) ## additive still better.
## Now do automatic terms selection as well
b1 < gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat,
method="REML",select=TRUE)
plot(b1,pages=1)
## set the smoothing parameter for the first term, estimate rest ...
bp < gam(y~s(x0)+s(x1)+s(x2)+s(x3),sp=c(0.01,1,1,1),data=dat)
plot(bp,pages=1,scheme=1)
## alternatively...
bp < gam(y~s(x0,sp=.01)+s(x1)+s(x2)+s(x3),data=dat)
# set lower bounds on smoothing parameters ....
bp<gam(y~s(x0)+s(x1)+s(x2)+s(x3),
min.sp=c(0.001,0.01,0,10),data=dat)
print(b);print(bp)
# same with REML
bp<gam(y~s(x0)+s(x1)+s(x2)+s(x3),
min.sp=c(0.1,0.1,0,10),data=dat,method="REML")
print(b0);print(bp)
## now a GAM with 3df regression spline term & 2 penalized terms
b0 < gam(y~s(x0,k=4,fx=TRUE,bs="tp")+s(x1,k=12)+s(x2,k=15),data=dat)
plot(b0,pages=1)
## now simulate poisson data...
dat < gamSim(1,n=2000,dist="poisson",scale=.1)
## use "cr" basis to save time, with 2000 data...
b2<gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+
s(x3,bs="cr"),family=poisson,data=dat,method="REML")
plot(b2,pages=1)
## drop x3, but initialize sp's from previous fit, to
## save more time...
b2a<gam(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr"),
family=poisson,data=dat,method="REML",
in.out=list(sp=b2$sp[1:3],scale=1))
par(mfrow=c(2,2))
plot(b2a)
par(mfrow=c(1,1))
## similar example using performance iteration
dat < gamSim(1,n=400,dist="poisson",scale=.25)
b3<gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,optimizer="perf")
plot(b3,pages=1)
## repeat using GACV as in Wood 2008...
b4<gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,method="GACV.Cp",scale=1)
plot(b4,pages=1)
## repeat using REML as in Wood 2011...
b5<gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,
data=dat,method="REML")
plot(b5,pages=1)
## a binary example (see ?gam.models for large dataset version)...
dat < gamSim(1,n=400,dist="binary",scale=.33)
lr.fit < gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=binomial,
data=dat,method="REML")
## plot model components with truth overlaid in red
op < par(mfrow=c(2,2))
fn < c("f0","f1","f2","f3");xn < c("x0","x1","x2","x3")
for (k in 1:4) {
plot(lr.fit,residuals=TRUE,select=k)
ff < dat[[fn[k]]];xx < dat[[xn[k]]]
ind < sort.int(xx,index.return=TRUE)$ix
lines(xx[ind],(ffmean(ff))[ind]*.33,col=2)
}
par(op)
anova(lr.fit)
lr.fit1 < gam(y~s(x0)+s(x1)+s(x2),family=binomial,
data=dat,method="REML")
lr.fit2 < gam(y~s(x1)+s(x2),family=binomial,
data=dat,method="REML")
AIC(lr.fit,lr.fit1,lr.fit2)
## For a Gamma example, see ?summary.gam...
## For inverse Gaussian, see ?rig
## now 2D smoothing...
eg < gamSim(2,n=500,scale=.1)
attach(eg)
op < par(mfrow=c(2,2),mar=c(4,4,1,1))
contour(truth$x,truth$z,truth$f) ## contour truth
b4 < gam(y~s(x,z),data=data) ## fit model
fit1 < matrix(predict.gam(b4,pr,se=FALSE),40,40)
contour(truth$x,truth$z,fit1) ## contour fit
persp(truth$x,truth$z,truth$f) ## persp truth
vis.gam(b4) ## persp fit
detach(eg)
par(op)
## Not run:
##################################################
## largish dataset example with user defined knots
##################################################
par(mfrow=c(2,2))
n < 5000
eg < gamSim(2,n=n,scale=.5)
attach(eg)
ind<sample(1:n,200,replace=FALSE)
b5<gam(y~s(x,z,k=40),data=data,
knots=list(x=data$x[ind],z=data$z[ind]))
## various visualizations
vis.gam(b5,theta=30,phi=30)
plot(b5)
plot(b5,scheme=1,theta=50,phi=20)
plot(b5,scheme=2)
par(mfrow=c(1,1))
## and a pure "knot based" spline of the same data
b6<gam(y~s(x,z,k=64),data=data,knots=list(x= rep((1:80.5)/8,8),
z=rep((1:80.5)/8,rep(8,8))))
vis.gam(b6,color="heat",theta=30,phi=30)
## varying the default large dataset behaviour via `xt'
b7 < gam(y~s(x,z,k=40,xt=list(max.knots=500,seed=2)),data=data)
vis.gam(b7,theta=30,phi=30)
detach(eg)
## End(Not run)
