Multivariate Adaptive Regression Splines
Description
Build a regression model
using the techniques in Friedman's papers "Multivariate Adaptive Regression Splines"
and "Fast MARS".
See the package vignette
“Notes on the earth package”.
Usage
## S3 method for class 'formula'
earth(formula = stop("no 'formula' argument"), data = NULL,
weights = NULL, wp = NULL, subset = NULL,
na.action = na.fail,
pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL,
nfold=0, ncross=1, stratify=TRUE,
varmod.method = "none", varmod.exponent = 1,
varmod.conv = 1, varmod.clamp = .1, varmod.minspan = 3,
Scale.y = NULL, ...)
## Default S3 method:
earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"),
weights = NULL, wp = NULL, subset = NULL,
na.action = na.fail,
pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL,
nfold=0, ncross=1, stratify=TRUE,
varmod.method = "none", varmod.exponent = 1,
varmod.conv = 1, varmod.clamp = .1, varmod.minspan = 3,
Scale.y = NULL, ...)
## S3 method for class 'fit'
earth(x = stop("no 'x' argument"), y = stop("no 'y' argument"),
weights = NULL, wp = NULL, subset = NULL,
na.action = na.fail, offset = NULL,
pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"),
keepxy = FALSE, trace = 0, glm = NULL, degree = 1,
penalty = if(degree > 1) 3 else 2,
nk = min(200, max(20, 2 * ncol(x))) + 1,
thresh = 0.001, minspan = 0, endspan = 0,
newvar.penalty = 0, fast.k = 20, fast.beta = 1,
linpreds = FALSE, allowed = NULL,
nprune = NULL, Object = NULL,
Scale.y = NULL, Adjust.endspan = 2, Auto.linpreds = TRUE,
Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE,
Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e10, ...)
Arguments
To start off, look at the arguments
formula
,
data
,
x
,
y
,
nk
,
degree
, and
trace
.
If the response is binary or a factor, consider using the glm
argument.
For cross validation, use the nfold
argument.
For prediction intervals, use the varmod.method
argument.
Most users will find that the above arguments are all they need,
plus in some cases keepxy
and nprune
.
Unless you are a knowledgeable user, it's best not subvert the
standard algorithm by toying with tuning parameters such as thresh
,
penalty
, and endspan
.
formula 
Model formula.

data 
Data frame for formula .

x 
Matrix or dataframe containing the independent variables.

y 
Vector containing the response variable, or, in the case of multiple responses,
a matrix or dataframe whose columns are the values for each response.

subset 
Index vector specifying which cases to use, i.e., which rows in x to use.
Default is NULL, meaning all.

weights 
Case weights.
Default is NULL, meaning no case weights.
If specified, weights must have length equal to nrow(x)
before applying subset .
Zero weights are converted to a very small nonzero value.
In the current implementation, building models with weights can be slow.

wp 
Response weights.
Default is NULL, meaning no response weights.
If specified, wp must have an element for each column of
y (after factors in
y , if any, have been expanded).
Zero weights are converted to a very small nonzero value.

na.action 
NA action. Default is na.fail , and only na.fail is supported.

offset 
Offset term passed from the formula in earth.formula .

keepxy 
Default is FALSE .
Set to TRUE to retain the following in the returned value: x and y (or data ),
subset , and weights .
The function update.earth and friends will use these
if present instead of searching for them
in the environment at the time update.earth is invoked.
When the nfold argument is used with keepxy=TRUE ,
earth keeps more data and calls predict.earth multiple
times to generate cv.oof.rsq.tab and cv.infold.rsq.tab
(see the cv. arguments in earth.object ).
It therefore makes crossvalidation significantly slower.

trace 
Trace earth 's execution. Values:
0 (default) no tracing
.3 variance model (the varmod.method arg)
.5 cross validation (the nfold arg)
1 overview
2 forward pass
3 pruning
4 model mats summary, pruning details
5 full model mats, internal details of operation

glm 
NULL (default) or a list of arguments to pass on to glm .
See the documentation of glm for a description of these arguments
See “Generalized linear models” in the vignette.
Example:
earth(survived~., data=etitanic, degree=2, glm=list(family=binomial))
The following arguments are for the forward pass.

degree 
Maximum degree of interaction (Friedman's mi).
Default is 1 , meaning build an additive model (i.e., no interaction terms).

penalty 
Generalized Cross Validation (GCV) penalty per knot.
Default is if(degree>1) 3 else 2 .
Simulation studies suggest values in the range of about 2 to 4 .
The FAQ section in the vignette has some information on GCVs.
Special values (for use by knowledgeable users):
The value 0 penalizes only terms, not knots.
The value 1 means no penalty, so GCV = RSS/n.

nk 
Maximum number of model terms before pruning, i.e., the
maximum number of terms created by the forward pass.
Includes the intercept.
The actual number of terms created by the forward pass will often be
less than nk because of other stopping conditions.
See “Termination conditions for the forward pass”
in the vignette.
The default is semiautomatically calculated from the number of predictors
but may need adjusting.

thresh 
Forward stepping threshold.
Default is 0.001 .
This is one of the arguments used to decide when forward stepping
should terminate:
the forward pass terminates if adding a term changes RSq by less than thresh .
See “Termination conditions for the forward pass” in the vignette.

minspan 
Minimum number of observations between knots.
(This increases resistance to runs of correlated noise in the input data.)
The default minspan=0 is treated specially and
means calculate the minspan internally, as per
Friedman's MARS paper section 3.8 with alpha = 0.05.
Set trace>=2 to see the calculated value.
Use minspan=1 and endspan=1 to consider all x values.
Negative values of minspan specify the maximum number of knots
per predictor. These will be equally spaced.
For example, minspan=3 allows three evenly spaced knots for each predictor.
As always, knots that fall in the end zones specified by endspan will be ignored.

endspan 
Minimum number of observations before the first and after the final knot.
The default endspan=0 is treated specially and
means calculate the endspan internally, as per
the MARS paper equation 45 with alpha = 0.05.
Set trace>=2 to see the calculated value.
Be wary of reducing endspan , especially if you plan to make
predictions beyond or near the limits of the training data.
Overfitting near the edges of training data is much more
likely with a small endspan .
The model's RSq and GRSq won't indicate when this
overfitting is occurring.
(A plotmo plot can help: look for sharp hinges at the
edges of the data). See also the Adjust.endspan argument.

newvar.penalty 
Penalty for adding a new variable in the forward pass
(Friedman's gamma, equation 74 in the MARS paper).
Default is 0 , meaning no penalty for adding a new variable.
Useful nonzero values typically range from about 0.01 to 0.2
and sometimes higher —
you will need to experiment.
A word of explanation. With the default newvar.penalty=0 ,
if two variables have nearly the same effect (e.g. they are
collinear), at any step in the forward pass earth will
arbitrarily select one or the other (depending on noise in the sample).
Both variables can appear in the
final model, complicating model interpretation. On the other hand
with a nonzero newvar.penalty , the forward pass will be
reluctant to add a new variable — it will rather try to use a
variable already in the model, if that does not affect RSq too much.
The resulting final model may be easier to interpret, if you are lucky.
There will often be a small performance hit (a worse GCV).

fast.k 
Maximum number of parent terms considered at each step of the forward pass.
(This speeds up the forward pass. See the Fast MARS paper section 3.0.)
Default is 20 .
A value of 0 is treated specially
(as being equivalent to infinity), meaning no Fast MARS.
Typical values, apart from 0 , are 20 , 10 , or 5 .
In general, with a lower fast.k (say 5 ), earth is faster;
with a higher fast.k , or with fast.k disabled (set to 0 ),
earth builds a better model.
However, because of random variation this general rule often doesn't apply.

fast.beta 
Fast MARS ageing coefficient, as described in the
Fast MARS paper section 3.1.
Default is 1 .
A value of 0 sometimes gives better results.

linpreds 
Index vector specifying which predictors should enter linearly, as in lm .
The default is FALSE , meaning predictors enter
in the standard MARS fashion, i.e., in hinge functions.
The linpreds argument does not specify that a predictor
must enter the model; only that if it enters, it enters
linearly. See “The linpreds argument” in the
vignette.
See also the Auto.linpreds argument below (which describes how
earth will automatically treat a predictor as linear
under certain conditions).
Details:
A predictor's index in linpreds is the column number in the input matrix x
(after factors have been expanded).
linpreds=TRUE makes all predictors enter linearly (the TRUE gets recycled).
linpreds may be a character vector e.g.
linpreds=c("wind", "vis") . Note: grep is used
for matching. Thus "wind" will match all variables that have
"wind" in their names. Use "^wind$" to match only the
variable named "wind" .

allowed 
Function specifying which predictors can interact and how.
Default is NULL, meaning all standard MARS terms are allowed.
During the forward pass, earth calls the allowed function
before considering a term for inclusion; the term can go into the
model only if the allowed function returns TRUE .
See “The allowed argument” in the vignette.
The following arguments are for the pruning pass.

pmethod 
Pruning method.
One of: backward none exhaustive forward seqrep cv .
Default is "backward" .
Specify pmethod="cv" to use crossvalidation to select the number of terms.
This selects the number of terms that gives the maximum
mean outoffold RSq on the fold models.
Requires the nfold argument.
Use "none" to retain all the terms created by the forward pass.
If y has multiple columns, then only "backward" or "none"
is allowed.
Pruning can take a while if "exhaustive" is chosen and
the model is big (more than about 30 terms).
The current version of the leaps package
used during pruning does not allow user interrupts
(i.e., you have to kill your R session to interrupt;
in Windows use the Task Manager or from the command line use taskkill ).

nprune 
Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size (that is less than nk ),
or to reduce exhaustive search time with pmethod="exhaustive" .
The following arguments are for cross validation.

nfold 
Number of crossvalidation folds.
Default is 0 , no cross validation.
If greater than 1 , earth first builds a standard model as usual with all the data.
It then builds nfold crossvalidated models,
measuring RSquared on the outoffold (left out) data each time.
The final cross validation RSquared (CVRSq ) is the mean of these
outoffold RSquareds.
The above process of building nfold models is repeated
ncross times (by default, once).
Use trace=.5 to trace crossvalidation.
Further statistics are calculated if keepxy=TRUE or
if a binomial or poisson model (specified with the glm argument).
See “Cross validation” in the vignette.

ncross 
Only applies if nfold>1 .
Number of crossvalidations. Each crossvalidation has nfold folds.
Default 1 .

stratify 
Only applies if nfold>1 .
Default is TRUE .
Stratify the crossvalidation samples so that
an approximately equal number of cases with a nonzero response
occur in each cross validation subset.
So if the response y is logical, the TRUE s will be spread
evenly across folds.
And if the response is a multilevel factor, there will be an
approximately equal number of each factor level in each fold
(because a multilevel factor response gets expanded to columns of zeros and ones,
see “Factors” in the vignette).
We say “approximately equal” because the number of occurrences of a factor
level may not be exactly divisible by the number of folds.
The following arguments are for variance models.

varmod.method 
Construct a variance model.
For details, see varmod and the vignette
“Variance models in earth”.
Use trace=.3 to trace construction of the variance model.
This argument requires nfold and ncross . (We suggest at least ncross=30
here to properly calculate the variance of the errors — although
you can use a smaller value, say 3 , for debugging.)
The varmod.method argument should be one of
"none" Default. Don't build a variance model.
"const" Assume homoscedastic errors.
"lm" Use lm to estimate standard deviation as a
function of the predicted response.
"rlm" Use rlm .
"earth" Use earth .
"gam" Use gam .
This will use either gam
or the mgcv package, whichever is loaded.
"power" Estimate standard deviation as
intercept + coef * predicted.response^exponent ,
where
intercept , coef , and exponent will be estimated by nls .
This is equivalent to varmod.method="lm" except that exponent is
automatically estimated instead of being held at the value
set by the varmod.exponent argument.
"power0" Same as "power" but no intercept (offset) term.
"x.lm" ,
"x.rlm" ,
"x.earth" ,
"x.gam"
Like the similarly named options above,
but estimate standard deviation by regressing on the predictors x
(instead of the predicted response).
A current implementation restriction is that "x.gam"
allows only models with one predictor (x must have only one column).

varmod.exponent 
Power transform applied to the rhs before regressing the
absolute residuals with the specified varmod.method .
Default is 1 .
For example, with varmod.method="lm" , if you expect the
standard deviance to increase linearly with the mean response, use
varmod.exponent=1 .
If you expect the standard deviance to increase with the square root
of the mean response, use
varmod.exponent=.5
(where negative response values will be treated as 0 ,
and you will get an error message if more than 20% of them are negative).

varmod.conv 
Convergence criterion for the Iteratively Reweighted Least Squares used
when creating the variance model.
Iterations stop when the mean value of the coefficients of the
residual model change by less than varmod.conv
percent.
Default is 1 percent.
Negative values force the specified number of iterations,
e.g. varmod.conv=2 means iterate twice.
Positive values are ignored for varmod="const"
and also currently ignored for varmod="earth"
(these are iterated just once, the same as using varmod.conv=1 ).

varmod.clamp 
The estimated standard deviation of the main model errors
is forced to be at least a small positive value,
which we call min.sd .
This prevents negative or absurdly small estimated standard deviations.
Clamping takes place in predict.varmod , which is called
by predict.earth when estimating prediction intervals.
The value of min.sd is determined when building the variance
model as min.sd = varmod.clamp * mean(sd(training.residuals)) .
The default varmod.clamp is 0.1 .

varmod.minspan 
Only applies when varmod.method="earth" or "x.earth" .
This is the minspan used in the internal call to earth
when creating the variance model (not the main earth model).
Default is 3 , i.e., three evenly spaced knots per predictor.
Residuals tend to be very noisy, and allowing only this small
number of knots helps prevent overfitting.
The following arguments are for internal or advanced use.

Object 
Earth object to be updated, for use by update.earth .

Scale.y 
Scale the response internally in the forward pass.
Scaling here means subtract the mean and divide by the standard
deviation.
For singleresponse models, the default is Scale.y = TRUE .
Scaling is invisible to the user, up to numerical differences,
but does provide better numeric stability.
For multipleresponse models, the default is FALSE .
If Scale.y is set TRUE , each column of the response is
independently scaled.
This can prevent one response from “overwhelming” the others,
and earth typically generates a different set of hinge functions.

Adjust.endspan 
In interaction terms, endspan gets multiplied by this value.
This reduces the possibility of an overfitted interaction term
supported by just a few cases on the boundary of the predictor space
(as sometimes seen in our simulation studies).
The default is 2 .
Use Adjust.endspan=1 for compatibility with old
versions of earth .

Auto.linpreds 
Default is TRUE , which works as follows
(see example):
At any step in the forward pass, if earth discovers that the best knot
for the best predictor is at the predictor minimum (in the
training data),
then earth adds the predictor to the model as a linear “basis
function” (with no hinge).
Compare the following basis functions (printed in bold)
for an example such predictor x :
Auto.linpreds=TRUE (default): x
Auto.linpreds=FALSE : max(x99, 0) where
99 is the minimum x in the training data.
Using Auto.linpreds=FALSE always forces a knot, even when the
knot is at the minimum value of the variable.
This ensures that the basis functions are always expressed as hinge functions
(and will always be nonnegative).
Note that Auto.linpreds affects only how the model behaves outside
the training data.
Thus predict.earth will
make the same predictions from the training data, regardless
of whether the earth model was built with Auto.linpreds set
TRUE or FALSE
(up to possible differences in the size of the model caused by
different GCVs because of the different forms of the terms).

Force.weights 
Default is FALSE .
For testing the weights argument.
Force use of the code for handling weights in the earth code,
even if weights=NULL or all the weights are the same.
This will not necessarily generate an identical model,
primarily because the nonweighted code requires some tests for
numerical stability that can sometimes affect knot selection.

Use.beta.cache 
Default is TRUE .
Using the “beta cache” takes a little more memory but is faster
(by 20% and often much more for large models).
The beta cache uses nk * nk * ncol(x) * sizeof(double) bytes.
(The beta cache is an innovation in this implementation of MARS
and does not appear in Friedman's papers. It is not related to
the fast.beta argument. Certain regression coefficients
in the forward pass can be saved and reused, thus
saving recalculation time.)

Force.xtx.prune 
Default is FALSE .
This argument pertains to subset evaluation in the pruning pass.
By default,
if y has a single column then earth calls the leaps routines;
if y has multiple columns then earth calls EvalSubsetsUsingXtx .
The leaps routines are numerically more stable
but do not support multiple responses
(leaps is based on the QR decomposition and
EvalSubsetsUsingXtx is based on the inverse of X'X).
Setting Force.xtx.prune=TRUE forces use of EvalSubsetsUsingXtx , even
if y has a single column.

Get.leverages 
Default is TRUE unless the model has more than 100 thousand cases.
The leverages are the diagonal hat values for the linear regression of
y on bx .
(The leverages are needed only for certain model checks, for example
when plotres is called with versus=4 ).
Details:
This argument was introduced to reduce peak memory usage.
When n >> p , memory use peaks when earth is
calculating the leverages.

Exhaustive.tol 
Default 1e10 .
Applies only when pmethod="exhaustive" .
If the reciprocal of the condition number of bx
is less than Exhaustive.tol , earth forces pmethod="backward" .
See “XHAUST returned error code 999” in the vignette.

... 
Dots are passed on to earth.fit .

Value
An S3 model of class "earth"
.
See earth.object
for a complete description.
Author(s)
Stephen Milborrow, derived from mda::mars
by Trevor Hastie and Robert Tibshirani.
The approach used for GLMs was motivated by work done by
Jane Elith and John Leathwick
(a representative paper is given below).
The evimp
function uses ideas from Max Kuhn's caret
package
https://CRAN.Rproject.org/package=caret.
Parts of Thomas Lumley's leaps
package have been
incorporated into earth
, so earth
can directly access
Alan Miller's Fortran functions without going through hidden functions
in the leaps
package.
References
The Wikipedia article is recommended for an elementary introduction.
The primary references are the Friedman papers, but
readers may find the MARS section in Hastie, Tibshirani,
and Friedman a more accessible introduction.
Faraway takes a handson approach,
using the ozone
data to compare mda::mars
with other techniques.
(If you use Faraway's examples with earth
instead of mars
, use $bx
instead of $x
, and check out the book's errata.)
Friedman and Silverman is recommended background reading for the MARS paper.
Earth's pruning pass uses code from the leaps
package
which is based on techniques in Miller.
Faraway (2005) Extending the Linear Model with R
https://www.maths.bath.ac.uk/~jjf23
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion)
Annals of Statistics 19/1, 1–141
http://projecteuclid.org/euclid.aos/1176347963
doi: 10.1214/aos/1176347963
Friedman (1993) Fast MARS
Stanford University Department of Statistics, Technical Report 110
https://statistics.stanford.edu/research/fastmars
Friedman and Silverman (1989)
Flexible Parsimonious Smoothing and Additive Modeling
Technometrics, Vol. 31, No. 1.
https://www.tandfonline.com/doi/abs/10.1080/00401706.1989.10488470
Hastie, Tibshirani, and Friedman (2009) The Elements of Statistical Learning (2nd ed.)
https://hastie.su.domains/pub.htm
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005)
Using multivariate adaptive regression splines to predict the distributions
of New Zealand's freshwater diadromous fish Freshwater Biology, 50, 20342052
https://hastie.su.domains/pub.htm,
http://www.botany.unimelb.edu.au/envisci/about/staff/elith.html
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression
https://wp.csiro.au/alanmiller/index.html
Wikipedia article on MARS
https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
See Also
Start with summary.earth
, plot.earth
,
evimp
, and plotmo
.
Please see the main package vignette
“Notes on the earth package”.
The vignette can also be downloaded from
http://www.milbo.org/doc/earthnotes.pdf.
The vignette
“Variance models in earth”
is also included with the package.
It describes how to generate prediction intervals for earth
models.
Examples
earth.mod < earth(Volume ~ ., data = trees)
plotmo(earth.mod)
summary(earth.mod, digits = 2, style = "pmax")