dredge: Automated model selection
In MuMIn: Multi-Model Inference

Description

Generate a model selection table of models with combinations (subsets) of fixed effect terms in the global model, with optional model inclusion rules.

Usage

dredge(global.model, beta = c("none", "sd", "partial.sd"), evaluate = TRUE,
  rank = "AICc", fixed = NULL, m.lim = NULL, m.min, m.max, subset,
  trace = FALSE, varying, extra, ct.args = NULL, deps = attr(allTerms0, "deps"),
  cluster = NULL,
  ...)

## S3 method for class 'model.selection'
print(x, abbrev.names = TRUE, warnings = getOption("warn") != -1L, ...)

Arguments

`global.model`	a fitted ‘global’ model object. See ‘Details’ for a list of supported types.
`beta`	indicates whether and how the coefficients are standardized, and must be one of `"none"`, `"sd"` or `"partial.sd"`. You can specify just the initial letter. `"none"` corresponds to unstandardized coefficients, `"sd"` and `"partial.sd"` to coefficients standardized by SD and Partial SD, respectively. For backwards compatibility, a logical value is also accepted: `TRUE` is equivalent to `"sd"` and `FALSE` to `"none"`. See `std.coef`.
`evaluate`	whether to evaluate and rank the models. If `FALSE`, a list of unevaluated `call`s is returned.
`rank`	optionally, the rank function returning some form of information criterion, to be used instead of `AICc`, e.g. `AIC`, `QAIC` or `BIC`. See ‘Details’.
`fixed`	optional, either a single-sided formula or a character vector giving names of terms to be included in all models. Not to be confused with fixed effects. See ‘Subsetting’.
`m.lim`, `m.max`, `m.min`	optionally, the limits `c(lower, upper)` for the number of terms in a single model (excluding the intercept). An `NA` means no limit. See ‘Subsetting’. Specifying limits as `m.min` and `m.max` is allowed for backward compatibility.
`subset`	logical expression or a `matrix` describing models to be kept in the resulting set. `NULL` or `TRUE` disables subsetting. For details, see ‘Subsetting’.
`trace`	if `TRUE` or `1`, all calls to the fitting function are printed before actual fitting takes place. If `trace > 1`, a progress bar is displayed.
`varying`	optionally, a named list describing the additional arguments to vary between the generated models. Item names correspond to the arguments, and each item provides a list of choices (i.e. `list(arg1 = list(choice1, choice2, ...), ...)`). Complex elements in the choice list (such as `family` objects) should be either named (uniquely) or quoted (unevaluated, e.g. using `alist`, see `quote`), otherwise the result may be visually unpleasant. See example in `Beetle`.
`extra`	optional additional statistics to be included in the result, provided as functions, function names or a list of such (preferably named or quoted). As with the `rank` argument, each function must accept as an argument a fitted model object and return (a value coercible to) a numeric vector. This could be, for instance, additional information criteria or goodness-of-fit statistics. The character strings `"R^2"` and `"adjR^2"` are treated in a special way and add a likelihood-ratio based R^2 and modified R^2 to the result, respectively (this is more efficient than using `r.squaredLR` directly).
`x`	a `model.selection` object, returned by `dredge`.
`abbrev.names`	Should term names in the table header be abbreviated when printed? This is the default. If full names are required, use `print()` explicitly with this argument set to `FALSE`.
`warnings`	if `TRUE`, errors and warnings issued during the model fitting are printed below the table (only with `pdredge`). To permanently remove the warnings, set the object's attribute `"warnings"` to `NULL`.
`ct.args`	optional list of arguments to be passed to `coefTable` (e.g. `dispersion` parameter for `glm` affecting standard errors used in subsequent model averaging with `model.avg`).
`deps`	a “dependency matrix” as returned by `getAllTerms`, attribute `"deps"`. Can be used to fine-tune marginality exceptions.
`cluster`	if a valid `"cluster"` object is given, it is used for parallel execution. If `NULL` or omitted, execution is single-threaded. With parallel calculation, an extra argument `check` is accepted. See `pdredge` for details and examples.
`...`	optional arguments for the `rank` function. Any can be an unevaluated expression, in which case any `x` within it will be substituted with the current model.

Details

Models are fitted through repeated evaluation of the modified call extracted from the global.model (in a similar fashion to update). This approach, while having the advantage that it can be applied to most model types through the usual formula interface, can have a considerable computational overhead.

Note that the number of combinations grows exponentially with the number of predictors (2^N, fewer when interactions are present, see below).
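As a rough illustration of this growth, the count for N main-effect terms without interactions can be tabulated in base R. The helper name n_models is hypothetical and not part of MuMIn.

```r
# Number of candidate models for N main-effect terms (no interactions):
# each term is either in or out, so there are 2^N subsets, including
# the intercept-only model.
n_models <- function(n_terms) 2^n_terms

n_terms <- c(4, 10, 20)
data.frame(terms = n_terms, models = n_models(n_terms))
# 4 terms give 16 models, 10 give 1024, 20 give 1048576
```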

The fitted model objects are not stored in the result. To get (a subset of) the models, use get.models on the object returned by dredge. Another way to get all the models is to run lapply(dredge(..., evaluate = FALSE), eval), which avoids fitting models twice.
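The two routes to the fitted models might look as follows. This is a minimal sketch that assumes the Cement data shipped with MuMIn (as used in ‘Examples’); the object names and the delta < 2 threshold are illustrative only.

```r
library(MuMIn)
options(na.action = "na.fail")

fm <- lm(y ~ ., data = Cement)
dd <- dredge(fm)

# Route 1: refit only a subset of the models from the selection table
# (here those within 2 AICc units of the best one)
top <- get.models(dd, subset = delta < 2)

# Route 2: build the calls without fitting, then evaluate each one once
calls <- dredge(fm, evaluate = FALSE)
all_fits <- lapply(calls, eval)

# one fitted model per generated call
length(all_fits) == length(calls)
```

Route 2 is the one to prefer when every model object is needed, because the selection table and the fits would otherwise each require a full round of fitting.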

For a list of model types that can be used as a global.model see the list of supported models. Modelling functions that do not store a call in their result should be run via a wrapper function created by updateable.

Information criterion

rank is found by a call to match.fun and may be specified as a function, a symbol, or as a character string specifying a function to be searched for from the environment of the call to dredge. It can also be a one-element named list, where the first element is taken as the rank function. The function rank must accept a model object as its first argument and always return a scalar.
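A short sketch of two ways to supply rank, again assuming the Cement data from MuMIn. The function bic_like is a hypothetical example of a user-written criterion, not something provided by the package.

```r
library(MuMIn)
options(na.action = "na.fail")

fm <- lm(y ~ ., data = Cement)

# Rank function given as a character string
dd_bic <- dredge(fm, rank = "BIC")

# Rank function given as a function object: it takes the fitted model as
# its first argument and returns a single number.
# AIC with penalty k = log(n) is the same quantity as BIC.
bic_like <- function(model) AIC(model, k = log(nobs(model)))
dd_custom <- dredge(fm, rank = bic_like)
```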

Interactions

By default, marginality constraints are respected, so that “all possible combinations” include only those that contain interactions with their respective main effects and all lower order terms, unless the global.model makes an exception to this principle (e.g. due to a nested design such as a / b).
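The effect of the marginality constraint can be seen on a small global model with one two-way interaction. This sketch assumes the Cement data from MuMIn; the object names are illustrative.

```r
library(MuMIn)
options(na.action = "na.fail")

# Global model with main effects X1, X2 and their interaction X1:X2
fm_int <- lm(y ~ X1 * X2, data = Cement)
dd_int <- dredge(fm_int)

# Unconstrained, the three terms would give 2^3 = 8 subsets. Marginality
# drops every subset in which X1:X2 appears without both X1 and X2,
# leaving: (null), X1, X2, X1 + X2, X1 + X2 + X1:X2
nrow(dd_int)
```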

Subsetting

The resulting set of models can be constrained with three methods: (1) set limits on the number of terms in a model with m.lim, (2) bind term(s) to all models with fixed, and (3) use subset for more complex rules. To be included in the selection table, the formulation of a model must satisfy all these conditions.

subset can be an expression or a matrix. If a matrix, it should be a logical, lower triangular matrix, with rows and columns corresponding to global.model terms. If this matrix has dimnames, they must match the term names (as returned by getAllTerms). Unmatched names are silently ignored. Otherwise, if rows or columns are unnamed, they are matched positionally to the model terms, and dim(subset) must be equal to the number of terms. For example, subset["a", "b"] == FALSE excludes models with both a and b terms; and if subset is unnamed, subset[2, 3] == FALSE will prevent the second and third terms of the global model from being both in the same model.

demo(dredge.subset) has examples of using the subset matrix in conjunction with correlation matrices to exclude models containing collinear predictors.
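A subset matrix of this kind might be built from pairwise correlations as sketched below. The Cement data come from MuMIn; the 0.7 cutoff and the object names are arbitrary choices made for illustration.

```r
library(MuMIn)
options(na.action = "na.fail")

fm <- lm(y ~ ., data = Cement)

# Pairwise correlations between the predictors; the dimnames of the
# result match the term names of the global model
preds <- c("X1", "X2", "X3", "X4")
r <- cor(Cement[, preds])

# TRUE where two terms may appear together, FALSE where they are
# too strongly correlated
smat <- abs(r) <= 0.7

# Only the lower triangle is used, so blank out the rest
smat[!lower.tri(smat)] <- NA

dd_nocollin <- dredge(fm, subset = smat)
```

Pairs flagged FALSE in smat never occur together in dd_nocollin, while each of the terms can still appear on its own.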

In the form of an expression, the argument subset acts similarly to that of subset() for data.frames. Model terms can be referred to by name as variables in the expression, except that they are interpreted as logical values indicating the presence of a term in the model.

The expression can contain any of the global.model term names, as well as names of the varying list items. global.model term names take precedence when identical to names of varying, so to avoid ambiguity varying variables in the subset expression should be enclosed in V() (e.g. V(family) == "Gamma", assuming that varying is something like list(family = c("Gamma", ...))).

If elements of varying are unnamed, they are coerced into names. Calls and symbols are represented as character values (via "deparse"), and everything except numeric, logical, character and NULL values is represented by element numbers (e.g. subset = V(family) == 2 points to the Gamma family in varying = list(family = list(gaussian, Gamma))). This can easily become obscure, so using named lists in varying is recommended. Examples can be found in demo(dredge.varying).
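The following sketch shows a named varying list combined with V() in subset. The data are synthetic and all object names are made up for illustration; it assumes, following the description above, that named choices are referred to by their names inside V().

```r
library(MuMIn)
options(na.action = "na.fail")

# Synthetic binary-response data (illustrative only)
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 + d$x1))

gm <- glm(y ~ x1 + x2, family = binomial("logit"), data = d)

# Named, unevaluated choices for the 'family' argument
links <- list(family = alist(
    logit  = binomial("logit"),
    probit = binomial("probit")
))

# Every subset of terms is fitted once per link function
dd_var <- dredge(gm, varying = links)

# Keep only the probit variants, referring to the choice by its name
dd_probit <- dredge(gm, varying = links, subset = V(family) == "probit")
```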

Term names appearing in fixed and subset must be given exactly as they are returned by getAllTerms(global.model), which may differ from the original term names (e.g. the interaction term components are ordered alphabetically).

The with(x) and with(+x) notation indicates, respectively, any and all interactions including the main effect term x. This is only effective with marginality exceptions. The extended form with(x, order) allows specifying the order of interaction of terms of which x is a part. For instance, with(b, 2:3) selects models with at least one second- or third-order interaction of variable b. The second (positional) argument is coerced to an integer vector. The “dot” notation .(x) is an alias for with.

The special variable `*nvar*` (backtick-quoted), in the subset expression is equal to the number of terms in the model (not the number of parameters).

To include a model term conditionally on the presence of another term, use dc (“dependency chain”) in the subset expression. dc takes any number of term names as arguments, and allows a term to be included only if all preceding ones are also present (e.g. subset = dc(a, b, c) allows for models a, a+b and a+b+c but not b, c, b+c or a+c).
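The rule that dc(a, b, c) encodes can be checked by enumerating presence/absence combinations in base R. This sketch does not call dredge; it only reproduces the logic, using the equivalent expression (a | !b) & (b | !c) from ‘Examples’.

```r
# All 2^3 presence/absence combinations of three terms a, b and c
grid <- expand.grid(a = c(FALSE, TRUE),
                    b = c(FALSE, TRUE),
                    c = c(FALSE, TRUE))

# b requires a, and c requires b (and therefore a)
keep <- with(grid, (a | !b) & (b | !c))

allowed <- grid[keep, ]
allowed
# Rows kept: no terms, a, a + b, a + b + c
```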

The subset expression can take the form of an unevaluated call, an expression object, or a one-sided formula. See ‘Examples’.

Compound model terms (such as interactions, ‘as-is’ expressions within I() or smooths in gam) should be enclosed within curly brackets (e.g. {s(x,k=2)}), or backticks (like non-syntactic names, e.g. `s(x, k = 2)` ), except when they are arguments to with or dc. Backtick-quoted names must match exactly (including whitespace) the term names as returned by getAllTerms.

`subset` expression syntax summary

a & b: indicates that model terms a and b must be present (see Logical Operators)
{log(x,2)} or `log(x, 2)`: represent a complex model term log(x, 2)
V(x): represents a varying item x
with(x): indicates that at least one term containing the main effect term x must be present
with(+x): indicates that all the terms containing the main effect term x must be present
with(x, n:m): indicates that at least one term containing an n-th to m-th order interaction term of x must be present
dc(a, b, c,...): ‘dependency chain’: b is allowed only if a is present, and c only if both a and b are present, etc.
`*nvar*`: the number of terms in the model.

To simply keep certain terms in all models, it is much more efficient to use the fixed argument. The fixed formula is interpreted in the same manner as model formula, so the terms must not be quoted.

Missing values

Use of na.action = "na.omit" (R's default) or "na.exclude" in global.model must be avoided, as it results in sub-models being fitted to different data sets if there are missing values. An error is thrown if this is detected.

It is a common mistake to give na.action as an argument in the call to dredge (typically resulting in an error from the rank function to which the argument is passed through ‘...’), while the correct way is either to pass na.action in the call to the global model or to set it as a global option.
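The two correct placements of na.action can be sketched in base R as follows. The data frame d is made up for illustration and contains one missing value.

```r
# Illustrative data with one missing predictor value
d <- data.frame(y  = c(1.2, 2.3, 2.9, 4.1, 5.0, 6.2),
                x1 = c(1, 2, 3, 4, 5, 6),
                x2 = c(0.5, NA, 1.5, 1.0, 2.5, 2.0))

# Option 1: pass na.action in the call to the global model.
# With na.fail, the missing value stops the fit instead of silently
# dropping a row.
res <- try(lm(y ~ x1 + x2, data = d, na.action = na.fail), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Remove incomplete rows explicitly, so that every sub-model would be
# fitted to the same observations
d_complete <- na.omit(d)
gm <- lm(y ~ x1 + x2, data = d_complete, na.action = na.fail)

# Option 2: set it as a global option before fitting the global model
options(na.action = "na.fail")
```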

Intercept

If present in the global.model, the intercept will be included in all sub-models.

Methods

There are subset (subset.model.selection) and plot (plot.model.selection) methods; the latter creates a graphical representation of model weights and per-model term sum of weights. Coefficients can be extracted with coef or coefTable.

Value

An object of class c("model.selection", "data.frame"), being a data.frame, where each row represents one model. See model.selection.object for its structure.

Note

Users should keep in mind the hazards that a “thoughtless approach” of evaluating all possible models poses. Although this procedure is in certain cases useful and justified, it may result in selecting a spurious “best” model, due to the model selection bias.

“Let the computer find out” is a poor strategy and usually reflects the fact that the researcher did not bother to think clearly about the problem of interest and its scientific setting (Burnham and Anderson, 2002).

Author(s)

Kamil Bartoń

Examples

# Example from Burnham and Anderson (2002), page 100:

library(MuMIn)

# prevent fitting sub-models to different datasets
options(na.action = "na.fail")

fm1 <- lm(y ~ ., data = Cement)
dd <- dredge(fm1)
subset(dd, delta < 4)

# Visualize the model selection table:

par(mar = c(3,5,6,4))
plot(dd, labAsExpr = TRUE)


# Model average models with delta AICc < 4
model.avg(dd, subset = delta < 4)

#or as a 95% confidence set:
model.avg(dd, subset = cumsum(weight) <= .95) # get averaged coefficients

#'Best' model
summary(get.models(dd, 1)[[1]])

## Not run: 
# Examples of using 'subset':
# keep only models containing X3
dredge(fm1, subset = ~ X3) # subset as a formula
dredge(fm1, subset = expression(X3)) # subset as expression object
# the same, but more efficient:
dredge(fm1, fixed = "X3")
# exclude models containing both X1 and X2 at the same time
dredge(fm1, subset = !(X1 && X2))
# Fit only models containing either X3 or X4 (but not both);
# include X3 only if X2 is present, and X2 only if X1 is present.
dredge(fm1, subset = dc(X1, X2, X3) && xor(X3, X4))
# the same as above, without "dc"
dredge(fm1, subset = (X1 | !X2) && (X2 | !X3) && xor(X3, X4))

# Include only models with up to 2 terms (and intercept)
dredge(fm1, m.lim = c(0, 2))

## End(Not run)

# Add R^2 and F-statistics, use the 'extra' argument
dredge(fm1, m.lim = c(NA, 1), extra = c("R^2", F = function(x)
    summary(x)$fstatistic[[1]]))

# with summary statistics:
dredge(fm1, m.lim = c(NA, 1), extra = list(
    "R^2", "*" = function(x) {
        s <- summary(x)
        c(Rsq = s$r.squared, adjRsq = s$adj.r.squared,
            F = s$fstatistic[[1]])
    })
)

# Add other information criteria (but rank with AICc):
dredge(fm1, m.lim = c(NA, 1), extra = alist(AIC, BIC, ICOMP, Cp))

MuMIn documentation built on April 1, 2025, 3 a.m.
