ref_grid: Create a reference grid from a fitted model
In emmeans: Estimated Marginal Means, aka Least-Squares Means

ref_grid

R Documentation

Create a reference grid from a fitted model

Description

Using a fitted model object, determine a reference grid for which estimated marginal means are defined. The resulting ref_grid object encapsulates all the information needed to calculate EMMs and make inferences on them.

Usage

ref_grid(object, at, cov.reduce = mean,
  cov.keep = get_emm_option("cov.keep"), mult.names, mult.levs,
  options = get_emm_option("ref_grid"), data, df, type, regrid, nesting,
  offset, sigma, counterfactuals, nuisance = character(0), non.nuisance,
  wt.nuis = "equal", rg.limit = get_emm_option("rg.limit"), ...)

Arguments

`object`	An object produced by a supported model-fitting function, such as `lm`. Many models are supported. See `vignette("models", "emmeans")`.
`at`	Optional named list of levels for the corresponding variables
`cov.reduce`	A function, logical value, or formula; or a named list of these. Each covariate not specified in `cov.keep` or `at` is reduced according to these specifications. See the section below on “Using `cov.reduce` and `cov.keep`”.
`cov.keep`	Character vector: names of covariates that are not to be reduced; these are treated as factors and used in weighting calculations. `cov.keep` may also include integer value(s), and if so, the maximum of these is used to set a threshold such that any covariate having no more than that many unique values is automatically included in `cov.keep`.
`mult.names`	Character value: the name(s) to give to the pseudo-factor(s) whose levels delineate the elements of a multivariate response. If this is provided, it overrides the default name(s) used for `class(object)` when it has a multivariate response (e.g., the default is `"rep.meas"` for `"mlm"` objects).
`mult.levs`	A named list of levels for the dimensions of a multivariate response. If there is more than one element, the combinations of levels are used, in `expand.grid` order. The (total) number of levels must match the number of dimensions. If `mult.names` is specified, this argument is ignored.
`options`	If non-`NULL`, a named `list` of arguments to pass to `update.emmGrid`, just after the object is constructed.
`data`	A `data.frame` to use to obtain information about the predictors (e.g. factor levels). If missing, then `recover_data` is used to attempt to reconstruct the data. See the note with `recover_data` for an important precaution.
`df`	Numeric value. This is equivalent to specifying `options(df = df)`. See `update.emmGrid`.
`type`	Character value. If provided, this is saved as the `"predict.type"` setting. See `update.emmGrid` and the section below on prediction types and transformations.
`regrid`	Character, logical, or list. If non-missing, the reference grid is reconstructed via `regrid` with the argument `transform = regrid`. See the section below on prediction types and transformations. Note: This argument was named `transform` in version 1.7.2 and earlier. For compatibility with old code, `transform` is still accepted if found among `...`, as long as it doesn't match `tran`.
`nesting`	If the model has nested fixed effects, this may be specified here via a character vector or named `list` specifying the nesting structure. Specifying `nesting` overrides any nesting structure that is automatically detected. See the section below on Recovering or Overriding Model Information.
`offset`	Numeric scalar value (if a vector, only the first element is used). This may be used to add an offset, or override offsets based on the model. A common usage would be to specify `offset = 0` for a Poisson regression model, so that predictions from the reference grid become rates relative to the offset that had been specified in the model.
`sigma`	Numeric value to use for subsequent predictions or back-transformation bias adjustments. If not specified, we use `sigma(object)`, if available, and `NULL` otherwise. Note: This applies only when the family is `"gaussian"`; for other families, `sigma` is set to `NA` and cannot be overridden.
`counterfactuals`	`counterfactuals` specifies character names of counterfactual factors. If this is non-missing, a reference grid is created consisting of combinations of counterfactual levels and the actual levels of those same factors. This grid is always converted to the response transformation scale and averaged over the actual factor levels. See the section below on counterfactuals.
`nuisance`, `non.nuisance`, `wt.nuis`	If `nuisance` is a vector of predictor names, those predictors are omitted from the reference grid. Instead, the result will be as if we had averaged over the levels of those factors, with either equal or proportional weights as specified in `wt.nuis` (see the `weights` argument in `emmeans`). The factors in `nuisance` must not interact with other factors, not even other nuisance factors. Specifying nuisance factors can save considerable storage and computation time, and help avoid exceeding the maximum reference-grid size (`get_emm_option("rg.limit")`). (Note: For certain models where the `emm_basis` method returns a re-gridded parameterization, nuisance factors cannot be used, and an error is thrown.)
`rg.limit`	Integer limit on the number of reference-grid rows to allow (checked before any multivariate responses are included).
`...`	Optional arguments passed to `summary.emmGrid`, `emm_basis`, and `recover_data`, such as `params`, `vcov.` (see Covariance matrix below), or options such as `mode` for specific model types (see `vignette("models", "emmeans")`).
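
As a rough illustration of the `nuisance` argument, the sketch below collapses one factor of an additive model before the grid is built. The `warpbreaks` model is chosen only for illustration, and the expectation that the collapsed predictions match the ordinary EMMs relies on the default equal weighting.

```r
library(emmeans)

# Additive model: 'tension' does not interact with 'wool',
# so it qualifies as a nuisance factor
mod <- lm(breaks ~ wool + tension, data = warpbreaks)

rg_full <- ref_grid(mod)                        # wool x tension: 6 grid rows
rg_nuis <- ref_grid(mod, nuisance = "tension")  # tension averaged out up front

nrow(rg_full@grid)

# Predictions on the collapsed grid, next to the usual EMMs for wool
predict(rg_nuis)
emmeans(rg_full, "wool")
```

With only six rows the saving is negligible; the benefit appears when several many-level blocking factors would otherwise multiply the grid size.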

Details

To users, the ref_grid function itself is important because most of its arguments are in effect arguments of emmeans and related functions, in that those functions pass their ... arguments to ref_grid.

The reference grid consists of combinations of independent variables over which predictions are made. Estimated marginal means are defined as these predictions, or marginal averages thereof. The grid is determined by first reconstructing the data used in fitting the model (see recover_data), or by using the data.frame provided in data.

By “independent variables,” we mean (in most cases) the results of all.vars() applied to the fixed-effects part of the right-hand side of the model formula. Any random effects are excluded. Thus, if the model formula in an lme4::lmer call is yield ~ fert + seed*density + log(rain) + (1|block/plot), the independent variables are fert, seed, density, and rain (not log(rain)). In multivariate models, the dimension of the multivariate response is also considered an independent variable.
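
The distinction between rain and log(rain) can be seen with base R alone. This is only a sketch of the idea: the random-effects term is dropped by hand here, whereas emmeans does the equivalent internally, and the object names are arbitrary.

```r
# Fixed-effects part of the example formula (random-effects term removed by hand)
fixed <- yield ~ fert + seed*density + log(rain)

# all.vars() returns the underlying variable names, so log(rain) contributes 'rain'
predictors <- all.vars(fixed[[3]])   # element 3 of a two-sided formula is its right-hand side
predictors
```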

The default reference grid is determined by the observed levels of any factors, the ordered unique values of character-valued predictors, and the results of cov.reduce for numeric predictors. These may be overridden using at. See also the section below on recovering/overriding model information.
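
A minimal sketch of how the defaults and at interact, using the fiber data shipped with emmeans. It inspects the levels and grid slots of the result; the particular values given in at are arbitrary.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

# Default grid: all observed machine levels, diameter reduced to its mean
rg <- ref_grid(fiber.lm)
rg@levels

# 'at' overrides both kinds of predictor: a subset of factor levels
# and explicit covariate values
rg_at <- ref_grid(fiber.lm,
                  at = list(machine = c("A", "C"), diameter = c(20, 30)))
rg_at@levels
nrow(rg_at@grid)   # one row per combination of the retained levels
```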

Value

An object of the S4 class "emmGrid" (see emmGrid-class). These objects encapsulate everything needed to do calculations and inferences for estimated marginal means, and contain nothing that depends on the model-fitting procedure.

Using `cov.reduce` and `cov.keep`

The cov.keep argument was not available in emmeans versions 1.4.1 and earlier. Any covariates named in this list are treated as if they are factors: all the unique levels are kept in the reference grid. The user may also specify an integer value, in which case any covariate having no more than that number of unique values is implicitly included in cov.keep. The default for cov.keep is set and retrieved via the emm_options framework, and the system default is "2", meaning that covariates having only two unique values are automatically treated as two-level factors. See also the Note below on backward compatibility.
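
The integer threshold can be illustrated roughly as follows. The mtcars model is an arbitrary choice in which cyl (three unique values) and am (two unique values) are both stored as numeric, and the sketch assumes the system default for cov.keep has not been changed.

```r
library(emmeans)

mod <- lm(mpg ~ wt + cyl + am, data = mtcars)

# System default threshold of 2: only 'am' is kept as a pseudo-factor
rg_default <- ref_grid(mod)
rg_default@levels

# Threshold of 3: 'cyl' is now kept as well; 'wt' is still reduced to its mean
rg_three <- ref_grid(mod, cov.keep = 3)
rg_three@levels
```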

There is a subtle distinction between including a covariate in cov.keep and specifying its values manually in at: Covariates included in cov.keep are treated as factors for purposes of weighting, while specifying levels in at will not include the covariate in weighting. See the mtcars.lm example below for an illustration.

cov.reduce may be a function, logical value, formula, or a named list of these. If a single function, it is applied to each covariate. If logical and TRUE, mean is used. If logical and FALSE, it is equivalent to including all covariates in cov.keep. Use of ‘cov.reduce = FALSE’ is inadvisable because it can result in a huge reference grid; it is far better to use cov.keep.
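
For instance, a single function that returns more than one value yields several reference values per covariate. The sketch below uses range and median on the fiber model purely for illustration.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

# range() returns two values, so diameter gets two reference levels
rg_range <- ref_grid(fiber.lm, cov.reduce = range)
rg_range@levels$diameter

# Any other summary function can be substituted, e.g. the median
rg_med <- ref_grid(fiber.lm, cov.reduce = median)
rg_med@levels$diameter
```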

If a formula (which must be two-sided), then a model is fitted to that formula using lm; then in the reference grid, its response variable is set to the results of predict for that model, with the reference grid as newdata. (This is done after the reference grid is determined.) A formula is appropriate here when you think experimental conditions affect the covariate as well as the response.

To allow for situations where a simple lm() call as described above won't be adequate, a formula of the form ext ~ fcnname is also supported, where the left-hand side may be ext, extern, or external (and must not be a predictor name) and the right-hand side is the name of an existing function. The function is called with one argument, a data frame with columns for each variable in the reference grid. The function is expected to use that frame as new data to be used to obtain predictions for one or more models; and it should return a named list or data frame with replacement values for one or more of the covariates.

If cov.reduce is a named list, then the above criteria are used to determine what to do with covariates named in the list. (However, formula elements do not need to be named, as those names are determined from the formulas' left-hand sides.) Any unresolved covariates are reduced using "mean".
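
A small sketch of the named-list form, with an arbitrary mtcars model: two covariates get their own reducing functions and the third, not named in the list, falls back to the mean.

```r
library(emmeans)

mod <- lm(mpg ~ disp + wt + hp, data = mtcars)

# disp -> median, wt -> (min, max), hp -> not listed, so reduced by mean
rg <- ref_grid(mod, cov.reduce = list(disp = median, wt = range))
rg@levels
```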

Any cov.reduce or cov.keep specification for a covariate also named in at is ignored.

Interdependent covariates

Care must be taken when covariate values depend on one another. For example, when a polynomial model was fitted using predictors x, x2 (equal to x^2), and x3 (equal to x^3), the reference grid will by default set x2 and x3 to their means, which is inconsistent. The user should instead use the at argument to set these to the square and cube of mean(x). Better yet, fit the model using a formula involving poly(x, 3) or I(x^2) and I(x^3); then there is only x appearing as a covariate; it will be set to its mean, and the model matrix will have the correct corresponding quadratic and cubic terms.
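
The quadratic case can be sketched as follows. The data frame dat and the column wt2 are hypothetical names introduced here; the two models are the same fit written two ways.

```r
library(emmeans)

dat <- transform(mtcars, wt2 = wt^2)

# Separate predictors: by default wt2 is set to mean(wt^2), not mean(wt)^2
mod_sep <- lm(mpg ~ wt + wt2, data = dat)
rg_bad <- ref_grid(mod_sep)
rg_bad@levels

# Remedy 1: set the dependent covariate consistently via 'at'
rg_fix <- ref_grid(mod_sep, at = list(wt2 = mean(dat$wt)^2))

# Remedy 2 (preferable): let the formula carry the dependence
mod_I <- lm(mpg ~ wt + I(wt^2), data = dat)
rg_I <- ref_grid(mod_I)
rg_I@levels        # only wt appears as a covariate

# Both remedies give the same prediction at wt = mean(wt)
predict(rg_fix)
predict(rg_I)
```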

Matrix covariates

Support for covariates that appear in the dataset as matrices is very limited. If the matrix has but one column, it is treated like an ordinary covariate. Otherwise, with more than one column, each column is reduced to a single reference value – the result of applying cov.reduce to each column (averaged together if that produces more than one value); you may not specify values in at; and they are not treated as variables in the reference grid, except for purposes of obtaining predictions.

Recovering or overriding model information

Ability to support a particular class of object depends on the existence of recover_data and emm_basis methods – see extending-emmeans for details. The call methods("recover_data") will help identify these.

Data. In certain models (e.g., results of glmer.nb), it is not possible to identify the original dataset. In such cases, we can work around this by setting data equal to the dataset used in fitting the model, or a suitable subset. Only the complete cases in data are used, so it may be necessary to exclude some unused variables. Using data can also help save computing, especially when the dataset is large. In any case, data must represent all factor levels used in fitting the model. It cannot be used as an alternative to at. (Note: If there is a pattern of NAs that caused one or more factor levels to be excluded when fitting the model, then data should also exclude those levels.)

Covariance matrix. By default, the variance-covariance matrix for the fixed effects is obtained from object, usually via its vcov method. However, the user may override this via a vcov. argument, specifying a matrix or a function. If a matrix, it must be square and of the same dimension and parameter order as the fixed effects. If a function, it must return a suitable matrix when it is called with arguments (object, ...). Be careful with possible unintended conflicts with arguments in ...; for example, sandwich::vcovHAC() has optional arguments adjust and weights that may be intended for emmeans() but will also be passed to vcov.().
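
To show the mechanics of both forms of vcov., the sketch below inflates the usual covariance matrix by an arbitrary factor of 4. The factor has no statistical meaning; it is chosen only so that the effect on the standard errors (a doubling) is easy to see.

```r
library(emmeans)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)

rg_default <- ref_grid(fiber.lm)

# Matrix form: same dimension and parameter order as coef(fiber.lm)
V_big <- 4 * vcov(fiber.lm)          # arbitrary inflation, illustration only
rg_matrix <- ref_grid(fiber.lm, vcov. = V_big)

# Function form: called with (object, ...) and must return a suitable matrix
rg_fun <- ref_grid(fiber.lm, vcov. = function(object, ...) 4 * vcov(object))

# Standard errors double when the covariance matrix is multiplied by 4
cbind(default = summary(rg_default)$SE,
      matrix  = summary(rg_matrix)$SE,
      fun     = summary(rg_fun)$SE)
```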

Nested factors. Having a nesting structure affects marginal averaging in emmeans in that it is done separately for each level (or combination thereof) of the grouping factors. ref_grid tries to discern which factors are nested in other factors, but it is not always obvious, and if it misses some, the user must specify this structure via nesting; or later using update.emmGrid. The nesting argument may be a character vector, a named list, or NULL. If a list, each name should be the name of a single factor in the grid, and its entry a character vector of the name(s) of its grouping factor(s). nesting may also be a character value of the form "factor1 %in% (factor2*factor3)" (the parentheses are optional). If there is more than one such specification, they may be appended separated by commas, or as separate elements of a character vector. For example, these specifications are equivalent: nesting = list(state = "country", city = c("state", "country")), nesting = "state %in% country, city %in% (state*country)", and nesting = c("state %in% country", "city %in% state*country").

Predictors with subscripts and data-set references

When the fitted model contains subscripts or explicit references to data sets, the reference grid may optionally be post-processed to simplify the variable names, depending on the simplify.names option (see emm_options), which by default is TRUE. For example, if the model formula is data1$resp ~ data1$trt + data2[[3]] + data2[["cov"]], the simplified predictor names (for use, e.g., in the specs for emmeans) will be trt, data2[[3]], and cov. Numerical subscripts are not simplified; nor are variables having simplified names that coincide, such as if data2$trt were also in the model.

Please note that this simplification is performed after the reference grid is constructed. Thus, non-simplified names must be used in the at argument (e.g., at = list(`data2[["cov"]]` = 2:4)).

If you don't want names simplified, use emm_options(simplify.names = FALSE).

Prediction types and transformations

Transformations can exist because of a link function in a generalized linear model, or as a response transformation, or even both. In many cases, they are auto-detected, for example a model formula of the form sqrt(y) ~ .... Even transformations containing multiplicative or additive constants, such as 2*sqrt(y + pi) ~ ..., are auto-detected. A response transformation of y + 1 ~ ... is not auto-detected, but I(y + 1) ~ ... is interpreted as identity(y + 1) ~ .... A warning is issued if it gets too complicated. Complex transformations like the Box-Cox transformation are not auto-detected; but see the help page for make.tran for information on some advanced methods.

There is a subtle difference between specifying ‘type = "response"’ and ‘regrid = "response"’. While the summary statistics for the grid itself are the same, subsequent use in emmeans will yield different results if there is a response transformation or link function. With ‘type = "response"’, EMMs are computed by averaging together predictions on the linear-predictor scale and then back-transforming to the response scale; while with ‘regrid = "response"’, the predictions are already on the response scale so that the EMMs will be the arithmetic means of those response-scale predictions. To add further to the possibilities, geometric means of the response-scale predictions are obtainable via ‘regrid = "log", type = "response"’. See also the help page for regrid.
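
The difference can be checked by hand on a small example. The sketch below assumes a saturated model for log(breaks) in the warpbreaks data, so that the cell predictions equal the cell means of log(breaks); the estimate is taken as the second column of each summary to avoid depending on its column name.

```r
library(emmeans)

mod <- lm(log(breaks) ~ wool * tension, data = warpbreaks)

rg_type   <- ref_grid(mod, type = "response")    # average on log scale, then back-transform
rg_regrid <- ref_grid(mod, regrid = "response")  # back-transform first, then average

est_type   <- summary(emmeans(rg_type, "wool"))[[2]]
est_regrid <- summary(emmeans(rg_regrid, "wool"))[[2]]

# By-hand versions from the cell means of log(breaks)
cell <- tapply(log(warpbreaks$breaks), warpbreaks[c("wool", "tension")], mean)
by_hand_type   <- exp(rowMeans(cell))   # back-transformed mean of logs (geometric mean)
by_hand_regrid <- rowMeans(exp(cell))   # arithmetic mean of response-scale predictions

cbind(est_type, by_hand_type, est_regrid, by_hand_regrid)
```

Because an arithmetic mean is never smaller than the corresponding geometric mean, the regrid version gives the larger estimates here.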

Order-of-processing issues: The regrid argument, if present, is acted on immediately after the reference grid is constructed, while some of the ... arguments may be used to update the object at the very end. Thus, code like ref_grid(mod, tran = "sqrt", regrid = "response") will not work correctly if the intention was to specify the response transformation, because the re-grid is done before it processes tran = "sqrt". To get the intended result, do regrid(ref_grid(mod, tran = "sqrt"), transform = "response").
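
A sketch of the recommended two-step form, assuming a response that was square-root transformed before fitting so that the model cannot detect the transformation itself. The data frame wb and the column sqb are hypothetical names introduced for this illustration.

```r
library(emmeans)

# Response transformed outside the formula, so it is invisible to the model
wb <- transform(warpbreaks, sqb = sqrt(breaks))
mod <- lm(sqb ~ wool * tension, data = wb)

# Declare the transformation first, then re-grid to the response scale
rg_ok <- regrid(ref_grid(mod, tran = "sqrt"), transform = "response")

# By-hand check: squared cell means of sqb, matched to the grid rows by name
cell <- tapply(wb$sqb, wb[c("wool", "tension")], mean)
idx <- cbind(as.character(rg_ok@grid$wool), as.character(rg_ok@grid$tension))
by_hand <- cell[idx]^2

cbind(rg_ok@grid[c("wool", "tension")],
      prediction = predict(rg_ok),
      by_hand = by_hand)
```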

Counterfactuals

If counterfactuals is specified, the rows of the entire dataset become part of the reference grid, and the other reference levels are confined to those named in counterfactuals. In this type of analysis (called G-computation), we substitute (or impute) each combination of counterfactual levels into the entire dataset. Thus, predictions from this grid are those of each observation under each of the counterfactual levels. For this to make sense, we require an assumption of exchangeability of these levels.

This grid is always converted to the response scale, as G-computation on the linear-predictor scale produces the same results as ordinary weighted EMMs. If we have counterfactual factors A, B, the reference grid also includes factors actual_A, actual_B which are used to track which observations originally had the A, B levels before they were changed by the counterfactuals code. We average the response-scale predictions for each combination of actual levels and imputed levels (and multivariate levels, if any). See the documentation for emmeans for additional discussion of how it handles counterfactuals.

Currently, counterfactuals are not supported when the reference grid requires post-processing (e.g., ordinal models with mode = "prob"). Cases where we have nested factor levels can be complicated if mixed in with counterfactuals, and we make no guarantees. Note that past implementations included arguments wt.counter and avg.counter, which are now deprecated and are simply ignored if specified.

Optional side effect

If the save.ref_grid option is set to TRUE (see emm_options), the most recent result of ref_grid, whether called directly or indirectly via emmeans, emtrends, or some other function that calls one of these, is saved in the user's environment as .Last.ref_grid. This facilitates checking what reference grid was used, or reusing the same reference grid for further calculations. This automatic saving is disabled by default, but may be enabled via ‘emm_options(save.ref_grid = TRUE)’.
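
A brief sketch of this side effect, assuming an interactive or top-level session so that .Last.ref_grid is visible from where the code runs. The previous option value is restored at the end.

```r
library(emmeans)

old <- get_emm_option("save.ref_grid")   # remember the current setting
emm_options(save.ref_grid = TRUE)

fiber.lm <- lm(strength ~ machine * diameter, data = fiber)
emm <- emmeans(fiber.lm, "machine")      # builds a reference grid behind the scenes

# The grid that emmeans() used is now available for inspection or reuse
.Last.ref_grid
emmeans(.Last.ref_grid, "machine")

emm_options(save.ref_grid = old)         # restore the previous setting
```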

Note

The system default for cov.keep causes models containing indicator variables to be handled differently than in emmeans version 1.4.1 or earlier. To replicate older analyses, change the default via ‘emm_options(cov.keep = character(0))’.

Some earlier versions of emmeans offer a covnest argument. This is now obsolete; if covnest is specified, it is harmlessly ignored. Cases where it was needed are now handled appropriately via the code associated with cov.keep.

Examples

fiber.lm <- lm(strength ~ machine*diameter, data = fiber)
ref_grid(fiber.lm)

ref_grid(fiber.lm, at = list(diameter = c(15, 25)))

## Not run: 
# We could substitute the sandwich estimator vcovHAC(fiber.lm)
# as follows:
summary(ref_grid(fiber.lm, vcov. = sandwich::vcovHAC))

## End(Not run)

# If we thought that the machines affect the diameters
# (admittedly not plausible in this example), then we should use:
ref_grid(fiber.lm, cov.reduce = diameter ~ machine)

### Model with indicator variables as predictors:
mtcars.lm <- lm(mpg ~ disp + wt + vs * am, data = mtcars)
(rg.default <- ref_grid(mtcars.lm))
(rg.nokeep <- ref_grid(mtcars.lm, cov.keep = character(0)))
(rg.at <- ref_grid(mtcars.lm, at = list(vs = 0:1, am = 0:1)))

# Two of these have the same grid but different weights:
rg.default@grid
rg.at@grid

### Using cov.reduce formulas...
# Above suggests we can vary disp indep. of other factors - unrealistic
rg.alt <- ref_grid(mtcars.lm, at = list(wt = c(2.5, 3, 3.5)),
    cov.reduce = disp ~ vs * wt)
rg.alt@grid

# Alternative to above where we model sqrt(disp)
disp.mod <- lm(sqrt(disp) ~ vs * wt, data = mtcars)
disp.fun <- function(dat)
    list(disp = predict(disp.mod, newdata = dat)^2)
rg.alt2 <- ref_grid(mtcars.lm, at = list(wt = c(2.5, 3, 3.5)),
    cov.reduce = external ~ disp.fun)
rg.alt2@grid


# Multivariate example
MOats.lm = lm(yield ~ Block + Variety, data = MOats)
ref_grid(MOats.lm, mult.names = "nitro")
# Silly illustration of how to use 'mult.levs' to make comb's of two factors
ref_grid(MOats.lm, mult.levs = list(T=LETTERS[1:2], U=letters[1:2]))

# Comparing estimates with and without counterfactuals
neuralgia.glm <- glm(Pain ~ Treatment + Sex + Age + Duration, 
                     family = binomial(), data = neuralgia)
emmeans(neuralgia.glm, "Treatment", type = "response")

emmeans(neuralgia.glm, "Treatment", counterfactuals = "Treatment")


# Using 'params'
require("splines")
my.knots = c(2.5, 3, 3.5)
mod = lm(Sepal.Length ~ Species * ns(Sepal.Width, knots = my.knots), data = iris)
## my.knots is not a predictor, so need to name it in 'params'
ref_grid(mod, params = "my.knots")

emmeans documentation built on Aug. 28, 2025, 1:09 a.m.
