# R/slopes.R

#' Slopes (aka Partial derivatives, Marginal Effects, or Trends)
#'
#' @description
#' Partial derivative of the regression equation with respect to a regressor of interest.
#'
#' * `slopes()`: unit-level (conditional) estimates.
#' * `avg_slopes()`: average (marginal) estimates.
#'
#' The `newdata` argument and the `datagrid()` function can be used to control where statistics are evaluated in the predictor space: "at observed values", "at the mean", "at representative values", etc.
#'
#' See the slopes vignette and package website for worked examples and case studies:
#'
#' * [https://marginaleffects.com/chapters/slopes.html](https://marginaleffects.com/chapters/slopes.html)
#' * [https://marginaleffects.com/](https://marginaleffects.com/)
#'
#' @details
#' A "slope" or "marginal effect" is the partial derivative of the regression equation
#' with respect to a variable in the model. This function uses numerical
#' differentiation to compute slopes for a vast array of models,
#' including non-linear models with transformations (e.g., polynomials).
#' Uncertainty estimates are computed using the delta method.
#'
#' @param model Model object
#' @param variables Focal variables
#' * `NULL`: compute slopes or comparisons for all the variables in the model object (can be slow).
#' * Character vector: subset of variables (usually faster).
#' @param newdata Grid of predictor values at which we evaluate the slopes.
#' + Warning: Please avoid modifying your dataset between fitting the model and calling a `marginaleffects` function. This can sometimes lead to unexpected results.
#' + `NULL` (default): Unit-level slopes for each observed value in the dataset (empirical distribution). The dataset is retrieved using [insight::get_data()], which tries to extract data from the environment. This may produce unexpected results if the original data frame has been altered since fitting the model.
#' + [datagrid()] call to specify a custom grid of regressors. For example:
#'   - `newdata = datagrid(cyl = c(4, 6))`: `cyl` variable equal to 4 and 6 and other regressors fixed at their means or modes.
#'   - See the Examples section and the [datagrid()] documentation.
#' + [subset()] call with a single argument to select a subset of the dataset used to fit the model, ex: `newdata = subset(treatment == 1)`
#' + [dplyr::filter()] call with a single argument to select a subset of the dataset used to fit the model, ex: `newdata = filter(treatment == 1)`
#' + string:
#'   - "mean": Slopes evaluated when each predictor is held at its mean or mode.
#'   - "median": Slopes evaluated when each predictor is held at its median or mode.
#'   - "balanced": Slopes evaluated on a balanced grid with every combination of categories and numeric variables held at their means.
#'   - "tukey": Slopes evaluated at Tukey's 5 numbers.
#'   - "grid": Slopes evaluated on a grid of representative numbers (Tukey's 5 numbers and unique values of categorical predictors).
#' @param vcov Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
#'  * FALSE: Do not compute standard errors. This can speed up computation considerably.
#'  * TRUE: Unit-level standard errors using the default `vcov(model)` variance-covariance matrix.
#'  * String which indicates the kind of uncertainty estimates to return.
#'    - Heteroskedasticity-consistent: `"HC"`, `"HC0"`, `"HC1"`, `"HC2"`, `"HC3"`, `"HC4"`, `"HC4m"`, `"HC5"`. See `?sandwich::vcovHC`
#'    - Heteroskedasticity and autocorrelation consistent: `"HAC"`
#'    - Mixed-Models degrees of freedom: "satterthwaite", "kenward-roger"
#'    - Other: `"NeweyWest"`, `"KernHAC"`, `"OPG"`. See the `sandwich` package documentation.
#'    - "rsample", "boot", "fwb", and "simulation" are passed to the `method` argument of the `inferences()` function. To customize the bootstrap or simulation process, call `inferences()` directly.
#'
#'  * One-sided formula which indicates the name of cluster variables (e.g., `~unit_id`). This formula is passed to the `cluster` argument of the `sandwich::vcovCL` function.
#'  * Square covariance matrix
#'  * Function which returns a covariance matrix (e.g., `stats::vcov(model)`)
#' @param conf_level numeric value between 0 and 1. Confidence level to use to build a confidence interval.
#' @param type string indicates the type (scale) of the predictions used to
#' compute contrasts or slopes. This can differ based on the model
#' type, but will typically be a string such as: "response", "link", "probs",
#' or "zero". When an unsupported string is entered, the model-specific list of
#' acceptable values is returned in an error message. When `type` is `NULL`, the
#' first entry in the error message is used by default.
#' @param slope string indicates the type of slope or (semi-)elasticity to compute:
#' - "dydx": dY/dX
#' - "eyex": dY/dX * Y / X
#' - "eydx": dY/dX * Y
#' - "dyex": dY/dX / X
#' - Y is the predicted value of the outcome; X is the observed value of the predictor.
#' @param wts logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in `avg_*()` or with the `by` argument, and not unit-level estimates. See `?weighted.mean`
#' + string: column name of the weights variable in `newdata`. When supplying a column name to `wts`, it is recommended to supply the original data (including the weights variable) explicitly to `newdata`.
#' + numeric: vector of length equal to the number of rows in the original data or in `newdata` (if supplied).
#' + FALSE: Equal weights.
#' + TRUE: Extract weights from the fitted object with `insight::find_weights()` and use them when taking weighted averages of estimates. Warning: `newdata=datagrid()` returns a single average weight, which is equivalent to using `wts=FALSE`
#' @param hypothesis specify a hypothesis test or custom contrast using a number, formula, string equation, vector, matrix, or function.
#' + Number: The null hypothesis used in the computation of Z and p (before applying `transform`).
#' + String: Equation to specify linear or non-linear hypothesis tests. Two-tailed tests must include an equal `=` sign. One-tailed tests must start with `<` or `>`. If the terms in `coef(object)` uniquely identify estimates, they can be used in the formula. Otherwise, use `b1`, `b2`, etc. to identify the position of each parameter. The `b*` wildcard can be used to test hypotheses on all estimates. When the hypothesis string represents a two-sided equation, the `estimate` column holds the value of the left side minus the right side of the equation. If a named vector is used, the names are used as labels in the output. Examples:
#'   - `hp = drat`
#'   - `hp + drat = 12`
#'   - `b1 + b2 + b3 = 0`
#'   - `b* / b1 = 1`
#'   - `<= 0`
#'   - `>= -3.5`
#'   - `b1 >= 10`
#' + Formula: `lhs ~ rhs | group`
#'   + `lhs`
#'     - `ratio` (null = 1)
#'     - `difference` (null = 0)
#'     - Leave empty for default value
#'   + `rhs`
#'     - `pairwise` and `revpairwise`: pairwise differences between estimates in each row.
#'     - `reference`: differences between the estimates in each row and the estimate in the first row.
#'     - `sequential`: difference between an estimate and the estimate in the next row.
#'     - `meandev`: difference between an estimate and the mean of all estimates.
#'     - `meanotherdev`: difference between an estimate and the mean of all other estimates, excluding the current one.
#'     - `poly`: polynomial contrasts, as computed by the `stats::contr.poly()` function.
#'     - `helmert`: Helmert contrasts, as computed by the `stats::contr.helmert()` function. Contrast 2nd level to the first, 3rd to the average of the first two, and so on.
#'     - `trt_vs_ctrl`: difference between the mean of estimates (except the first) and the first estimate.
#'     - `I(fun(x))`: custom function to manipulate the vector of estimates `x`. The function `fun()` can return multiple (potentially named) estimates.
#'   + `group` (optional)
#'     - Column name of `newdata`. Conduct hypothesis tests within subsets of the data.
#'   + Examples:
#'      + `~ poly`
#'      + `~ sequential | groupid`
#'      + `~ reference`
#'      + `ratio ~ pairwise`
#'      + `difference ~ pairwise | groupid`
#'      + `~ I(x - mean(x)) | groupid`
#'      + `~ I(\(x) c(a = x[1], b = mean(x[2:3]))) | groupid`
#' + Matrix or Vector: Each column is a vector of weights. The output is the dot product between these vectors of weights and the vector of estimates. The matrix can have column names to label the estimates.
#' + Function:
#'   - Accepts an argument `x`: object produced by a `marginaleffects` function or a data frame with column `rowid` and `estimate`
#'   - Returns a data frame with columns `term` and `estimate` (mandatory) and `rowid` (optional).
#'   - The function can also accept optional input arguments: `newdata`, `by`, `draws`.
#'   - This function approach will not work for Bayesian models or with bootstrapping. In those cases, it is easy to use `get_draws()` to extract and manipulate the draws directly.
#' + See the Examples section below and the vignette: [https://marginaleffects.com/chapters/hypothesis.html](https://marginaleffects.com/chapters/hypothesis.html)
#' + Warning: When calling `predictions()` with `type="invlink(link)"` (the default in some models), `hypothesis` is tested and p values are computed on the link scale.
#' @param df Degrees of freedom used to compute p values and confidence intervals.
#'   - A single numeric value between 1 and `Inf`, or a numeric vector with length equal to the number of rows in the output. When `df` is `Inf`, the normal distribution is used. When `df` is finite, the `t` distribution is used.
#'   - "residual": Calls [insight::get_df] to extract degrees of freedom from the model automatically.
#'   - "satterthwaite" or "kenward-roger": Use the Satterthwaite or Kenward-Roger approximation to compute degrees of freedom in mixed effects models.
#' @param eps NULL or numeric value which determines the step size to use when
#' calculating numerical derivatives: (f(x+eps)-f(x))/eps. When `eps` is
#' `NULL`, the step size is 0.0001 multiplied by the difference between
#' the maximum and minimum values of the variable with respect to which we
#' are taking the derivative. Changing `eps` may be necessary to avoid
#' numerical problems in certain models.
#' @param numderiv string or list of strings indicating the method to use for the numeric differentiation used to compute delta method standard errors.
#' + "fdforward": finite difference method with forward differences (default)
#' + "fdcenter": finite difference method with central differences
#' + "richardson": Richardson extrapolation method
#' + Extra arguments can be specified by passing a list to the `numderiv` argument, with the name of the method first and named arguments following, ex: `numderiv=list("fdcenter", eps = 1e-5)`. When an unknown argument is used, `marginaleffects` prints the list of valid arguments for each method.
#' @param ... Additional arguments are passed to the `predict()` method
#' supplied by the modeling package. These arguments are particularly useful
#' for mixed-effects or Bayesian models (see the online vignettes on the
#' `marginaleffects` website). Available arguments can vary from model to
#' model, depending on the range of supported arguments by each modeling
#' package. See the "Model-Specific Arguments" section of the
#' `?slopes` documentation for a non-exhaustive list of available
#' arguments.
#' @inheritParams comparisons
#'
#' @details
#' Numerical derivatives for the `slopes` function are calculated
#' using a simple epsilon difference approach: \eqn{\partial Y / \partial X = (f(X + \varepsilon/2) - f(X-\varepsilon/2)) / \varepsilon}{dY/dX = (f(X + e/2) - f(X-e/2)) / e},
#' where f is the `predict()` method associated with the model class, and
#' \eqn{\varepsilon}{e} is determined by the `eps` argument.
#' @template references
#' @template deltamethod
#' @template model_specific_arguments
#' @template bayesian
#' @template equivalence
#' @template type
#' @template parallel
#' @template order_of_operations
#' @template options
#' @template return
#' @examplesIf interactive() || isTRUE(Sys.getenv("R_DOC_BUILD") == "true")
#' # Unit-level (conditional) Marginal Effects
#' mod <- glm(am ~ hp * wt, data = mtcars, family = binomial)
#' mfx <- slopes(mod)
#' head(mfx)
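#'
#' # Rough by-hand check of the epsilon-difference definition given in Details
#' # (the step size below is arbitrary; `slopes()` chooses `eps` automatically)
#' h <- 1e-4
#' lo <- transform(mtcars, hp = hp - h / 2)
#' hi <- transform(mtcars, hp = hp + h / 2)
#' dydx_hp <- (predict(mod, newdata = hi, type = "response") -
#'   predict(mod, newdata = lo, type = "response")) / h
#' head(dydx_hp)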
#'
#' # Average Marginal Effect (AME)
#' avg_slopes(mod, by = TRUE)
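#'
#' # Elasticities and semi-elasticities via the `slope` argument (documented above)
#' avg_slopes(mod, slope = "eyex")
#' avg_slopes(mod, slope = "eydx")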
#'
#'
#' # Marginal Effect at the Mean (MEM)
#' slopes(mod, newdata = datagrid())
#'
#' # Marginal Effect at User-Specified Values
#' # Variables not explicitly included in `datagrid()` are held at their means
#' slopes(mod, newdata = datagrid(hp = c(100, 110)))
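#'
#' # Shortcut strings for `newdata` (see the `newdata` argument above)
#' slopes(mod, newdata = "mean")
#' avg_slopes(mod, newdata = "balanced")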
#'
#' # Group-Average Marginal Effects (G-AME)
#' # Calculate marginal effects for each observation, and then take the average
#' # marginal effect within each subset of observations with different observed
#' # values for the `cyl` variable:
#' mod2 <- lm(mpg ~ hp * cyl, data = mtcars)
#' avg_slopes(mod2, variables = "hp", by = "cyl")
#'
#' # Marginal Effects at User-Specified Values (counterfactual)
#' # Variables not explicitly included in `datagrid()` are held at their
#' # original values, and the whole dataset is duplicated once for each
#' # combination of the values in `datagrid()`
#' mfx <- slopes(mod,
#'   newdata = datagrid(
#'     hp = c(100, 110),
#'     grid_type = "counterfactual"))
#' head(mfx)
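#'
#' # Delta method standard errors computed with central finite differences,
#' # one of the `numderiv` options documented above
#' avg_slopes(mod, numderiv = list("fdcenter", eps = 1e-5))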
#'
#' # Heteroskedasticity robust standard errors
#' mfx <- slopes(mod, vcov = sandwich::vcovHC(mod))
#' head(mfx)
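#'
#' # Same idea with shortcut values for `vcov`: a string or a one-sided
#' # clustering formula (here `cyl` is an arbitrary clustering variable)
#' avg_slopes(mod, vcov = "HC3")
#' avg_slopes(mod, vcov = ~cyl)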
#'
#' # hypothesis test: is the `wt` marginal effect at the mean equal to the `drat` marginal effect?
#' mod <- lm(mpg ~ wt + drat, data = mtcars)
#'
#' slopes(
#'   mod,
#'   newdata = "mean",
#'   hypothesis = "wt = drat")
#'
#' # same hypothesis test using row indices
#' slopes(
#'   mod,
#'   newdata = "mean",
#'   hypothesis = "b1 - b2 = 0")
#'
#' # same hypothesis test using numeric vector of weights
#' slopes(
#'   mod,
#'   newdata = "mean",
#'   hypothesis = c(1, -1))
#'
#' # two custom contrasts using a matrix of weights
#' lc <- matrix(
#'   c(
#'     1, -1,
#'     2, 3),
#'   ncol = 2)
#' colnames(lc) <- c("Contrast A", "Contrast B")
#' slopes(
#'   mod,
#'   newdata = "mean",
#'   hypothesis = lc)
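#'
#' # hypothesis tests with the formula interface (see the `hypothesis` argument above)
#' avg_slopes(mod2, variables = "hp", by = "cyl", hypothesis = ~reference)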
#'
#' @export
slopes <- function(
    model,
    newdata = NULL,
    variables = NULL,
    type = NULL,
    by = FALSE,
    vcov = TRUE,
    conf_level = 0.95,
    slope = "dydx",
    wts = FALSE,
    hypothesis = NULL,
    equivalence = NULL,
    df = Inf,
    eps = NULL,
    numderiv = "fdforward",
    ...
) {
    call_attr <- construct_call(model, "slopes")

    # very early, before any use of newdata
    # if `newdata` is a call to `typical` or `counterfactual`, insert `model`
    # scall <- rlang::enquo(newdata)
    # newdata <- sanitize_newdata_call(scall, newdata, model, by = by)

    # build call: match.call() doesn't work well in *apply()
    # call_attr <- c(
    #     list(
    #         name = "slopes",
    #         model = model,
    #         newdata = newdata,
    #         variables = variables,
    #         type = type,
    #         vcov = vcov,
    #         by = by,
    #         conf_level = conf_level,
    #         slope = slope,
    #         wts = wts,
    #         hypothesis = hypothesis,
    #         df = df,
    #         eps = eps),
    #     list(...))
    # call_attr <- do.call("call", call_attr)

    # slopes() does not support a named list of variables like comparisons()
    checkmate::assert_character(variables, null.ok = TRUE)

    # slope
    valid <- c(
        "dydx",
        "eyex",
        "eydx",
        "dyex",
        "dydxavg",
        "eyexavg",
        "eydxavg",
        "dyexavg"
    )
    checkmate::assert_choice(slope, choices = valid)

    # sanity checks and pre-processing
    model <- sanitize_model(
        model = model,
        wts = wts,
        vcov = vcov,
        by = by,
        calling_function = "slopes",
        ...
    )
    sanity_dots(model = model, calling_function = "slopes", ...)

    ############### sanity checks are over

    call_attr_c <- call_attr
    call_attr_c[[1L]] <- quote(marginaleffects::comparisons)
    call_attr_c[["model"]] <- model
    call_attr_c[["type"]] <- type
    call_attr_c[["comparison"]] <- slope
    call_attr_c[["cross"]] <- FALSE
    call_attr_c[["internal_call"]] <- TRUE
    call_attr_c[["slope"]] <- NULL

    # out <- comparisons(
    #     model,
    #     newdata = newdata,
    #     variables = variables,
    #     vcov = vcov,
    #     conf_level = conf_level,
    #     type = type,
    #     wts = wts,
    #     hypothesis = hypothesis,
    #     equivalence = equivalence,
    #     df = df,
    #     by = by,
    #     eps = eps,
    #     numderiv = numderiv,
    #     comparison = slope,
    #     cross = FALSE,
    #     # secret arguments
    #     internal_call = TRUE,
    #     ...)

    out <- eval.parent(call_attr_c)

    if (!is.null(attr(out, "call"))) {
        attr(out, "call") <- call_attr
    }

    # class
    data.table::setDF(out)
    class(out) <- setdiff(class(out), "comparisons")
    class(out) <- c("slopes", "marginaleffects", class(out))
    return(out)
}


#' Average slopes (aka Average partial derivatives, marginal effects, or trends)
#' @describeIn slopes Average slopes
#' @export
#'
avg_slopes <- function(
    model,
    newdata = NULL,
    variables = NULL,
    type = NULL,
    by = TRUE,
    vcov = TRUE,
    conf_level = 0.95,
    slope = "dydx",
    wts = FALSE,
    hypothesis = NULL,
    equivalence = NULL,
    df = Inf,
    eps = NULL,
    numderiv = "fdforward",
    ...
) {
    # order of the first few paragraphs is important
    # if `newdata` is a call to `typical` or `counterfactual`, insert `model`
    # should probably not be nested too deeply in the call stack since we eval.parent() (not sure about this)
    # scall <- rlang::enquo(newdata)
    # newdata <- sanitize_newdata_call(scall, newdata, model, by = by)

    # construct a `slopes()` call from the arguments of this function and evaluate it in the parent frame
    call_attr <- construct_call(model, "slopes")

    out <- eval.parent(call_attr)

    # out <- slopes(
    #     model = model,
    #     newdata = newdata,
    #     variables = variables,
    #     type = type,
    #     vcov = vcov,
    #     conf_level = conf_level,
    #     by = by,
    #     slope = slope,
    #     wts = wts,
    #     hypothesis = hypothesis,
    #     equivalence = equivalence,
    #     df = df,
    #     eps = eps,
    #     numderiv = numderiv,
    #     ...)

    return(out)
}
