View source: R/estimate_slopes.R
estimate_slopes (R Documentation)
Estimate the slopes (i.e., the coefficients) of a predictor over or within different factor levels, or alongside a numeric variable. In other words, assess the effect of a predictor at specific configurations of the data. The slope corresponds to the derivative of the predicted response with respect to the predictor, and can be useful to understand where a predictor has a significant role when interactions or non-linear relationships are present.
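As a rough illustration of the "slope as derivative" idea, the sketch below approximates the derivative of a model's predictions with respect to Petal.Length using a central finite difference, relying only on base R's lm() and predict(). It is not how estimate_slopes() computes its results internally; the helper name numeric_slope() and the step size h are illustrative assumptions.

```r
# Fit a model in which the effect of Petal.Length depends on Species
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)

# Hypothetical helper: central finite-difference derivative of the
# predicted response with respect to `trend`, at each row of `newdata`
numeric_slope <- function(model, newdata, trend, h = 1e-4) {
  up <- newdata
  down <- newdata
  up[[trend]] <- up[[trend]] + h
  down[[trend]] <- down[[trend]] - h
  (predict(model, newdata = up) - predict(model, newdata = down)) / (2 * h)
}

# One evaluation point per Species; the Petal.Length value is arbitrary
# here because the model is linear in Petal.Length within each Species
grid <- data.frame(
  Species = factor(levels(iris$Species), levels = levels(iris$Species)),
  Petal.Length = 4
)
slopes <- numeric_slope(model, grid, "Petal.Length")
names(slopes) <- levels(iris$Species)
slopes
```

For this linear model, each numerical slope should match the Petal.Length coefficient plus the relevant interaction term; for non-linear models (e.g., GAMs) the slope would change with the evaluation point.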
Other related functions based on marginal estimations include estimate_contrasts() and estimate_means().
See the Details section below, and don't forget to also check out the Vignettes and README examples for various examples, tutorials and use cases.
estimate_slopes(
model,
trend = NULL,
by = NULL,
predict = NULL,
ci = 0.95,
estimate = NULL,
transform = NULL,
p_adjust = "none",
keep_iterations = FALSE,
backend = NULL,
verbose = TRUE,
...
)
model: A statistical model.
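trend: A character indicating the name of the variable for which to compute the slopes. To get marginal effects at specific values, use trend together with by on the same variable (e.g., trend = "Petal.Length", by = "Petal.Length"; see the Examples).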
by: The (focal) predictor variable(s) at which to evaluate the desired effect / mean / contrasts. Other predictors of the model that are not included here will be collapsed and "averaged" over (the effect will be estimated across them).
predict: Is passed to the … See also section Predictions on different scales.
ci: Confidence Interval (CI) level. Defaults to 0.95 (95%).
estimate: The … You can set a default option for the … Note the following limitations: …
transform: A function applied to predictions and confidence intervals to (back-)transform results, which can be useful in case the regression model has a transformed response variable (e.g., lm(log(y) ~ x)).
p_adjust: The p-value adjustment method for frequentist multiple comparisons (default: "none"). For …
keep_iterations: If …
backend: Whether to use … Another difference is that … You can set a default backend via …
verbose: Use …
...: Other arguments passed, for instance, to …
The estimate_slopes(), estimate_means() and estimate_contrasts() functions form a group, as they are all based on marginal estimations (estimations based on a model). All three are built on the emmeans or marginaleffects package (depending on the backend argument), so reading their documentation (for instance emmeans::emmeans(), emmeans::emtrends() or this website) is recommended to understand the idea behind these types of procedures.
Model-based predictions are the basis for all that follows. Indeed, the first thing to understand is how models can be used to make predictions (see estimate_relation()). This corresponds to the predicted response (or "outcome variable") given specific values of the predictors (i.e., given a specific data configuration). This is why the concept of the reference grid is so important for direct predictions.
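A minimal base-R sketch of a reference grid: every combination of chosen predictor values, passed to predict() to obtain the predicted response for each configuration. estimate_relation() and insight::get_datagrid() build such grids automatically and more flexibly; the Petal.Length values used here are arbitrary choices for illustration.

```r
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)

# Reference grid: all combinations of the three Species and three
# (arbitrarily chosen) values of Petal.Length
ref_grid <- expand.grid(
  Species = levels(iris$Species),
  Petal.Length = c(2, 4, 6)
)

# Predicted response (Sepal.Width) for each configuration of the grid
ref_grid$predicted <- predict(model, newdata = ref_grid)
ref_grid
```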
Marginal "means", obtained via estimate_means(), are an extension of such predictions: they "average" over (collapse) some of the predictors to obtain the average response value at a specific configuration of the remaining predictors. This is typically used when some of the predictors of interest are factors. Indeed, the parameters of the model will usually give you the intercept value and then the "effect" of each factor level (how different it is from the intercept). Marginal means can be used to directly give you the mean value of the response variable at all the levels of a factor. Moreover, they can also be used to control for, or average over, predictors, which is useful in the case of multiple predictors with or without interactions.
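To make the "averaging over predictors" idea concrete, the sketch below computes a marginal mean for each Species by hand: every observation is set to a given Species, predictions are made, and they are averaged over the observed values of Petal.Length. This counterfactual averaging is only one possible approach; emmeans, by default, instead holds numeric covariates at their mean. For this additive linear model, both give the same numbers. The model formula is an illustrative assumption, not taken from the examples.

```r
model <- lm(Sepal.Width ~ Species + Petal.Length, data = iris)

# Marginal mean per Species: set every row to that Species, predict,
# then average the predictions over the observed Petal.Length values
marginal_means <- sapply(levels(iris$Species), function(lvl) {
  counterfactual <- iris
  counterfactual$Species <- factor(lvl, levels = levels(iris$Species))
  mean(predict(model, newdata = counterfactual))
})
marginal_means

# Alternative: hold Petal.Length at its sample mean
at_mean <- predict(model, newdata = data.frame(
  Species = factor(levels(iris$Species), levels = levels(iris$Species)),
  Petal.Length = mean(iris$Petal.Length)
))
at_mean
```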
Marginal contrasts, obtained via estimate_contrasts(), are themselves an extension of marginal means, in that they allow investigating the difference (i.e., the contrast) between the marginal means. This is, again, often used to get all pairwise differences between all levels of a factor. It also works for continuous predictors; for instance, one could be interested in whether the difference between the two extremes of a continuous predictor is significant.
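The pairwise idea can be sketched in base R by taking differences between marginal means (here obtained by holding Petal.Length at its sample mean). This only produces the point estimates; estimate_contrasts() additionally provides standard errors, confidence intervals and (optionally adjusted) p-values, which this sketch does not attempt.

```r
model <- lm(Sepal.Width ~ Species + Petal.Length, data = iris)

# Marginal means per Species, holding Petal.Length at its sample mean
grid <- data.frame(
  Species = factor(levels(iris$Species), levels = levels(iris$Species)),
  Petal.Length = mean(iris$Petal.Length)
)
means <- setNames(predict(model, newdata = grid), levels(iris$Species))

# All pairwise contrasts (differences) between the marginal means
level_pairs <- combn(names(means), 2)
pairwise <- data.frame(
  Level1 = level_pairs[1, ],
  Level2 = level_pairs[2, ],
  Difference = unname(means[level_pairs[1, ]] - means[level_pairs[2, ]])
)
pairwise
```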
Finally, marginal effects, obtained via estimate_slopes(), are different in that their focus is not on values of the response variable, but on the model's parameters. The idea is to assess the effect (slope) of a predictor at a specific configuration of the other predictors. This is relevant in the case of interactions or non-linear relationships, when the effect of a predictor variable changes depending on the other predictors. Moreover, these effects can also be "averaged" over other predictors, to get for instance the "general trend" of a predictor over different factor levels.
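The "general trend" can be sketched in two equivalent ways for the iris interaction model: average the per-Species slopes, or compute a finite-difference slope for every observation and average those. The two agree here only because iris is balanced (50 rows per Species); with unbalanced data, observation-wise averaging weights groups by their size. This is a hand-rolled illustration, not the package's algorithm.

```r
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)
b <- coef(model)

# Slope of Petal.Length within each Species (from the coefficients)
within_slopes <- c(
  setosa     = b[["Petal.Length"]],
  versicolor = b[["Petal.Length"]] + b[["Speciesversicolor:Petal.Length"]],
  virginica  = b[["Petal.Length"]] + b[["Speciesvirginica:Petal.Length"]]
)

# "General trend": slopes averaged over the levels of Species
mean(within_slopes)

# Observation-wise: finite-difference slope for every row, then averaged
h <- 1e-4
up <- iris
up$Petal.Length <- up$Petal.Length + h
down <- iris
down$Petal.Length <- down$Petal.Length - h
ame <- mean((predict(model, up) - predict(model, down)) / (2 * h))
ame
```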
Example: Let's imagine the following model: lm(y ~ condition * x), where condition is a factor with 3 levels A, B and C, and x a continuous variable (like age, for example). One idea is to see how this model performs, and compare the actual response y to the one predicted by the model (using estimate_expectation()). Another idea is to evaluate the mean response at each of the condition's levels (using estimate_means()), which can be useful to visualize them. Another possibility is to evaluate the difference between these levels (using estimate_contrasts()). Finally, one could also estimate the effect of x averaged over all conditions, or instead within each condition (using estimate_slopes()).
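The last idea (the slope of x within each condition) can be reproduced by hand on simulated data. The data, the true slopes (0.5, 1 and -0.2) and the noise level below are arbitrary assumptions chosen for illustration; in practice, estimate_slopes(model, trend = "x", by = "condition") would return these slopes together with their uncertainty.

```r
set.seed(123)
n <- 300
d <- data.frame(
  condition = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  x = runif(n, 20, 70) # e.g., age
)

# Simulated truth: the slope of x differs by condition (arbitrary values)
true_slope <- c(A = 0.5, B = 1, C = -0.2)
d$y <- 10 + true_slope[as.character(d$condition)] * d$x + rnorm(n, sd = 2)

model <- lm(y ~ condition * x, data = d)
b <- coef(model)

# Estimated slope of x within each condition:
# reference-level slope plus the matching interaction coefficient
est_slope <- c(
  A = b[["x"]],
  B = b[["x"]] + b[["conditionB:x"]],
  C = b[["x"]] + b[["conditionC:x"]]
)
round(est_slope, 2)
```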
A data.frame of class estimate_slopes.
To define representative values for focal predictors (specified in by, contrast, and trend), you can use several methods. These values are internally generated by insight::get_datagrid(), so consult its documentation for more details.
You can directly specify values as strings or lists for by, contrast, and trend.
For numeric focal predictors, use examples like by = "gear = c(4, 8)", by = list(gear = c(4, 8)) or by = "gear = 5:10".
For factor or character predictors, use by = "Species = c('setosa', 'virginica')" or by = list(Species = c('setosa', 'virginica')).
You can use "shortcuts" within square brackets, such as by = "Sepal.Width = [sd]" or by = "Sepal.Width = [fivenum]".
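A base-R sketch of the values the two shortcuts are assumed to stand for: "[sd]" as the mean and the mean plus/minus one standard deviation, and "[fivenum]" as Tukey's five-number summary from fivenum(). These interpretations are assumptions for illustration; the authoritative definitions are in the insight::get_datagrid() documentation.

```r
x <- iris$Sepal.Width

# Assumed meaning of "[sd]": mean - 1 SD, mean, mean + 1 SD
sd_values <- mean(x) + c(-1, 0, 1) * sd(x)
sd_values

# Assumed meaning of "[fivenum]": minimum, lower hinge, median,
# upper hinge, maximum
fivenum_values <- fivenum(x)
fivenum_values
```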
For numeric focal predictors, if no representative values are specified (i.e., by = "gear" and not by = "gear = c(4, 8)"), length and range control the number and type of representative values for the focal predictors:
length determines how many equally spaced values are generated.
range specifies the type of values, like "range" or "sd".
length and range apply to all numeric focal predictors.
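For the "range" type of values, the idea of length can be sketched as equally spaced values between the observed minimum and maximum. The helper spread_values() is hypothetical and only mirrors that one case; it does not reproduce the other range options or any other behaviour of insight::get_datagrid().

```r
# Hypothetical helper: `length` equally spaced values spanning the
# observed range of a numeric predictor
spread_values <- function(x, length = 10) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = length)
}

# Petal.Length ranges from 1 to 6.9, so five values are spaced 1.475 apart
values <- spread_values(iris$Petal.Length, length = 5)
values
```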
If you have multiple numeric predictors, length and range can accept multiple elements, one for each predictor (see 'Examples').
For integer variables, only values that appear in the data will be included in the data grid, independent of the length argument. This behaviour can be changed by setting protect_integers = FALSE, which will then treat integer variables as numerics (and possibly produce fractions).
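The integer rule can be sketched with a hypothetical helper, grid_values(), that returns the observed values of an integer vector unless protect_integers = FALSE, in which case it falls back to equally spaced (possibly fractional) values. This is an illustration of the described behaviour under that assumption, not the code used by insight::get_datagrid().

```r
# Hypothetical helper sketching the integer rule described above
grid_values <- function(x, length = 10, protect_integers = TRUE) {
  if (is.integer(x) && protect_integers) {
    # only values that actually occur in the data, regardless of `length`
    return(sort(unique(x)))
  }
  # otherwise treat x as numeric: equally spaced values over its range
  seq(min(x), max(x), length.out = length)
}

cyl <- as.integer(mtcars$cyl)
protected <- grid_values(cyl, length = 4)
protected # 4 6 8
unprotected <- grid_values(cyl, length = 4, protect_integers = FALSE)
unprotected # includes fractions such as 5.33
```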
See also this vignette for some examples.
Montiel Olea, J. L., and Plagborg-Møller, M. (2019). Simultaneous confidence bands: Theory, implementation, and an application to SVARs. Journal of Applied Econometrics, 34(1), 1–17. doi:10.1002/jae.2656
library(modelbased)
library(ggplot2)
# Get an idea of the data
ggplot(iris, aes(x = Petal.Length, y = Sepal.Width)) +
geom_point(aes(color = Species)) +
geom_smooth(color = "black", se = FALSE) +
geom_smooth(aes(color = Species), linetype = "dotted", se = FALSE) +
geom_smooth(aes(color = Species), method = "lm", se = FALSE)
# Model it
model <- lm(Sepal.Width ~ Species * Petal.Length, data = iris)
# Compute the marginal effect of Petal.Length at each level of Species
slopes <- estimate_slopes(model, trend = "Petal.Length", by = "Species")
slopes
# What is the *average* slope of Petal.Length? This can be calculated by
# taking the average of the slopes across all Species, using `comparison`.
# We pass a formula to `comparison` that calculates the mean of the slopes.
estimate_slopes(
model,
trend = "Petal.Length",
by = "Species",
comparison = ~I(mean(x))
)
## Not run:
# Plot it
plot(slopes)
standardize(slopes)
model <- mgcv::gam(Sepal.Width ~ s(Petal.Length), data = iris)
slopes <- estimate_slopes(model, by = "Petal.Length", length = 50)
summary(slopes)
plot(slopes)
model <- mgcv::gam(Sepal.Width ~ s(Petal.Length, by = Species), data = iris)
slopes <- estimate_slopes(model,
trend = "Petal.Length",
by = c("Petal.Length", "Species"), length = 20
)
summary(slopes)
plot(slopes)
# marginal effects, grouped by Species, at different values of Petal.Length
estimate_slopes(model,
trend = "Petal.Length",
by = c("Petal.Length", "Species"), length = 10
)
# marginal effects at different values of Petal.Length
estimate_slopes(model, trend = "Petal.Length", by = "Petal.Length", length = 10)
# marginal effects at very specific values of Petal.Length
estimate_slopes(model, trend = "Petal.Length", by = "Petal.Length=c(1, 3, 5)")
# average marginal effects of Petal.Length,
# just for the trend within a certain range
estimate_slopes(model, trend = "Petal.Length=seq(2, 4, 0.01)")
## End(Not run)
## Not run:
# marginal effects with different `estimate` options
data(penguins)
penguins$long_bill <- factor(datawizard::categorize(penguins$bill_len), labels = c("short", "long"))
m <- glm(long_bill ~ sex + species + island * bill_dep, data = penguins, family = "binomial")
# the emmeans default
estimate_slopes(m, "bill_dep", by = "island")
emmeans::emtrends(m, "island", var = "bill_dep", regrid = "response")
# the marginaleffects default
estimate_slopes(m, "bill_dep", by = "island", estimate = "average")
marginaleffects::avg_slopes(m, variables = "bill_dep", by = "island")
## End(Not run)