margins: Marginal Effects Estimation

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/margins.R

Description

This package is an R port of Stata's margins command, implemented as an S3 generic margins() for model objects, like those of class “lm” and “glm”. margins() is an S3 generic function for building a “margins” object from a model object. Methods are currently implemented for several model classes (see Details, below).

margins provides “marginal effects” summaries of models. Marginal effects are partial derivatives of the regression equation with respect to each variable in the model for each unit in the data; average marginal effects are simply the mean of these unit-specific partial derivatives over some sample. In ordinary least squares regression with no interactions or higher-order term, the estimated slope coefficients are marginal effects. In other cases and for generalized linear models, the coefficients are not marginal effects at least not on the scale of the response variable. margins therefore provides ways of calculating the marginal effects of variables to make these models more interpretable.

The package also provides a low-level function, marginal_effects, to estimate those quantities and return a data frame of unit-specific effects and another even lower-level function, dydx, to provide variable-specific derivatives from models. Some of the underlying architecture for the package is provided by the low-level function prediction, which provides a consistent data frame interface to predict for a large number of model types. If a prediction method exists for a model class, margin should work for the model class but only those classes listed here have been tested and specifically supported.

Usage

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
margins(model, ...)

## S3 method for class 'betareg'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model, phi = FALSE),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'clm'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vce = "none",
  eps = 1e-07,
  ...
)

## Default S3 method:
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'glm'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'lm'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'loess'
margins(
  model,
  data,
  variables = NULL,
  at = NULL,
  vce = "none",
  eps = 1e-07,
  ...
)

## S3 method for class 'merMod'
margins(
  model,
  data = find_data(model),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'lmerMod'
margins(
  model,
  data = find_data(model),
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'multinom'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = NULL,
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

## S3 method for class 'nnet'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  vce = "none",
  eps = 1e-07,
  ...
)

## S3 method for class 'polr'
margins(
  model,
  data = find_data(model, parent.frame()),
  variables = NULL,
  at = NULL,
  type = NULL,
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

margins_summary(model, ..., level = 0.95, by_factor = TRUE)

## S3 method for class 'svyglm'
margins(
  model,
  data = find_data(model, parent.frame()),
  design,
  variables = NULL,
  at = NULL,
  type = c("response", "link"),
  vcov = stats::vcov(model),
  vce = c("delta", "simulation", "bootstrap", "none"),
  iterations = 50L,
  unit_ses = FALSE,
  eps = 1e-07,
  ...
)

Arguments

model

A model object. See Details for supported model classes.

...

Arguments passed to methods, and onward to dydx methods and possibly further to prediction methods. This can be useful, for example, for setting type (predicted value type), eps (precision), or category (category for multi-category outcome models), etc.

data

A data frame containing the data at which to evaluate the marginal effects, as in predict. This is optional, but may be required when the underlying modelling function sets model = FALSE.

variables

A character vector with the names of variables for which to compute the marginal effects. The default (NULL) returns marginal effects for all variables.

at

A list of one or more named vectors, specifically values at which to calculate the marginal effects. This is an analogue of Stata's , at() option. The specified values are fully combined (i.e., a cartesian product) to find AMEs for all combinations of specified variable values. Rather than a list, this can also be a data frame of combination levels if only a subset of combinations are desired. These are used to modify the value of data when calculating AMEs across specified values (see build_datalist for details on use). Note: This does not calculate AMEs for subgroups but rather for counterfactual datasets where all observaations take the specified values; to obtain subgroup effects, subset data directly.

type

A character string indicating the type of marginal effects to estimate. Mostly relevant for non-linear models, where the reasonable options are “response” (the default) or “link” (i.e., on the scale of the linear predictor in a GLM).

vcov

A matrix containing the variance-covariance matrix for estimated model coefficients, or a function to perform the estimation with model as its only argument.

vce

A character string indicating the type of estimation procedure to use for estimating variances. The default (“delta”) uses the delta method. Alternatives are “bootstrap”, which uses bootstrap estimation, or “simulation”, which averages across simulations drawn from the joint sampling distribution of model coefficients. The latter two are extremely time intensive.

iterations

If vce = "bootstrap", the number of bootstrap iterations. If vce = "simulation", the number of simulated effects to draw. Ignored otherwise.

unit_ses

If vce = "delta", a logical specifying whether to calculate and return unit-specific marginal effect variances. This calculation is time consuming and the information is often not needed, so this is set to FALSE by default.

eps

A numeric value specifying the “step” to use when calculating numerical derivatives.

level

A numeric value specifying the confidence level for calculating p-values and confidence intervals.

by_factor

A logical specifying whether to order the output by factor (the default, TRUE).

design

Only for models estimated using svyglm, the “survey.design” object used to estimate the model. This is required.

Details

Methods for this generic return a “margins” object, which is a data frame consisting of the original data, predicted values and standard errors thereof, estimated marginal effects from the model model (for all variables used in the model, or the subset specified by variables), along with attributes describing various features of the marginal effects estimates.

The default print method is concise; a more useful summary method provides additional details.

margins_summary is sugar that provides a more convenient way of obtaining the nested call: summary(margins(...)).

Methods are currently implemented for the following object classes:

The margins methods simply construct a list of data frames based upon the values of at (using build_datalist), calculate marginal effects for each data frame (via marginal_effects and, in turn, dydx and prediction), stacks the results together, and provides variance estimates. Alternatively, you can use marginal_effects directly to only retrieve a data frame of marginal effects without constructing a “margins” object or variance estimates. That can be efficient for plotting, etc., given the time-consuming nature of variance estimation.

See dydx for details on estimation of marginal effects.

The choice of vce may be important. The default variance-covariance estimation procedure (vce = "delta") uses the delta method to estimate marginal effect variances. This is the fastest method. When vce = "simulation", coefficient estimates are repeatedly drawn from the asymptotic (multivariate normal) distribution of the model coefficients and each draw is used to estimate marginal effects, with the variance based upon the dispersion of those simulated effects. The number of iterations used is given by iterations. For vce = "bootstrap", the bootstrap is used to repeatedly subsample data and the variance of marginal effects is estimated from the variance of the bootstrap distribution. This method is markedly slower than the other two procedures. Again, iterations regulates the number of bootstrap subsamples to draw. Some model classes (notably “loess”) fix vce ="none".

Value

A data frame of class “margins” containing the contents of data, predicted values from model for data, the standard errors of the predictions, and any estimated marginal effects. If at = NULL (the default), then the data frame will have a number of rows equal to nrow(data). Otherwise, the number of rows will be a multiple thereof based upon the number of combinations of values specified in at. Columns containing marginal effects are distinguished by their name (prefixed by dydx_). These columns can be extracted from a “margins” object using, for example, marginal_effects(margins(model)). Columns prefixed by Var_ specify the variances of the average marginal effects, whereas (optional) columns prefixed by SE_ contain observation-specific standard errors. A special column, _at_number, specifies which at combination a given row corresponds to; the data frame carries an attribute “at” that specifies which combination of values this index represents. The summary.margins() method provides for pretty printing of the results, particularly in cases where at is specified. A variance-covariance matrix for the average marginal effects is returned as an attribute (though behavior when at is non-NULL is unspecified).

Author(s)

Thomas J. Leeper

References

Greene, W.H. 2012. Econometric Analysis, 7th Ed. Boston: Pearson.

Stata manual: margins. Retrieved 2014-12-15 from https://www.stata.com/manuals13/rmargins.pdf.

See Also

marginal_effects, dydx, prediction

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# basic example using linear model
require("datasets")
x <- lm(mpg ~ cyl * hp + wt, data = head(mtcars))
margins(x)

# obtain unit-specific standard errors
## Not run: 
  margins(x, unit_ses = TRUE)

## End(Not run)

# use of 'variables' argument to estimate only some MEs
summary(margins(x, variables = "hp"))

# use of 'at' argument
## modifying original data values
margins(x, at = list(hp = 150))
## AMEs at various data values
margins(x, at = list(hp = c(95, 150), cyl = c(4,6)))

# use of 'data' argument to obtain AMEs for a subset of data
margins(x, data = mtcars[mtcars[["cyl"]] == 4,])
margins(x, data = mtcars[mtcars[["cyl"]] == 6,])

# return discrete differences for continuous terms
## passes 'change' through '...' to dydx()
margins(x, change = "sd")

# summary() method
summary(margins(x, at = list(hp = c(95, 150))))
margins_summary(x, at = list(hp = c(95, 150)))
## control row order of summary() output
summary(margins(x, at = list(hp = c(95, 150))), by_factor = FALSE)

# alternative 'vce' estimation
## Not run: 
  # bootstrap
  margins(x, vce = "bootstrap", iterations = 100L)
  # simulation (ala Clarify/Zelig)
  margins(x, vce = "simulation", iterations = 100L)

## End(Not run)

# specifying a custom `vcov` argument
if (require("sandwich")) {
  x2 <- lm(Sepal.Length ~ Sepal.Width, data = head(iris))
  summary(margins(x2))
  ## heteroskedasticity-consistent covariance matrix
  summary(margins(x2, vcov = vcovHC(x2)))
}

# generalized linear model
x <- glm(am ~ hp, data = head(mtcars), family = binomial)
margins(x, type = "response")
margins(x, type = "link")

# multi-category outcome
if (requireNamespace("nnet")) {
  data("iris3", package = "datasets")
  ird <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                    species = factor(c(rep("s",50), rep("c", 50), rep("v", 50))))
  m <- nnet::nnet(species ~ ., data = ird, size = 2, rang = 0.1,
                  decay = 5e-4, maxit = 200, trace = FALSE)
  margins(m) # default
  margins(m, category = "v") # explicit category
}

# using margins_summary() for concise grouped operations
list_data <- split(mtcars, mtcars$gear)
list_mod <- lapply(list_data, function(x) lm(mpg ~ cyl + wt, data = x))
mapply(margins_summary, model = list_mod, data = list_data, SIMPLIFY = FALSE)

margins documentation built on Jan. 22, 2021, 5:09 p.m.