margins: Compute the marginal effects of regression models


Description

margins calculates the marginal effects of variables from the result of a regression (madlib.lm, madlib.glm, etc.). Vars lists all the variables used in the regression model. Terms lists the specified terms in the original model. Vars and Terms are only used in the dydx option of margins.

Usage

## S3 method for class 'lm.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'lm.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'logregr.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'logregr.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'margins'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Vars(model)

Terms(term = NULL)

Arguments

model

The result of madlib.lm or madlib.glm, which represents a regression model fitted to the training data.

dydx

A formula. The default is ~ Vars(model), which computes the marginal effects of all the variables that appear in the model. ~ . computes the marginal effects of all variables in newdata. Use an ordinary formula to specify which variables' marginal effects are to be computed.

newdata

A db.obj object, which represents the data in the database. The default is the data used to train the regression model, but the user can freely use other data sets.

at.mean

A logical, the default is FALSE. Whether to compute the marginal effects at the mean values of the variables.

factor.continuous

A logical, the default is FALSE. Whether to compute the marginal effects of factors by treating them as continuous variables. See Details for more explanation.

na.action

A string which indicates what should happen when the data contain NAs. Possible values include "na.omit", "na.exclude", "na.fail" and NULL. Currently, na.omit has been implemented (see na.omit,db.obj-method). When the value is NULL, nothing is done on the R side, and NA values are filtered out and omitted on the MADlib side. A user-defined na.action function is also allowed; see na.omit,db.obj-method for the preferred function interface.

...

Other arguments, not implemented.

x

The result of the margins function, which is of class "margins".

digits

The minimum number of significant digits to be printed in values. The default is max(3L, getOption("digits") - 3L), as shown in Usage. (For the interpretation for complex numbers see signif.) Non-integer values will be rounded down, and only values greater than or equal to 1 and no greater than 22 are accepted.

term

A vector of integers, the default is NULL. When term=i, the marginal effect of the i-th term is computed. Even if this term contains multiple variables, it is treated as a single variable independent of all others. When term=NULL, the marginal effects of all terms are calculated. In the final result, the marginal effects are shown under the names ".term.1", ".term.2", etc. Comparing these with names(model$coef) shows which term corresponds to which expression. The marginal effect of the (Intercept) term cannot be computed this way. (As a workaround, one can create an extra column that equals 1 and use it as a variable, removing the intercept by adding -1 to the fitting formula.)
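The at.mean argument above distinguishes two ways of summarizing a marginal effect. The following is a minimal base-R sketch on simulated data, not PivotalR code: it assumes that at.mean = FALSE corresponds to averaging the per-row effects (as Stata's margins does) and that at.mean = TRUE corresponds to evaluating the effect once at the mean of the variable. The helper dpdx and all data are hypothetical.

```r
# Base-R illustration only; no database or PivotalR call is involved.
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.3 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)
b <- coef(fit)

# Hypothetical helper: derivative of the logistic response with respect to x,
# d/dx plogis(b0 + b1 * x) = b1 * p * (1 - p)
dpdx <- function(x) {
  p <- plogis(b[["(Intercept)"]] + b[["x"]] * x)
  b[["x"]] * p * (1 - p)
}

avg_effect     <- mean(dpdx(x))   # average of per-row effects (assumed at.mean = FALSE)
effect_at_mean <- dpdx(mean(x))   # effect at the mean of x (assumed at.mean = TRUE)

print(c(average = avg_effect, at_mean = effect_at_mean))
```

For a nonlinear response such as the logistic one, the two numbers generally differ; for a linear model without interactions they coincide with the coefficient.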

Details

For a continuous variable, the marginal effect is the first derivative of the response function with respect to that variable. For a categorical variable, it is usually more meaningful to compute the finite difference of the response function between the variable being 1 and being 0. This finite-difference marginal effect measures how much the response changes relative to the reference category. The reference category of a categorical variable can be changed with relevel.
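The two definitions can be sketched by hand in base R. This is an illustration on simulated data using stats::glm, not the MADlib implementation; the variable names x and g and the choice to average over rows are assumptions made for the example.

```r
# Base-R illustration only; it does not call PivotalR or MADlib.
set.seed(1)
n <- 500
d <- data.frame(x = rnorm(n), g = rbinom(n, 1, 0.5))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x + 0.8 * d$g))

fit <- glm(y ~ x + g, data = d, family = binomial)
b <- coef(fit)

# Continuous variable: derivative of the response p = plogis(eta)
# with respect to x is b_x * p * (1 - p); average it over the rows.
p <- fitted(fit)
ame_x <- mean(b[["x"]] * p * (1 - p))

# 0/1 variable: finite difference of the predicted response
# between g = 1 and g = 0, averaged over the rows.
p1 <- predict(fit, transform(d, g = 1), type = "response")
p0 <- predict(fit, transform(d, g = 0), type = "response")
ame_g <- mean(p1 - p0)

print(c(x = ame_x, g = ame_g))
```

Treating g as continuous instead would apply the derivative formula to it as well, which is presumably what factor.continuous = TRUE selects.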

Value

The margins function returns a margins object, which is a data.frame. It contains the following items:

Estimate

The marginal effect values for all variables that have been specified in dydx.

Std. Error

The standard errors for the marginal effects.

t value, z value

The t statistics (for linear regression) or z statistics (for logistic regression).

Pr(>|t|), Pr(>|z|)

The corresponding p values.
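The columns above are related in the usual way. The sketch below uses made-up estimates and standard errors and assumes the standard two-sided normal test for the z case; it is not taken from the PivotalR source.

```r
# Hypothetical estimates and standard errors, for illustration only.
est <- c(length = 0.8, diameter = -0.1)
se  <- c(length = 0.2, diameter = 0.1)

z <- est / se                 # "z value" column
p <- 2 * pnorm(-abs(z))       # "Pr(>|z|)" column, two-sided

print(data.frame(Estimate = est, `Std. Error` = se,
                 `z value` = z, `Pr(>|z|)` = p, check.names = FALSE))
```

For the t case, the analogous two-sided p value would be 2 * pt(-abs(t), df) with the residual degrees of freedom.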

Vars returns a vector of strings, which are the variable names that have been used in the regression model.
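The difference between variables and terms can be seen with base R alone. This sketch uses stats::terms and all.vars as an analogy; it does not call Vars or Terms, whose output may differ (for example, the Examples refer to dummy variables such as sex.M).

```r
# Base-R analogy only; Vars() and Terms() themselves are not called here.
f <- rings ~ length + diameter * sex

# Variables on the right-hand side of the formula
rhs_vars <- all.vars(f[[3]])
print(rhs_vars)      # "length" "diameter" "sex"

# Terms of the model, including the interaction term
term_labels <- attr(terms(f), "term.labels")
print(term_labels)   # "length" "diameter" "sex" "diameter:sex"
```

The interaction diameter:sex is a term but not a variable, which is why the term argument of Terms treats the i-th term as a single quantity.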

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

References

[1] Stata 13 help for margins, https://www.stata.com/help.cgi?margins

See Also

relevel changes the reference category.

madlib.lm and madlib.glm compute linear and logistic regressions.

Examples

## Not run: 

## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname)

## create a data table in database and the R wrapper
delete("abalone", conn.id = cid)
dat <- as.db.data.frame(abalone, "abalone", conn.id = cid)

## linear regression
fit <- madlib.lm(rings ~ length + diameter*sex, data = dat)
margins(fit)                                  # all variables in the model
margins(fit, at.mean = TRUE)                  # evaluated at the mean values
margins(fit, factor.continuous = TRUE)        # factors treated as continuous
margins(fit, dydx = ~ Vars(model) + Terms())  # all variables and all terms

## logistic regression
fit <- madlib.glm(rings < 10 ~ length + diameter*sex, data = dat, family = "logistic")
margins(fit, ~ length + sex)
margins(fit, ~ length + sex.M, at.mean = TRUE)
margins(fit, ~ length + sex.I, factor.continuous = TRUE)
margins(fit, ~ Vars(model) + Terms())

## create a data table that has two columns
## one of them is an array column
dat1 <- cbind(db.array(dat[,-c(1,2,10)]), dat[,10])
names(dat1) <- c("x", "y")
delete("abalone_array", conn.id = cid)
dat1 <- as.db.data.frame(dat1, "abalone_array")

## fit on the array column and request a subset of its elements
fit <- madlib.glm(y < 10 ~ x[-1], data = dat1, family = "logistic")
margins(fit, ~ x[2:5])

db.disconnect(cid, verbose = FALSE)

## End(Not run)

PivotalR documentation built on March 13, 2021, 1:06 a.m.