margins: Compute the marginal effects of regression models

In PivotalR: A Fast, Easy-to-Use Tool for Manipulating Tables in Databases and a Wrapper of MADlib

Description

`margins` calculates the marginal effects of the variables, given the result of a regression (`madlib.lm`, `madlib.glm`, etc.). `Vars` lists all the variables used in the regression model. `Terms` lists the specified terms in the original model. `Vars` and `Terms` are only used in the `dydx` argument of `margins`.

Usage

```
## S3 method for class 'lm.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'lm.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'logregr.madlib'
margins(model, dydx = ~Vars(model), newdata = model$data,
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'logregr.madlib.grps'
margins(model, dydx = ~Vars(model),
        newdata = lapply(model, function(x) x$data),
        at.mean = FALSE, factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'margins'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Vars(model)

Terms(term = NULL)
```

Arguments

`model`: The result of `madlib.lm` or `madlib.glm`, which represents a regression model for the training data.

`dydx`: A formula. The default, `~ Vars(model)`, computes the marginal effects of all the variables that appear in the model. `~ .` computes the marginal effects of all variables in `newdata`. Use an ordinary formula to specify which variables' marginal effects are to be computed.

`newdata`: A `db.obj` object, which represents the data in the database. The default is the data used to train the regression model, but other data sets can be used.

`at.mean`: A logical, default `FALSE`. Whether to compute the marginal effects at the mean values of the variables.

`factor.continuous`: A logical, default `FALSE`. Whether to compute the marginal effects of factors by treating them as continuous variables. See "Details" for more explanation.

`na.action`: A string which indicates what should happen when the data contain `NA`s. Possible values include `"na.omit"`, `"na.exclude"`, `"na.fail"` and `NULL`. Currently, `na.omit,db.obj-method` has been implemented. When the value is `NULL`, nothing is done on the R side, and `NA` values are filtered out and omitted on the MADlib side. A user-defined `na.action` function is allowed; see `na.omit,db.obj-method` for the preferred function interface.

`...`: Other arguments, not implemented.

`x`: The result of the `margins` function, which is of class `"margins"`.

`digits`: A non-null value specifies the minimum number of significant digits to be printed in values. The default given in Usage is `max(3L, getOption("digits") - 3L)`; `NULL` uses `getOption("digits")`. (For the interpretation for complex numbers see `signif`.) Non-integer values will be rounded down, and only values greater than or equal to 1 and no greater than 22 are accepted.

`term`: A vector of integers, default `NULL`. When `term = i`, the marginal effect of the i-th term is computed. Even if this term contains multiple variables, it is treated as a variable independent of all others. When `term = NULL`, the marginal effects of all terms are calculated. In the final result, the marginal effects are shown as `".term.1"`, `".term.2"`, etc. Comparing these with `names(model$coef)` shows which term corresponds to which expression. The marginal effect of the `(Intercept)` term cannot be computed this way. (One can instead create an extra column that equals 1 and use it as a variable, removing the intercept by adding `-1` to the fitting formula.)

Details

For a continuous variable, the marginal effect is the first derivative of the response function with respect to that variable. For a categorical variable, it is usually more meaningful to compute the finite difference of the response function between the variable being 1 and being 0. This finite-difference marginal effect measures how much the response changes relative to the reference category. With `factor.continuous = TRUE`, factors are instead treated as continuous variables. The reference category of a categorical variable can be changed with `relevel`.
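The two definitions above can be sketched in plain R. This is a minimal illustration, not PivotalR's implementation: it assumes simulated data, base R's `glm` in place of `madlib.glm`, and hypothetical column names `x`, `g` and `y`. It averages each effect over the rows, and also evaluates the derivative once at the variable means, which is one plausible reading of what `at.mean = TRUE` refers to.

```r
# Illustration only: base R glm() on simulated data, not PivotalR or MADlib.
# The column names x, g and y are hypothetical.
set.seed(1)
n <- 200
d <- data.frame(x = rnorm(n), g = rbinom(n, 1, 0.5))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x + 0.8 * d$g))

fit <- glm(y ~ x + g, data = d, family = binomial)
b <- coef(fit)                        # order: (Intercept), x, g
p <- predict(fit, type = "response")  # fitted probabilities, one per row

# Continuous x: the derivative of the logistic response with respect to x
# is b_x * p * (1 - p); average it over the rows.
me_x <- mean(b[["x"]] * p * (1 - p))

# The same derivative evaluated once, at the mean of every variable.
p_bar <- plogis(sum(b * c(1, mean(d$x), mean(d$g))))
me_x_at_mean <- b[["x"]] * p_bar * (1 - p_bar)

# Binary g: finite difference of the response with g set to 1 versus 0,
# averaged over the rows.
p1 <- predict(fit, newdata = transform(d, g = 1), type = "response")
p0 <- predict(fit, newdata = transform(d, g = 0), type = "response")
me_g <- mean(p1 - p0)

print(c(x = me_x, x_at_mean = me_x_at_mean, g = me_g))
```

For a linear model, the derivative with respect to a variable that appears only as a main effect is simply its coefficient, so the averaged and at-mean values coincide in that case.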

Value

The `margins` function returns a `margins` object, which is a `data.frame`. It contains the following items:

`Estimate`: The marginal effect values for all variables that have been specified in `dydx`.

`Std. Error`: The standard errors for the marginal effects.

`t value`, `z value`: The t statistics (for linear regression) or z statistics (for logistic regression).

`Pr(>|t|)`, `Pr(>|z|)`: The corresponding p values.

`Vars` returns a vector of strings, which are the variable names that have been used in the regression model.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc.

References

[1] Stata 13 help for margins, http://www.stata.com/help.cgi?margins

See Also

`relevel` changes the reference category.

`madlib.lm`, `madlib.glm` compute linear and logistic regressions.

Examples

```
## Not run: 
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname)

## create a data table in database and the R wrapper
delete("abalone", conn.id = cid)
dat <- as.db.data.frame(abalone, "abalone", conn.id = cid)

fit <- madlib.lm(rings ~ length + diameter*sex, data = dat)

margins(fit)
margins(fit, at.mean = TRUE)
margins(fit, factor.continuous = TRUE)
margins(fit, dydx = ~ Vars(model) + Terms())

fit <- madlib.glm(rings < 10 ~ length + diameter*sex, data = dat,
                  family = "logistic")

margins(fit, ~ length + sex)
margins(fit, ~ length + sex.M, at.mean = TRUE)
margins(fit, ~ length + sex.I, factor.continuous = TRUE)
margins(fit, ~ Vars(model) + Terms())

## create a data table that has two columns
## one of them is an array column
dat1 <- cbind(db.array(dat[,-c(1,2,10)]), dat[,10])
names(dat1) <- c("x", "y")
delete("abalone_array", conn.id = cid)
dat1 <- as.db.data.frame(dat1, "abalone_array")

fit <- madlib.glm(y < 10 ~ x[-1], data = dat1, family = "logistic")

margins(fit, ~ x[2:5])

db.disconnect(cid, verbose = FALSE)

## End(Not run)
```