margins: Compute the marginal effects of regression models
In PivotalR: A Fast, Easy-to-Use Tool for Manipulating Tables in Databases and a Wrapper of MADlib


`margins` computes the marginal effects of variables from a fitted regression model (`madlib.lm`, `madlib.glm`, etc.). `Vars` lists all the variables used in the regression model, and `Terms` lists the terms specified in the original model formula. `Vars` and `Terms` are only meaningful inside the `dydx` argument of `margins`.
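The difference between variables and terms can be seen with a base R analogue. The sketch below is not PivotalR's implementation; it only uses base R's `all.vars` and `terms` on the model formula from the Examples to show that `length + diameter*sex` involves three predictor variables but four terms, one of which is the interaction `diameter:sex`.

```r
## Base R analogue (not PivotalR's implementation) of the Vars/Terms distinction,
## using the model formula from the Examples.
f <- rings ~ length + diameter * sex

## Predictor variables on the right-hand side of the formula (what Vars lists)
vars <- all.vars(f[[3]])
print(vars)

## Model terms, including the interaction (what Terms indexes as .term.1, .term.2, ...)
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

Whether PivotalR's `Vars` reports a factor such as `sex` by name or by its dummy columns is not stated here, so treat the printed names as illustrative only.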

## S3 method for class 'lm.madlib'
margins(model, dydx = ~Vars(model), newdata =
model$data, at.mean = FALSE, factor.continuous = FALSE, na.action =
NULL, ...)

## S3 method for class 'lm.madlib.grps'
margins(model, dydx = ~Vars(model), newdata =
lapply(model, function(x) x$data), at.mean = FALSE, factor.continuous =
FALSE, na.action = NULL, ...)

## S3 method for class 'logregr.madlib'
margins(model, dydx = ~Vars(model), newdata =
model$data, at.mean = FALSE, factor.continuous = FALSE, na.action =
NULL, ...)

## S3 method for class 'logregr.madlib.grps'
margins(model, dydx = ~Vars(model),
newdata = lapply(model, function(x) x$data), at.mean = FALSE,
factor.continuous = FALSE, na.action = NULL, ...)

## S3 method for class 'margins'
print(x, digits = max(3L, getOption("digits") - 3L),
...)

Vars(model)

Terms(term = NULL)

`model`	The result of `madlib.lm` or `madlib.glm`, representing a regression model fitted to the training data.
`dydx`	A formula specifying the variables whose marginal effects are computed. The default, `~ Vars(model)`, computes the marginal effects of all variables that appear in the model; `~ .` computes the marginal effects of all variables in `newdata`. To select specific variables, use an ordinary formula such as `~ length + sex`.
`newdata`	A `db.obj` object representing the data in the database. The default is the data used to train the regression model, but any other data set containing the model's variables can be used.
`at.mean`	A logical, default `FALSE`. If `TRUE`, the marginal effects are evaluated at the mean values of the variables.
`factor.continuous`	A logical, default `FALSE`. If `TRUE`, the marginal effects of factors are computed by treating them as continuous variables. See Details for more explanation.
`na.action`	A string indicating what should happen when the data contain `NA`s. Possible values include `na.omit`, `"na.exclude"`, `"na.fail"` and `NULL`. Currently only `na.omit,db.obj-method` is implemented. When the value is `NULL`, nothing is done on the R side, and `NA` values are filtered out and omitted on the MADlib side. A user-defined `na.action` function is also allowed; see `na.omit,db.obj-method` for the preferred function interface.
`...`	Other arguments, not implemented.
`x`	The result of the `margins` function, an object of class "margins".
`digits`	The minimum number of significant digits to print. The default is `max(3L, getOption("digits") - 3L)`. (For the interpretation for complex numbers, see `signif`.) Non-integer values are rounded down, and only values from 1 to 22 are accepted.
`term`	A vector of integers; the default is `NULL`. With `term = i`, the marginal effect of the i-th term is computed; even if this term contains several variables, it is treated as a single variable independent of all others. With `term = NULL`, the marginal effects of all terms are computed. The results are labelled `".term.1"`, `".term.2"`, etc.; comparing them with `names(model$coef)` shows which term corresponds to which expression. The marginal effect of the `(Intercept)` term cannot be computed this way. As a workaround, create an extra column equal to 1, use it as a variable, and drop the intercept by adding `- 1` to the fitting formula.
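The effect of `at.mean` can be illustrated with a base R sketch on simulated, hypothetical data. It assumes, as in Stata's margins command cited in the References, that the default averages the per-row derivative over `newdata`, while `at.mean = TRUE` evaluates the derivative once at the variable means. For a logistic model with p = plogis(eta), the derivative with respect to x1 is b_x1 * p * (1 - p).

```r
## Sketch on simulated data: average marginal effect vs. marginal effect at
## the means for a logistic model fitted with base R glm(), not madlib.glm().
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- runif(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.2 * x1 + 0.8 * x2))
fit <- glm(y ~ x1 + x2, family = binomial)
b <- coef(fit)

## Per-row fitted probabilities
p <- plogis(b[["(Intercept)"]] + b[["x1"]] * x1 + b[["x2"]] * x2)

## Assumed default: average the derivative dp/dx1 over all rows
ame_x1 <- mean(b[["x1"]] * p * (1 - p))

## at.mean-style: evaluate dp/dx1 once, at the means of x1 and x2
p_bar  <- plogis(b[["(Intercept)"]] + b[["x1"]] * mean(x1) + b[["x2"]] * mean(x2))
mem_x1 <- b[["x1"]] * p_bar * (1 - p_bar)

print(c(average = ame_x1, at_mean = mem_x1))
```

For a linear model without interactions both quantities equal the coefficient; they differ once the response is nonlinear, as here.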

For a continuous variable, the marginal effect is the first derivative of the response function with respect to that variable. For a categorical variable, it is usually more meaningful to compute the finite difference of the response function between the variable being 1 and 0. This finite-difference marginal effect measures how much the response differs from that of the reference category. Setting `factor.continuous = TRUE` instead treats the factor as a continuous variable and uses the derivative. The reference category of a categorical variable can be changed with relevel.
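A small base R sketch contrasts the two definitions for a logistic response. The coefficients `b0`, `bx`, `bd` and the four data rows are hypothetical; `d` plays the role of a 0/1 dummy for a non-reference level, and the last quantity mimics what `factor.continuous = TRUE` is described as doing. Averaging over rows is an assumption carried over from Stata's default.

```r
## Hypothetical logistic response with a continuous x and a 0/1 dummy d
b0 <- -1; bx <- 0.7; bd <- 1.5
response <- function(x, d) plogis(b0 + bx * x + bd * d)

x <- c(-1, 0, 0.5, 2)   # hypothetical data rows
d <- c(0, 1, 1, 0)
p <- response(x, d)

## Continuous variable: derivative of the response w.r.t. x, averaged over rows
me_x <- mean(bx * p * (1 - p))

## Categorical variable: finite difference between d = 1 and d = 0 (reference)
me_d_discrete <- mean(response(x, 1) - response(x, 0))

## factor.continuous-style: treat d as continuous and use the derivative instead
me_d_continuous <- mean(bd * p * (1 - p))

print(c(x = me_x, d_discrete = me_d_discrete, d_continuous = me_d_continuous))
```

The two estimates for `d` are close but not equal, because a jump from 0 to 1 is not infinitesimal on the logistic curve.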

`margins` returns an object of class "margins", which is a data.frame with the following columns:

`Estimate`	The marginal effects of all variables specified in `dydx`.
`Std. Error`	The standard errors for the marginal effects.
`t value, z value`	The t statistics (for linear regression) or z statistics (for logistic regression).
`Pr(>|t|), Pr(>|z|)`	The corresponding p values.
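How the `Std. Error`, `t value` and `Pr(>|t|)` columns relate to `Estimate` can be sketched in base R. This assumes the delta method, which Stata's margins uses by default; it is an illustration on simulated, hypothetical data, not PivotalR's code. For a linear model `y ~ x * g` with a 0/1 dummy `g`, the averaged marginal effect of `x` is b_x + b_xg * mean(g), whose gradient with respect to the coefficients gives the standard error.

```r
## Simulated data and a linear model with an interaction, fitted with base R lm()
set.seed(42)
n <- 200
x <- rnorm(n)
g <- rbinom(n, 1, 0.4)
y <- 1 + 2 * x + 0.5 * g - 1 * x * g + rnorm(n)
fit <- lm(y ~ x * g)
b <- coef(fit)
V <- vcov(fit)

## Estimate: d(fitted y)/dx = b_x + b_xg * g, averaged over rows
est <- b[["x"]] + b[["x:g"]] * mean(g)

## Delta method: gradient of the estimate w.r.t. the coefficient vector
grad <- c("(Intercept)" = 0, x = 1, g = 0, "x:g" = mean(g))
se <- sqrt(drop(t(grad) %*% V[names(grad), names(grad)] %*% grad))

## t value and two-sided p value on the residual degrees of freedom
t_value <- est / se
p_value <- 2 * pt(-abs(t_value), df = df.residual(fit))

print(c(Estimate = est, `Std. Error` = se, `t value` = t_value, `Pr(>|t|)` = p_value))
```

For a logistic model the same ratio is reported as a z value, with the p value taken from the normal distribution, `2 * pnorm(-abs(z))`.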

Vars returns a character vector containing the names of the variables used in the regression model.

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io

[1] Stata 13 help for margins, https://www.stata.com/help.cgi?margins

relevel changes the reference category.

madlib.lm and madlib.glm compute linear and logistic regressions.

## Not run: 


## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname)

## create a data table in database and the R wrapper
delete("abalone", conn.id = cid)
dat <- as.db.data.frame(abalone, "abalone", conn.id = cid)

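## linear regression with an interaction between diameter and sex
fit <- madlib.lm(rings ~ length + diameter*sex, data = dat)
## marginal effects of all variables in the model
margins(fit)
## marginal effects evaluated at the mean values of the variables
margins(fit, at.mean = TRUE)
## treat the factor sex as a continuous variable
margins(fit, factor.continuous = TRUE)
## marginal effects of all variables and of every term
margins(fit, dydx = ~ Vars(model) + Terms())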

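## logistic regression on the binary response rings < 10
fit <- madlib.glm(rings < 10 ~ length + diameter*sex, data = dat, family = "logistic")
## marginal effects of length and sex only
margins(fit, ~ length + sex)
## length and the M level of sex, evaluated at the mean values
margins(fit, ~ length + sex.M, at.mean = TRUE)
## length and the I level of sex, treating sex as continuous
margins(fit, ~ length + sex.I, factor.continuous = TRUE)
margins(fit, ~ Vars(model) + Terms())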

## create a data table that has two columns
## one of them is an array column
dat1 <- cbind(db.array(dat[,-c(1,2,10)]), dat[,10])
names(dat1) <- c("x", "y")
delete("abalone_array", conn.id = cid)
dat1 <- as.db.data.frame(dat1, "abalone_array")

fit <- madlib.glm(y < 10 ~ x[-1], data = dat1, family = "logistic")
## marginal effects of elements 2 to 5 of the array column x
margins(fit, ~ x[2:5])

db.disconnect(cid, verbose = FALSE)

## End(Not run)