Description Usage Arguments Details Value Note Author(s) References See Also Examples
This function wraps MADlib's elastic net regularization for generalized linear models. Currently linear and logistic regression are supported.
formula: A formula object (or one that can be coerced to that class) that specifies the dependent and independent variables.
data: A db.obj object that points to the data in the database.
family: A string indicating which form of regression to apply. The default value is "gaussian". Accepted values are "gaussian" or "linear" for linear regression, and "binomial" or "logistic" for logistic regression. Support for other families will be added in the future.
na.action: A string which indicates what should happen when the data contain NA values.
na.as.level: A logical value, default FALSE, which indicates whether to treat NA as a separate level of a categorical variable.
alpha: A numeric value in [0, 1], the elastic net mixing parameter. The penalty is defined as (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1; alpha = 1 gives the lasso penalty, and alpha = 0 the ridge penalty.
lambda: A positive numeric value, the regularization parameter.
standardize: A logical value, default TRUE, which indicates whether to standardize the independent variables before fitting.
method: A string, default "fista", the name of the optimizer: "fista", "igd"/"sgd", or "cd". "fista" is the fast iterative shrinkage-thresholding algorithm [1], "igd"/"sgd" implements the stochastic gradient descent algorithm [2], and "cd" implements the coordinate descent algorithm [5].
control: A list containing the control parameters for the optimizers. Which parameters are accepted depends on the chosen method; the "fista" and "igd"/"sgd" optimizers each take their own set of parameters in addition to parameters common to all optimizers, such as max.iter and tolerance.
glmnet: A logical value, default FALSE, which indicates whether to match the parameterization of the objective function used by the glmnet package.
...: Further arguments, currently not implemented.
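As a quick illustration of the penalty defined under alpha above, the following snippet evaluates (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1 at the two extremes. The function name and coefficient vector are made up for this example; they are not part of PivotalR.

```r
# Elastic net penalty as defined for the `alpha` argument.
# `elnet.penalty` and `beta` are hypothetical, for illustration only.
elnet.penalty <- function(beta, alpha) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(0.5, -1.2, 0.3)
elnet.penalty(beta, alpha = 1)  # lasso penalty: sum(abs(beta)) = 2
elnet.penalty(beta, alpha = 0)  # ridge penalty: sum(beta^2) / 2 = 0.89
```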
The objective function for "gaussian" is 1/2 * RSS/nobs + lambda * penalty, and for the other models it is -loglik/nobs + lambda * penalty.
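This "gaussian" objective can be evaluated directly in R. The following sketch uses made-up values for the response, fitted values, coefficients, alpha, and lambda, purely to show how the pieces combine:

```r
# "gaussian" objective: 1/2 * RSS/nobs + lambda * penalty,
# evaluated on made-up illustrative values (not madlib.elnet output).
set.seed(1)
nobs   <- 100
y      <- rnorm(nobs)
yhat   <- y + rnorm(nobs, sd = 0.5)   # pretend fitted values
beta   <- c(0.5, -1.2, 0.3)           # pretend coefficients
alpha  <- 0.2
lambda <- 0.05

penalty   <- (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
objective <- 0.5 * sum((y - yhat)^2) / nobs + lambda * penalty
objective
```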
An object of class elnet.madlib, which is a list that contains the following items:
coef: A vector, the fitted coefficients.
intercept: A numeric value, the intercept.
y.scl: A numeric value which is used to scale the dependent values.
loglik: A numeric value, the log-likelihood of the fitting result.
standardize: The standardize value used in the fitting.
iter: An integer, the number of iterations used.
ind.str: A string, the independent variables in an array format string.
terms: A terms object extracted from the formula.
model: A db.obj object that points to the result model table in the database.
call: A language object, the function call that generated this result.
alpha: The alpha value used in the computation.
lambda: The lambda value used in the computation.
method: The method used in the computation.
family: The family used in the computation.
appear: An array of strings, the same length as the number of independent variables, used to print a clean result. This is especially useful for factor variables, whose dummy-variable names can be very long because a random string is inserted to avoid naming conflicts.
max.iter, tolerance: The values of these control parameters used in the computation.
The coordinate descent (method = "cd") algorithm is currently only available in PivotalR; in the future it will also be implemented in MADlib. The idea is to do part of the computation in memory. Due to the memory limitations of the database, this method cannot handle fits where the number of features is too large (more than a few thousand).
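For readers curious about what the in-memory computation looks like, here is a minimal sketch of coordinate descent with soft-thresholding for the lasso case (alpha = 1), in the spirit of reference [5]. It is an illustration of the algorithm only, not PivotalR's actual "cd" implementation; the function and variable names are hypothetical.

```r
# Soft-thresholding operator: S(z, g) = sign(z) * max(|z| - g, 0).
soft.threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimal in-memory coordinate descent for the lasso objective
# 1/2 * RSS/nobs + lambda * ||beta||_1 (illustrative sketch only).
lasso.cd <- function(X, y, lambda, max.iter = 100, tolerance = 1e-6) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- rep(0, p)
  for (iter in seq_len(max.iter)) {
    beta.old <- beta
    for (j in seq_len(p)) {
      # Partial residual: leave out feature j's current contribution.
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(X[, j] * r) / n
      beta[j] <- soft.threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
    if (max(abs(beta - beta.old)) < tolerance) break
  }
  beta
}

# Made-up data for illustration.
set.seed(42)
X <- matrix(rnorm(200), 100, 2)
y <- as.vector(X %*% c(2, -1)) + rnorm(100, sd = 0.1)
lasso.cd(X, y, lambda = 0.05)
```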
When the data set is big, it is strongly recommended to first run this function on a subset of the data with a small max.iter before applying it to the full data set with a large max.iter. In the pre-run you can adjust the parameters for the best performance, and then apply the best set of parameters to the whole data set.
Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc. fmcquillan@pivotal.io
[1] Beck, A. and M. Teboulle (2009), A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. on Imaging Sciences 2(1), 183-202.
[2] Shai Shalev-Shwartz and Ambuj Tewari, Stochastic Methods for l1 Regularized Loss Minimization. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009.
[3] Elastic net regularization. https://en.wikipedia.org/wiki/Elastic_net_regularization
[4] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, Chap 13.4, 2012.
[5] Jerome Friedman, Trevor Hastie and Rob Tibshirani, Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 2010.
generic.cv does k-fold cross-validation. See the examples there for how to use elastic net together with cross-validation.
## Not run:
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
x <- matrix(rnorm(100*20),100,20)
y <- rnorm(100, 0.1, 2)
dat <- data.frame(x, y)
delete("eldata")
z <- as.db.data.frame(dat, "eldata", conn.id = cid, verbose = FALSE)
fit <- madlib.elnet(y ~ ., data = z, alpha = 0.2, lambda = 0.05, control
= list(random.stepsize=TRUE))
fit
lk(mean((z$y - predict(fit, z))^2)) # mean square error
fit <- madlib.elnet(y ~ ., data = z, alpha = 0.2, lambda = 0.05, method = "cd")
fit
db.disconnect(cid, verbose = FALSE)
## End(Not run)