Description
This function wraps MADlib's elastic net regularization for generalized linear models. Currently, linear and logistic regression are supported.
Arguments
formula 
An object of class formula (or one that can be coerced to that class), which specifies the dependent and independent variables.
data
An object of db.obj class, which wraps the data table in the database.
family 
A string which indicates which form of regression to apply. The default value is "gaussian". The accepted values are: "gaussian" or "linear" for linear regression; "binomial" or "logistic" for logistic regression. Support for other families will be added in the future.
na.action
A string which indicates what should happen when the data contain NA values.
na.as.level
A logical value, default is FALSE. Whether to treat NA as a level of a categorical (factor) variable or to ignore it.
alpha 
A numeric value in [0, 1], the elastic net mixing parameter. The penalty is defined as (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1. alpha = 1 gives the lasso penalty, and alpha = 0 the ridge penalty.
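As a small illustration of how alpha mixes the two penalties, the elastic net penalty can be written as a plain R helper (the function name is ours, not part of PivotalR or MADlib):

```r
# Elastic net penalty: (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
# (illustrative helper only; not part of the package API)
elnet_penalty <- function(beta, alpha) {
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(0.5, -1, 2)
elnet_penalty(beta, alpha = 1)  # lasso: sum(abs(beta)) = 3.5
elnet_penalty(beta, alpha = 0)  # ridge: sum(beta^2) / 2 = 2.625
```

Intermediate values of alpha interpolate between the two penalties.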
lambda 
A positive numeric value, the regularization parameter. 
standardize 
A logical value, default: TRUE. Whether to normalize the data before fitting; standardizing usually yields better results and faster convergence.
method 
A string, default: "fista". The name of the optimizer: "fista", "igd"/"sgd", or "cd". "fista" is the fast iterative shrinkage-thresholding algorithm [1], "igd"/"sgd" implements the stochastic gradient descent algorithm [2], and "cd" implements the coordinate descent algorithm [5].
control 
A list which contains the control parameters for the optimizers. The valid entries differ between the "fista", "igd"/"sgd", and "cd" optimizers; parameters shared by several optimizers, such as max.iter and tolerance, have the same meaning for each of them.

glmnet 
A logical value, default is FALSE. Whether to use the same parameterization of the objective function as the R package glmnet (see Details).
... 
More arguments, currently not implemented. 
Details
The objective function for "gaussian" is
1/2 * RSS/nobs + lambda * penalty,
and for the other models it is
-loglik/nobs + lambda * penalty.
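As a minimal sketch (the helper names are ours, not part of PivotalR or MADlib), the "gaussian" objective above can be computed directly from a coefficient vector and data:

```r
# Gaussian objective: 1/2 * RSS/nobs + lambda * penalty,
# where penalty is the elastic net penalty mixed by alpha.
# (illustrative helper only; not part of the package API)
gaussian_objective <- function(y, x, beta, intercept, lambda, alpha) {
  resid <- y - (intercept + x %*% beta)          # residuals of the linear fit
  rss <- sum(resid^2)                            # residual sum of squares
  penalty <- (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
  0.5 * rss / length(y) + lambda * penalty
}

set.seed(1)
x <- matrix(rnorm(20), 10, 2)
y <- rnorm(10)
gaussian_objective(y, x, beta = c(0, 0), intercept = 0,
                   lambda = 0.1, alpha = 0.5)
# with beta = 0 the penalty vanishes, so this equals mean(y^2) / 2
```

Lowering this objective is what the "fista", "igd"/"sgd", and "cd" optimizers iterate toward.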
Value
An object of class elnet.madlib, which is a list containing the following items:
coef 
A vector, the fitted coefficients.
intercept 
A numeric value, the intercept. 
y.scl 
A numeric value used to scale the dependent variable. In the "gaussian" case, it is 1 when standardize is FALSE.
loglik 
A numeric value, the log-likelihood of the fitting result.
standardize 
The standardize value passed into the function.
iter 
An integer, the number of iterations used.
ind.str 
A string. The independent variables in an array format string. 
terms 
A terms object, extracted from the model formula.
model 
A db.data.frame object, which wraps the model table in the database.
call 
A language object. The function call that generates this result. 
alpha
The alpha value passed into the function.
lambda
The lambda value passed into the function.
method
The method value passed into the function.
family
The family value passed into the function.
appear
An array of strings, the same length as the number of independent variables. The strings are used to print a clean result, especially when we are dealing with factor variables, where the dummy variable names can be very long due to the insertion of a random string to avoid naming conflicts.
max.iter, tolerance
The values of max.iter and tolerance used in the fitting.
Note
The coordinate descent (method = "cd") algorithm is currently only available in PivotalR. In the future, we will also implement it in MADlib. The idea is to do part of the computation in memory. Due to the memory usage limitation of the database, this method cannot handle fits where the number of features is too large (a few thousand).
If the data set is big, it is strongly recommended that you first run this function on a subset of the data with a small max.iter, adjust the parameters there for the best performance, and then apply the best set of parameters to the full data set with a large max.iter.
Author(s)
Author: Predictive Analytics Team at Pivotal Inc.
Maintainer: Frank McQuillan, Pivotal Inc. [email protected]
References
[1] Beck, A. and M. Teboulle (2009), A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. on Imaging Sciences 2(1), 183-202.
[2] Shai Shalev-Shwartz and Ambuj Tewari, Stochastic Methods for l1 Regularized Loss Minimization. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009.
[3] Elastic net regularization. http://en.wikipedia.org/wiki/Elastic_net_regularization
[4] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, Chap 13.4, 2012.
[5] Jerome Friedman, Trevor Hastie and Rob Tibshirani, Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 2010.
See Also
generic.cv does k-fold cross-validation. See the examples there for how to use elastic net together with cross-validation.
Examples
## Not run:
## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)
x <- matrix(rnorm(100*20), 100, 20)
y <- rnorm(100, 0.1, 2)
dat <- data.frame(x, y)
delete("eldata")
z <- as.db.data.frame(dat, "eldata", conn.id = cid, verbose = FALSE)
fit <- madlib.elnet(y ~ ., data = z, alpha = 0.2, lambda = 0.05,
                    control = list(random.stepsize = TRUE))
fit
lk(mean((z$y - predict(fit, z))^2)) # mean square error
fit <- madlib.elnet(y ~ ., data = z, alpha = 0.2, lambda = 0.05, method = "cd")
fit
db.disconnect(cid, verbose = FALSE)
## End(Not run)
