Description

Fit regularization paths for models with grouped penalties over a grid of values for the regularization parameter lambda. Fits linear, logistic, and Poisson regression models.
Usage

grpreg(X, y, group=1:ncol(X), penalty=c("grLasso", "grMCP", "grSCAD",
       "gel", "cMCP"), family=c("gaussian", "binomial", "poisson"),
       nlambda=100, lambda, lambda.min={if (nrow(X) > ncol(X)) 1e-4 else .05},
       log.lambda=TRUE, alpha=1, eps=1e-4, max.iter=10000, dfmax=p,
       gmax=length(unique(group)), gamma=ifelse(penalty == "grSCAD", 4, 3),
       tau=1/3, group.multiplier, warn=TRUE, returnX=FALSE, ...)

Arguments

X
The design matrix, without an intercept.

y
The response vector, or a matrix in the case of multitask learning (see details).

group
A vector describing the grouping of the coefficients. For greatest efficiency and least ambiguity (see details), it is best if group is a vector of consecutive integers.

penalty
The penalty to be applied to the model. For group selection, one of "grLasso", "grMCP", or "grSCAD". For bilevel selection, one of "gel" or "cMCP".

family
Either "gaussian", "binomial", or "poisson", depending on the response.

nlambda
The number of lambda values. Default is 100.

lambda
A user-supplied sequence of lambda values. Typically, this is left unspecified and the function computes its own sequence.

lambda.min
The smallest value for lambda, as a fraction of the maximum lambda. Default is 1e-4 if the number of observations is larger than the number of covariates and .05 otherwise.

log.lambda
Whether to compute the grid of lambda values on the log scale (default) or the linear scale.

alpha
Tuning parameter controlling the balance between the group penalty and a ridge (L2) penalty: the group penalty is weighted by alpha and the ridge penalty by 1-alpha. Default is 1, i.e., no ridge component.

eps
Convergence threshold. The algorithm iterates until the RMSD for the change in linear predictors for each coefficient is less than eps. Default is 1e-4.

max.iter
Maximum number of iterations (total across entire path). Default is 10000. See details.

dfmax
Limit on the number of parameters allowed to be nonzero. If this limit is exceeded, the algorithm will exit early from the regularization path.

gmax
Limit on the number of groups allowed to have nonzero elements. If this limit is exceeded, the algorithm will exit early from the regularization path.

gamma
Tuning parameter of the group or composite MCP/SCAD penalty (see details). Default is 3 for MCP and 4 for SCAD.

tau
Tuning parameter for the group exponential lasso; defaults to 1/3.

group.multiplier
A vector of values representing multiplicative factors by which each group's penalty is to be multiplied. Often, this is a function (such as the square root) of the number of predictors in each group. The default is to use the square root of group size for the group selection methods, and a vector of 1's (i.e., no adjustment for group size) for bilevel selection.

warn
Should the function give a warning if it fails to converge? Default is TRUE. See details.

returnX
Return the standardized design matrix (and associated group structure information)? Default is FALSE.

...
Arguments passed to other functions (such as gBridge).
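As a small illustration of the group.multiplier default described above, the square-root-of-group-size weights can be computed by hand. This is a minimal sketch in base R using a made-up three-group layout; the commented grpreg() call is hypothetical:

```r
# Hypothetical grouping: three groups of sizes 2, 3, and 1
group <- c(1, 1, 2, 2, 2, 3)

# Default multiplier for the group selection methods: sqrt of group size
gm <- sqrt(as.numeric(table(group)))   # sqrt(2), sqrt(3), 1

# It could then be passed explicitly (requires the grpreg package):
# fit <- grpreg(X, y, group, penalty="grLasso", group.multiplier=gm)
```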
Details

There are two general classes of methods involving grouped penalties: those that carry out bilevel selection and those that carry out group selection. Bilevel means carrying out variable selection at the group level as well as the level of individual covariates (i.e., selecting important groups as well as important members of those groups). Group selection selects important groups, and not members within the group – i.e., within a group, coefficients will either all be zero or all nonzero. The grLasso, grMCP, and grSCAD penalties carry out group selection, while the gel and cMCP penalties carry out bilevel selection. For bilevel selection, see also the gBridge function.
For historical reasons and backwards compatibility, some of these penalties have aliases; e.g., gLasso will do the same thing as grLasso, but users are encouraged to use grLasso.
Please note the distinction between grMCP and cMCP. The former involves an MCP penalty being applied to the L2 norm of each group. The latter involves a hierarchical penalty which places an outer MCP penalty on a sum of inner MCP penalties for each group, as proposed in Breheny & Huang, 2009. Either penalty may be referred to as the "group MCP", depending on the publication. To resolve this confusion, Huang et al. (2012) proposed the name "composite MCP" for the cMCP penalty.
For more information about the penalties and their properties, please
consult the references below, many of which contain discussion, case
studies, and simulation studies comparing the methods. If you use
grpreg for an analysis, please cite the appropriate reference.
In keeping with the notation from the original MCP paper, the tuning
parameter of the MCP penalty is denoted 'gamma'. Note, however, that
in Breheny and Huang (2009), gamma
is denoted 'a'.
The objective function for grpreg optimization is defined to be

Q(β|X, y) = (1/n)*L(β|X, y) + P(β, λ),

where the loss function L is the deviance (-2 times the log-likelihood) for the specified outcome distribution (gaussian/binomial/poisson).
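To make the penalty term concrete: for the grLasso penalty, P(β, λ) is λ times a weighted sum of the groupwise L2 norms, with the default square-root-of-group-size weights described under group.multiplier. A minimal base-R sketch evaluating it by hand, on made-up coefficient values:

```r
# Hypothetical coefficients split into two groups of two
beta   <- c(0.5, -0.2, 0.0, 1.0)
group  <- c(1, 1, 2, 2)
lambda <- 0.1

# Groupwise L2 norms: ||beta_g||_2
norms <- tapply(beta, group, function(b) sqrt(sum(b^2)))

# Group lasso penalty with the default sqrt(group size) multipliers
P <- lambda * sum(sqrt(as.numeric(table(group))) * norms)
```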
For the bilevel selection methods, a locally approximated coordinate descent algorithm is employed. For the group selection methods, group descent algorithms are employed.
The algorithms employed by grpreg are stable and generally converge quite rapidly to values close to the solution. However, especially when p is large compared with n, grpreg may fail to converge at low values of lambda, where models are nonidentifiable or nearly singular. Often, this is not the region of the coefficient path that is most interesting. The default behavior of warning the user when convergence criteria are not met may be distracting in these cases, and can be modified with warn (convergence can always be checked later by inspecting the value of iter).
If models are not converging, increasing max.iter may not be the most efficient way to correct this problem. Consider increasing nlambda or lambda.min in addition to increasing max.iter.
Although grpreg allows groups to be unordered and given arbitrary names, it is recommended that you specify groups as consecutive integers. The first reason is efficiency: if groups are out of order, X must be reordered prior to fitting, then this process reversed to return coefficients according to the original order of X. This is inefficient if X is very large. The second reason is ambiguity with respect to other arguments such as group.multiplier. With consecutive integers, group=3 unambiguously denotes the third element of group.multiplier.
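The recommendation above can be followed mechanically. A base-R sketch, with made-up group labels, of converting arbitrary labels into consecutive integers and reordering X to match (the commented fit call is hypothetical):

```r
# Hypothetical, unordered group labels for five columns of X
labels <- c("geno", "clin", "geno", "clin", "geno")

# Sort columns so groups are contiguous, then relabel as 1, 2, ...
ord   <- order(labels)
group <- as.integer(factor(labels[ord]))   # 1, 1, 2, 2, 2

# X would be reordered the same way before fitting:
# fit <- grpreg(X[, ord], y, group)
```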
Seemingly unrelated regressions/multitask learning can be carried out using grpreg by passing a matrix to y. In this case, X will be used in separate regressions for each column of y, with the coefficients grouped across the responses. In other words, each column of X will form a group with m members, where m is the number of columns of y. For multiple Gaussian responses, it is recommended to standardize the columns of y prior to fitting, in order to apply the penalization equally across columns.
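For the multiple-Gaussian-response case, the recommended standardization of y is a one-liner with scale(). A minimal sketch on simulated data (the dimensions and data-generating model are arbitrary; the commented fit call requires the grpreg package):

```r
set.seed(1)
n <- 50; p <- 10; k <- 3
X <- matrix(rnorm(n * p), n, p)
Y <- matrix(rnorm(n * k, mean = X[, 1]), n, k)

# Center and scale each response column so the penalty
# applies equally across the k regressions
Y_std <- scale(Y)

# fit <- grpreg(X, Y_std)
```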
grpreg requires groups to be non-overlapping; for model fitting involving overlapping groups, see the companion package grpregOverlap.
Value

An object with S3 class "grpreg" containing:

beta
The fitted matrix of coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.

family
Same as above.

group
Same as above.

lambda
The sequence of lambda values in the path.

alpha
Same as above.

loss
A vector containing either the residual sum of squares ("gaussian") or negative log-likelihood ("binomial") of the fitted model at each value of lambda.

n
Number of observations.

penalty
Same as above.

df
A vector of length nlambda containing estimates of the effective number of model parameters at all points along the regularization path. For details on how this is calculated, see Breheny and Huang (2009).

iter
A vector of length nlambda containing the number of iterations until convergence at each value of lambda.

group.multiplier
A named vector containing the multiplicative constant applied to each group's penalty.
Author(s)

Patrick Breheny
References

Breheny, P. and Huang, J. (2009) Penalized methods for bi-level variable selection. Statistics and Its Interface, 2: 369-380. myweb.uiowa.edu/pbreheny/pdf/Breheny2009.pdf

Huang, J., Breheny, P. and Ma, S. (2012). A selective review of group selection in high-dimensional models. Statistical Science, 27: 481-499. myweb.uiowa.edu/pbreheny/pdf/Huang2012.pdf

Breheny, P. and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187. www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11222-013-9424-2

Breheny, P. (2015) The group exponential lasso for bi-level variable selection. Biometrics, 71: 731-740. http://dx.doi.org/10.1111/biom.12300
See Also

cv.grpreg, as well as plot and select methods.
Examples

# Birthweight data
data(Birthwt)
X <- Birthwt$X
group <- Birthwt$group

# Linear regression
y <- Birthwt$bwt
fit <- grpreg(X, y, group, penalty="grLasso")
plot(fit)
fit <- grpreg(X, y, group, penalty="grMCP")
plot(fit)
fit <- grpreg(X, y, group, penalty="grSCAD")
plot(fit)
fit <- grpreg(X, y, group, penalty="gel")
plot(fit)
fit <- grpreg(X, y, group, penalty="cMCP")
plot(fit)
select(fit, "AIC")

# Logistic regression
y <- Birthwt$low
fit <- grpreg(X, y, group, penalty="grLasso", family="binomial")
plot(fit)
fit <- grpreg(X, y, group, penalty="grMCP", family="binomial")
plot(fit)
fit <- grpreg(X, y, group, penalty="grSCAD", family="binomial")
plot(fit)
fit <- grpreg(X, y, group, penalty="gel", family="binomial")
plot(fit)
fit <- grpreg(X, y, group, penalty="cMCP", family="binomial")
plot(fit)
select(fit, "BIC")

# Multitask learning (simulated example)
set.seed(1)
n <- 50
p <- 10
k <- 5
X <- matrix(runif(n*p), n, p)
y <- matrix(rnorm(n*k, X[,1] + X[,2]), n, k)
fit <- grpreg(X, y)
# Note that group is set up automatically
fit$group
plot(fit)

# Overlapping groups
## Not run:
library(grpregOverlap)
data(pathway.dat)
X <- pathway.dat$expression
group <- pathway.dat$pathways
y <- pathway.dat$mutation
fit <- grpregOverlap(X, y, group, family='binomial')
plot(fit)
## End(Not run)
