Fit the regularization paths of linear, logistic, Poisson, or Cox models with overlapping grouped covariates based on the latent group lasso approach (Jacob et al., 2009; Obozinski et al., 2011). Latent group MCP/SCAD, as well as bi-level selection methods, namely the group exponential lasso (Breheny, 2015) and the composite MCP (Huang et al., 2012), are also available.
grpregOverlap(X, y, group,
              penalty=c("grLasso", "grMCP", "grSCAD", "gel", "cMCP", "gLasso", "gMCP"),
              family=c("gaussian","binomial", "poisson", "cox"), nlambda=100, lambda,
              lambda.min={if (nrow(X) > ncol(X)) 1e-4 else .05}, alpha=1, eps=.001,
              max.iter=1000, dfmax=ncol(X), gmax=length(group),
              gamma=ifelse(penalty == "grSCAD", 4, 3), tau=1/3,
              group.multiplier,
              returnX = FALSE, returnOverlap = FALSE,
              warn=TRUE, ...)
X |
The design matrix, without an intercept. |
y |
The response vector, or a matrix in the case of multitask learning. For survival analysis, |
group |
Different from that in |
penalty |
The penalty to be applied to the model. Specify |
family |
Either "gaussian", "binomial", "poisson", or "cox", depending on the response. If
|
nlambda |
The number of |
lambda |
A user supplied sequence of |
lambda.min |
The smallest value for |
alpha |
Adopted from |
eps |
Convergence threshold. The algorithm iterates until the change (on
the standardized scale) in any coefficient is less than |
max.iter |
The maximum number of iterations. Default is 1000. See |
dfmax |
Limit on the number of parameters allowed to be nonzero. If this limit is exceeded, the algorithm will exit early from the regularization path. Default is the total number of covariates. |
gmax |
Limit on the number of groups allowed to have nonzero elements. If this limit is exceeded, the algorithm will exit early from the regularization path. Default is the total number of groups. |
gamma |
Tuning parameter of the MCP/SCAD penalty; defaults to 3 for MCP and 4 for SCAD. |
tau |
Tuning parameter for the group exponential lasso; defaults to 1/3. |
group.multiplier |
A vector of values representing multiplicative factors by which each group's penalty is to be multiplied. Often, this is a function (such as the square root) of the number of predictors in each group. If this is not specified by the user, the internal code will, by default, use the square root of group size for the group selection methods, and a vector of 1's (i.e., no adjustment for group size) for bi-level selection. |
returnX |
Return the new expanded design matrix? Default is FALSE. Note the storage size of this new matrix can be very large. |
returnOverlap |
Return the matrix containing overlaps? Default is FALSE. It is a square matrix C such that C[i, j] is the number of variables shared by groups i and j. The diagonal value C[i, i] is therefore the number of variables in group i. |
warn |
Should the function give a warning if it fails to converge? Default is TRUE.
See |
... |
Not used currently. |
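The default for group.multiplier described above can be sketched in base R. This is only an illustration of the stated rule, not the package's internal code; the helper name default_multiplier and its bilevel flag are hypothetical.

```r
# Hypothetical helper (not part of grpregOverlap): reproduces the documented
# default for group.multiplier.
#   group selection methods -> square root of each group's size
#   bi-level selection      -> a vector of 1's (no size adjustment)
default_multiplier <- function(group, bilevel = FALSE) {
  sizes <- lengths(group)                  # number of variables per group
  if (bilevel) {
    setNames(rep(1, length(group)), names(group))
  } else {
    sqrt(sizes)                            # keeps the group names
  }
}

group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))

default_multiplier(group)                  # sqrt(3), sqrt(2), sqrt(3), sqrt(2), 1
default_multiplier(group, bilevel = TRUE)  # all ones
```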
The latent group lasso approach extends the group lasso to group variable selection with overlaps. The proposed latent group lasso penalty is formulated so that it is equivalent to a classical non-overlapping group lasso problem in a new space, which is obtained by duplicating the columns of the overlapped variables. For technical details, see Jacob et al. (2009) and Obozinski et al. (2011).
grpregOverlap takes the input design matrix X and the grouping information group, and expands X to the new, non-overlapping space. It then calls grpreg for model fitting based on the group descent algorithm. Unlike in grpreg, the interface for the group bridge-penalized method is not implemented.
The expanded design matrix is named X.latent. It is returned in the fitted object, provided returnX is TRUE. The latent coefficient (or norm) vector then corresponds to the columns of that matrix. Note that when constructing X.latent, the columns in X corresponding to variables not included in group are removed automatically.
For a more detailed explanation of the penalties and algorithm, see grpreg.
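The column duplication described above can be illustrated with a minimal base R sketch. It assumes group is a list of integer column indices; the helper expand_columns is hypothetical and is not the package's own implementation. The last step uses the latent group lasso relation that each original coefficient is the sum of its latent copies.

```r
# Hypothetical helper (not part of grpregOverlap): stack, group by group,
# the columns of X that each group refers to. A variable that belongs to
# k groups appears k times; a variable in no group is dropped.
expand_columns <- function(X, group) {
  idx <- unlist(group, use.names = FALSE)          # column indices, with repeats
  grp_vec <- rep(seq_along(group), times = lengths(group))
  list(X.latent = X[, idx, drop = FALSE], idx = idx, grp.vec = grp_vec)
}

set.seed(1)
X <- matrix(rnorm(6 * 10), ncol = 6)
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))

out <- expand_columns(X, group)
dim(out$X.latent)   # 10 rows, 11 columns (3 + 2 + 3 + 2 + 1)
out$grp.vec         # 1 1 1 2 2 3 3 3 4 4 5

# Collapse latent coefficients back to the original variables by summing
# the copies of each variable.
beta_latent <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0)
beta <- as.vector(tapply(beta_latent, out$idx, sum))
beta                # 5 5 10 0 5 0
```

Because every group owns its own copy of each shared column, the groups in the expanded space no longer overlap, which is what lets a standard group-penalized solver be applied.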
An object with S3 class "grpregOverlap" or "grpsurvOverlap" (for Cox models), which inherits from "grpreg", with the following components.
beta |
The fitted matrix of coefficients. The number of rows is equal to the number
of coefficients, and the number of columns is equal to |
family |
Same as above. |
group |
Same as above. |
lambda |
The sequence of |
alpha |
Same as above. |
loss |
A vector containing either the residual sum of squares ( |
n |
Number of observations. |
penalty |
Same as above. |
df |
A vector of length |
iter |
A vector of length |
group.multiplier |
A named vector containing the multiplicative constant applied to each group's penalty. |
beta.latent |
The fitted matrix of latent coefficients. The number of rows is equal to the number
of coefficients, and the number of columns is equal to |
incidence.mat |
Incidence matrix: I[i, j] = 1 if group i contains variable j; otherwise 0. |
grp.vec |
A vector of consecutive integers indicating grouping information of variables. This
is equivalent to argument |
overlap.mat |
A square matrix C where C[i, j] is the number of overlapped
variables between group i and j. Diagonal value C[i, i] is therefore the
number of variables in group i. Only returned if |
X.latent |
The new expanded design matrix for the latent group lasso formulation. The
variables are reordered according to the order of groups. Only returned if
|
W |
Matrix of |
time |
Times on study. (For Cox models only) |
fail |
Failure event indicator. (For Cox models only) |
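As an illustration of the incidence.mat and overlap.mat definitions above, the following base R sketch builds both from a group list. It assumes integer variable indices and a known number of variables p, and it is not the package's internal code.

```r
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))
p <- 6  # number of variables (assumed known)

# Incidence matrix: one row per group, one column per variable;
# entry [i, j] is 1 if group i contains variable j, otherwise 0.
incidence <- t(vapply(group,
                      function(g) as.integer(seq_len(p) %in% g),
                      integer(p)))

# Overlap matrix: entry [i, j] counts the variables shared by groups i and j,
# so the diagonal holds the group sizes.
overlap <- tcrossprod(incidence)

incidence
overlap
diag(overlap)    # group sizes: 3 2 3 2 1
overlap[1, 2]    # gr1 and gr2 share variable 1 only
```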
Yaohui Zeng and Patrick Breheny
Maintainer: Yaohui Zeng <yaohui-zeng@uiowa.edu>
Zeng, Y., and Breheny, P. (2016). Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection. Cancer Informatics, 15, 179-187. http://doi.org/10.4137/CIN.S40043.
Jacob, L., Obozinski, G., and Vert, J. P. (2009, June). Group lasso with overlap and graph lasso. In Proceedings of the 26th annual international conference on machine learning, ACM: 433-440. http://www.machinelearning.org/archive/icml2009/papers/471.pdf
Obozinski, G., Jacob, L., and Vert, J. P. (2011). Group lasso with overlaps: the latent group lasso approach. http://arxiv.org/abs/1110.0413.
Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Statistics and Its Interface, 2: 369-380. http://myweb.uiowa.edu/pbreheny/publications/Breheny2009.pdf
Huang J., Breheny, P. and Ma, S. (2012). A selective review of group selection in high dimensional models. Statistical Science, 27: 481-499. http://myweb.uiowa.edu/pbreheny/publications/Huang2012.pdf
Breheny, P. and Huang, J. (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187. http://myweb.uiowa.edu/pbreheny/publications/group-computing.pdf
Breheny P (2014). R package 'grpreg'. https://CRAN.R-project.org/package=grpreg/grpreg.pdf
cv.grpregOverlap, cv.grpsurvOverlap, plot, select, grpreg, grpsurv.
## linear regression, a simulation demo.
set.seed(123)
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
gr4 = c(3, 5), gr5 = c(6))
beta.latent.T <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0) # true latent coefficients.
# beta.T <- c(5, 5, 10, 0, 5, 0), true variables: 1, 2, 3, 5; true groups: 1, 4.
X <- matrix(rnorm(n = 6*100), ncol = 6)
X.latent <- expandX(X, group)
y <- X.latent %*% beta.latent.T + rnorm(100)
fit <- grpregOverlap(X, y, group, penalty = 'grLasso')
# fit <- grpregOverlap(X, y, group, penalty = 'grMCP')
# fit <- grpregOverlap(X, y, group, penalty = 'grSCAD')
head(coef(fit, latent = TRUE)) # compare to beta.latent.T
plot(fit, latent = TRUE)
head(coef(fit, latent = FALSE)) # compare to beta.T
plot(fit, latent = FALSE)
cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grMCP')
plot(cvfit)
head(coef(cvfit))
summary(cvfit)
## logistic regression, real data, pathway selection
data(pathway.dat)
X <- pathway.dat$expression
group <- pathway.dat$pathways
y <- pathway.dat$mutation
fit <- grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
plot(fit)
str(select(fit))
str(select(fit,criterion="AIC",df="active"))
## Not run:
cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
coef(cvfit)
predict(cvfit, X, type='response')
predict(cvfit, X, type = 'class')
plot(cvfit)
plot(cvfit, type = 'all')
summary(cvfit)
## End(Not run)