grpregOverlap: Fit penalized regression models with overlapping grouped...

View source: R/grpregOverlap.R

Description

Fit the regularization paths of linear, logistic, Poisson or Cox models with overlapping grouped covariates based on the latent group lasso approach (Jacob et al., 2009; Obozinski et al., 2011). Latent group MCP/SCAD as well as bi-level selection methods, namely the group exponential lasso (Breheny, 2015) and the composite MCP (Huang et al., 2012), are also available.

This function is a wrapper around the grpreg package's grpreg and grpsurv functions (which one is called depends on the family). Additional arguments can be passed through to these functions via ...; see grpreg and grpsurv for usage and more details.

Usage

grpregOverlap(X, y, group, 
    family=c("gaussian","binomial", "poisson", "cox"),
    returnX.latent = FALSE, returnOverlap = FALSE, ...)

Arguments

X

The design matrix, without an intercept. grpregOverlap calls grpreg, which standardizes the data and includes an intercept by default.

y

The response vector, or a matrix in the case of multitask learning. For survival analysis, y is the time-to-event outcome - a two-column matrix or Surv object. The first column is the time on study (follow up time); the second column is a binary variable with 1 indicating that the event has occurred and 0 indicating (right) censoring. See grpreg and grpsurv for more details.

group

Unlike in grpreg, group here must be a list of vectors, each containing the integer indices or character names of the variables in that group. Variables that do not belong to any group will be discarded.

family

One of "gaussian", "binomial", "poisson", or "cox", depending on the response. If family is missing, it defaults to "gaussian". Specify family = "cox" for survival analysis (Cox models).

returnX.latent

Return the new expanded design matrix? Default is FALSE. Note that the storage size of this new matrix can be very large. Note: the name of this argument was recently changed so that returnX can be passed through to grpreg (in which case it will return the group-orthonormalized design).

returnOverlap

Return the matrix containing overlaps? Default is FALSE. It is a square matrix C such that C[i, j] is the number of variables shared by groups i and j. The diagonal value C[i, i] is therefore the number of variables in group i.

...

Used to pass options (e.g., 'group.multiplier') to grpreg. Note: the returnX argument will not be passed through, since this will cause grpregOverlap to store X.latent in the fitted model object.
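As a rough illustration of the group and returnOverlap descriptions above, the following base R sketch builds a small grouping list and computes an overlap matrix C with the stated meaning. The helper name overlap_matrix and the toy grouping are assumptions made for this example; they are not part of the package, and the package may compute the matrix differently.

```r
# Hypothetical toy grouping: five overlapping groups over six variables.
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))

# Illustrative helper (not a package function):
# C[i, j] = number of variables shared by group i and group j.
overlap_matrix <- function(group) {
  J <- length(group)
  C <- matrix(0L, nrow = J, ncol = J,
              dimnames = list(names(group), names(group)))
  for (i in seq_len(J)) {
    for (j in seq_len(J)) {
      C[i, j] <- length(intersect(group[[i]], group[[j]]))
    }
  }
  C
}

C <- overlap_matrix(group)
diag(C)  # the diagonal holds the group sizes
```

Off-diagonal zeros (here, between gr2 and gr4) mark pairs of groups that share no variables.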

Details

The latent group lasso approach extends the group lasso to group variable selection with overlaps. The latent group lasso penalty is formulated so that it is equivalent to a classical non-overlapping group lasso problem in a new space, which is obtained by duplicating the columns of the overlapped variables. For technical details, see Jacob et al. (2009) and Obozinski et al. (2011).
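For orientation, the penalty can be written as follows. This is a sketch in notation chosen for this page, not taken from the package: \(\mathcal{G}\) is the set of groups, \(v^{g}\) is the latent coefficient vector of group \(g\), and \(d_g\) is a group weight (assumed here to play the role of group.multiplier).

```latex
\Omega(\beta) \;=\;
\min_{\substack{(v^{g})_{g \in \mathcal{G}} \\ \operatorname{supp}(v^{g}) \subseteq g}}
\;\sum_{g \in \mathcal{G}} d_g \,\lVert v^{g} \rVert_2
\quad \text{subject to} \quad
\sum_{g \in \mathcal{G}} v^{g} = \beta .
```

Because each \(v^{g}\) is supported only on its own group, stacking the \(v^{g}\) and duplicating the corresponding columns of X turns the problem into an ordinary group lasso with non-overlapping groups.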

grpregOverlap takes the input design matrix X and the grouping information group, and expands X to the new, non-overlapping space. It then calls grpreg for model fitting based on the group descent algorithm. Unlike in grpreg, the interface for the group bridge-penalized method is not implemented.

The expanded design matrix is named X.latent. It is returned in the fitted object if returnX.latent is TRUE. The latent coefficient (or norm) vector corresponds to the columns of X.latent. Note that when constructing X.latent, the columns of X corresponding to variables not included in group are removed automatically.

For a more detailed explanation of the penalties and the algorithm, see grpreg.
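The column duplication can be sketched in a few lines of base R. The helper expand_columns below is hypothetical and written only for illustration (the package provides its own expandX, used in the Examples); it assumes group holds integer column indices rather than names.

```r
# Illustrative helper (not the package's expandX): give every group its own
# copy of each of its variables by duplicating columns of X.
expand_columns <- function(X, group) {
  idx <- unlist(group, use.names = FALSE)                  # columns, group by group
  grp.vec <- rep(seq_along(group), times = lengths(group)) # group label per latent column
  list(X.latent = X[, idx, drop = FALSE], grp.vec = grp.vec, idx = idx)
}

set.seed(1)
X <- matrix(rnorm(10 * 7), ncol = 7)  # column 7 belongs to no group
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))

ex <- expand_columns(X, group)
dim(ex$X.latent)  # 10 rows, one column per (group, variable) pair
ex$grp.vec        # non-overlapping group labels for the expanded columns
```

Column 7 of X does not appear in any group, so it is absent from the expanded matrix, matching the behaviour described above.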

Value

An object with S3 class "grpregOverlap" or "grpsurvOverlap" (for Cox models), which inherits from "grpreg", with the following components.

beta

The fitted matrix of coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.

family

Same as above.

group

Same as above.

lambda

The sequence of lambda values in the path.

alpha

Same as above.

loss

A vector containing either the residual sum of squares ("gaussian") or negative log-likelihood ("binomial") or negative partial log-likelihood ("cox") of the fitted model at each value of lambda.

n

Number of observations.

penalty

Same as above.

df

A vector of length nlambda containing estimates of the effective number of model parameters at each point along the regularization path. For details on how this is calculated, see Breheny and Huang (2009).

iter

A vector of length nlambda containing the number of iterations until convergence at each value of lambda.

group.multiplier

A named vector containing the multiplicative constant applied to each group's penalty.

beta.latent

The fitted matrix of latent coefficients. The number of rows is equal to the number of coefficients, and the number of columns is equal to nlambda.

incidence.mat

Incidence matrix: I[i, j] = 1 if group i contains variable j; otherwise 0.

grp.vec

A vector of consecutive integers indicating grouping information of variables. This is equivalent to argument group in grpreg.

overlap.mat

A square matrix C where C[i, j] is the number of variables shared by groups i and j. The diagonal value C[i, i] is therefore the number of variables in group i. Only returned if returnOverlap is TRUE.

X.latent

The new expanded design matrix for the latent group lasso formulation. The variables are reordered according to the order of groups. Only returned if returnX.latent is TRUE.

W

Matrix of exp(beta) values for each subject over all lambda values. (For Cox models only)

time

Times on study. (For Cox models only)

fail

Failure event indicator. (For Cox models only)
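To make the relationship between incidence.mat, beta.latent and beta more concrete, here is a small base R sketch. It assumes that each original coefficient is the sum of its latent copies across groups, which is consistent with the beta.latent.T and beta.T values in the Examples but is stated here as an assumption rather than as the package's internal code. Intercepts are ignored, and all object names are illustrative.

```r
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5),
              gr4 = c(3, 5), gr5 = c(6))
p <- 6  # number of original variables

# Incidence matrix: I[i, j] = 1 if group i contains variable j, otherwise 0.
incidence <- t(sapply(group, function(g) as.integer(seq_len(p) %in% g)))

# Latent coefficients, ordered group by group (same layout as X.latent).
beta.latent <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0)

# Assumed mapping: sum the latent copies belonging to each original variable.
idx <- unlist(group, use.names = FALSE)
beta <- as.numeric(tapply(beta.latent, factor(idx, levels = seq_len(p)), sum))
beta
```

Variable 3 belongs to gr1 and gr4, both with a latent coefficient of 5, so its original-scale coefficient is their sum.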

Author(s)

Yaohui Zeng and Patrick Breheny

Maintainer: Yaohui Zeng <yaohui-zeng@uiowa.edu>

References

See Also

cv.grpregOverlap, cv.grpsurvOverlap, plot, select, grpreg, grpsurv.

Examples

## linear regression, a simulation demo.
set.seed(123)
group <- list(gr1 = c(1, 2, 3), gr2 = c(1, 4), gr3 = c(2, 4, 5), 
              gr4 = c(3, 5), gr5 = c(6))
beta.latent.T <- c(5, 5, 5, 0, 0, 0, 0, 0, 5, 5, 0) # true latent coefficients.
# beta.T <- c(5, 5, 10, 0, 5, 0), true variables: 1, 2, 3, 5; true groups: 1, 4.
X <- matrix(rnorm(n = 6*100), ncol = 6)  
X.latent <- expandX(X, group)
y <- X.latent %*% beta.latent.T + rnorm(100)

fit <- grpregOverlap(X, y, group, penalty = 'grLasso')
# fit <- grpregOverlap(X, y, group, penalty = 'grMCP')
# fit <- grpregOverlap(X, y, group, penalty = 'grSCAD')
head(coef(fit, latent = TRUE)) # compare to beta.latent.T
plot(fit, latent = TRUE) 
head(coef(fit, latent = FALSE)) # compare to beta.T
plot(fit, latent = FALSE)

cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grMCP')
plot(cvfit)
head(coef(cvfit))
summary(cvfit)

## logistic regression, real data, pathway selection
data(pathway.dat)
X <- pathway.dat$expression
group <- pathway.dat$pathways
y <- pathway.dat$mutation
fit <- grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
plot(fit)
str(select(fit))
str(select(fit,criterion="AIC",df="active"))

## Not run: 
cvfit <- cv.grpregOverlap(X, y, group, penalty = 'grLasso', family = 'binomial')
coef(cvfit)
predict(cvfit, X, type='response')
predict(cvfit, X, type = 'class')
plot(cvfit)
plot(cvfit, type = 'all')
summary(cvfit)

## End(Not run)

YaohuiZeng/grpregOverlap documentation built on Aug. 10, 2020, 3:13 p.m.