Fit a GLM with a combination of sparse and smooth regularization

Share:

Description

Fit a generalized linear model at grids of tuning parameter via penalized maximum likelihood. The regularization path is computed for a combination of sparse and smooth penalty at two grids of values for the regularization parameter lambda1(Lasso or MCP penalty) and lambda2(Laplacian penalty). Fits linear, logistic regression models.

Usage

1
2
3
4
glmgraph(X, Y, L, family=c("gaussian","binomial"), penalty=c("MCP","lasso") ,
mcpapproach=c("mmcd", "adaptive", "original"),gamma=8,
lambda1,nlambda1=100,lambda2=c(0, 0.01 * 2^(0:7)),eps=1e-3,max.iter=2000,
dfmax=round(ncol(X)/2),penalty.factor=rep(1,ncol(X)),standardize=TRUE,warn=FALSE,...)

Arguments

X

Input matrix; each row is an observation vector.

Y

Response vector. Quantitative for family="gaussian" or binary(0/1) for family="binomial".

family

Either "gaussian", "binomial", depending on the response.

L

User-specified Laplacian matrix.

penalty

The sparse penalty to be applied to the model. Either "MCP" (the default), or "lasso".

mcpapproach

For family="binomial", three optional algorithms are provided when penalty is set to MCP: "mmcd"(Majorization minimization by coordinate descent); "adaptive"(Adaptive rescaling) and "original"(without any adjustment). For family="gaussian", the option could only be "original".

gamma

The tuning parameter of the MCP penalty.The default value is 8.

nlambda1

The number of lambda1 values. Default is 100.

lambda1

A user-specified sequence of lambda1 values. Typical usage is to have the program compute its own lambda1 sequence based on nlambda1 and lambda1.min.ratio. Supplying a value of lambda1 overrides this. By default, a sequence of values of length nlambda1 is computed, equally spaced on the log scale.

lambda2

A user-specified sequence of lambda2 values. The default value are 0 and 0.01*2^(0:7). The selection of lambda2 depends on the data and should be adapted in some cases. A good suggestion is to try a few lambda2 and plot the results.

eps

Convergence threshold for coordinate descent. Each inner coordinate-descent loop continues until the relative change in the objective function is less than eps1. Default is 1e-3.

max.iter

Maximum number of passes over the data for all lambda1 values. Default is 2000.

dfmax

Limit the maximum number of variables in the model. Useful for very large p. Default value equals to half of p.

penalty.factor

A multiplicative factor for the penalty applied to each coefficient. If supplied, penalty.factor must be a numeric vector of length equal to the number of columns of X. The purpose of penalty.factor is to apply differential penalization if some coefficients are thought to be more likely than others to be in the model. In particular, penalty.factor can be 0, in which case the coefficient is always in the model without shrinkage.

standardize

Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize.

warn

Return warning messages for failures to converge and model selection issues. Default is FALSE.

...

Other parameters to glmgraph

Value

An object "glmgraph" containing:

betas

A list of fitted coefficients. The number of rows for each matrix is equal to the number of coefficients, and the number of columns is smaller or equal to nlambda1.

lambda1s

A list of vector. Each vector is a sequence of used lambda1 for each used lambda2.

lambda2

A sequence of lambda2 actually used.

loglik

A list of log likelihood for each value of lambda1 and lambda2.

df

A list of the number of nonzero values for each value of lambda1 and lambda2.

Author(s)

Li Chen <li.chen@emory.edu>, Jun Chen <chen.jun2@mayo.edu>

References

Li Chen. Han Liu. Hongzhe Li. Jun Chen. (2015) Graph-constrained Regularization for Sparse Generalized Linear Models.(Working paper)

See Also

plot.glmgraph, cv.glmgraph

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
	
 set.seed(1234)
 library(glmgraph)
 n <- 100
 p1 <- 10
 p2 <- 90
 p <- p1+p2
 X <- matrix(rnorm(n*p), n,p)
 magnitude <- 1
 A <- matrix(rep(0,p*p),p,p)
 A[1:p1,1:p1] <- 1
 A[(p1+1):p,(p1+1):p] <- 1
 diag(A) <- 0
 btrue <- c(rep(magnitude,p1),rep(0,p2))
 intercept <- 0
 eta <- intercept+X%*%btrue
 ### construct laplacian matrix from adjacency matrix
 diagL <- apply(A,1,sum)
 L <- -A
 diag(L) <- diagL
 ### gaussian
 Y <- eta+rnorm(n)
 obj <- glmgraph(X,Y,L,family="gaussian")
 plot(obj)
 ### binomial
 Y <- rbinom(n,1,prob=1/(1+exp(-eta)))
 obj <- glmgraph(X,Y,L,family="binomial")
 plot(obj)