generateBlockDiagonalMatrices: Generate block diagonal matrices to allow for fused L2...
In fuser: Fused Lasso for High-Dimensional Regression over Groups


Description

Generate block diagonal matrices to allow for fused L2 optimization with glmnet.

Usage

generateBlockDiagonalMatrices(X, Y, groups, G, intercept = FALSE,
  penalty.factors = rep(1, dim(X)[2]), scaling = TRUE)

Arguments

`X`	covariates matrix (n by p).
`Y`	response vector (length n).
`groups`	vector of group indicators (ideally factors, length n).
`G`	matrix representing the fusion strengths between pairs of groups (K by K). A zero entry means the corresponding pair of groups is not fused (the two groups are treated as independent).
`intercept`	whether to include a (per-group) intercept in the model.
`penalty.factors`	vector of weights for the penalization of each covariate (length p).
`scaling`	whether to scale each subgroup by its size. See Details for an explanation.
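For instance, a fusion matrix for three groups in which groups 1 and 3 are left unfused could be built as below. The strengths are illustrative only, and since it is not stated here whether only one triangle of G is read, this sketch keeps G symmetric as a precaution.

```r
K <- 3
G <- matrix(1, nrow = K, ncol = K)  # fuse every pair of groups with strength 1
G[1, 3] <- G[3, 1] <- 0              # zero entry: groups 1 and 3 are not fused
G[1, 2] <- G[2, 1] <- 2              # stronger fusion between groups 1 and 2
```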

Details

We use the glmnet package to perform fused subgroup regression. To do so, we reformulate the problem as Y' = X'beta', where Y' is the concatenation of the responses Y and a vector of choose(K,2)*p zeros, and X' stacks two matrices: the n by pK block-diagonal matrix built from X, in which each block contains the covariates for one subgroup, and the choose(K,2)*p by pK matrix encoding the fusion penalties between pairs of groups. The parameter vector beta' of length pK can be rearranged as a p by K matrix giving the parameters for each subgroup. The lasso penalty on the parameters is handled by glmnet.
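To make the reformulation concrete, here is a minimal dense sketch in base R. The helper build_augmented_design is hypothetical and is not the fuser implementation (which returns big matrices). It assumes that beta' stacks the p coefficients of group 1, then those of group 2, and so on, and that each fusion row is weighted by sqrt(G[k, k']), so that the squared residual of the zero rows equals the sum over pairs of G[k, k'] * sum_j (beta_jk - beta_jk')^2. The global fusion parameter gamma is left out.

```r
# Hypothetical helper, not part of fuser: dense version of X' and Y'
build_augmented_design <- function(X, Y, groups, G) {
  groups <- as.factor(groups)
  lev <- levels(groups)
  K <- length(lev)
  p <- ncol(X)
  n <- nrow(X)
  # Block-diagonal part: rows of group k only touch the columns of block k
  X.block <- matrix(0, n, p * K)
  for (k in seq_len(K)) {
    rows <- which(groups == lev[k])
    cols <- (k - 1) * p + seq_len(p)
    X.block[rows, cols] <- X[rows, ]
  }
  # Fusion part: one row per (pair of groups, covariate), choose(K, 2) * p rows
  pairs <- combn(K, 2)
  X.fused <- matrix(0, ncol(pairs) * p, p * K)
  for (m in seq_len(ncol(pairs))) {
    k1 <- pairs[1, m]
    k2 <- pairs[2, m]
    w <- sqrt(G[k1, k2])  # assumption: rows weighted by sqrt of the fusion strength
    for (j in seq_len(p)) {
      r <- (m - 1) * p + j
      X.fused[r, (k1 - 1) * p + j] <- w   # +beta_j for group k1
      X.fused[r, (k2 - 1) * p + j] <- -w  # -beta_j for group k2
    }
  }
  # Stack both parts; the fusion rows get zero responses
  list(X = rbind(X.block, X.fused),
       Y = c(Y, rep(0, nrow(X.fused))))
}

# Toy example: K = 2 groups of 3 samples, p = 3 covariates
set.seed(1)
X <- matrix(rnorm(6 * 3), 6, 3)
Y <- rnorm(6)
groups <- rep(1:2, each = 3)
G <- matrix(1, 2, 2)
aug <- build_augmented_design(X, Y, groups, G)
dim(aug$X)  # 6 + choose(2, 2) * 3 = 9 rows, 3 * 2 = 6 columns

# A coefficient vector of length pK rearranged as a p by K matrix
beta.prime <- seq_len(6)
matrix(beta.prime, nrow = 3, ncol = 2)  # column k holds the coefficients of group k
```

Stacking the fusion rows with zero responses turns the L2 fusion penalty into ordinary squared-error terms, which is why a standard lasso solver such as glmnet can then handle the whole problem.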

One weakness of this approach is that larger subgroups have a larger influence on the global parameters lambda and gamma. To mitigate this, we introduce the scaling argument: if scaling = TRUE, the responses and covariates of each subgroup are scaled by the number of samples in that group.
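A minimal sketch of the scaling step, under the assumption that "scaled by the number of samples" means dividing each row of X and each entry of Y by its group size n_k. The helper scale_by_group_size is hypothetical, and fuser may use a different factor internally (dividing by sqrt(n_k), for example, would turn each group's residual sum of squares into its mean squared error).

```r
# Hypothetical helper, not part of fuser: divide each group's rows by that group's size
scale_by_group_size <- function(X, Y, groups) {
  # n.k[i] is the size of the group that row i belongs to
  n.k <- as.vector(table(groups)[as.character(groups)])
  # X / n.k recycles n.k down each column, so row i is divided by n.k[i]
  list(X = X / n.k, Y = Y / n.k)
}

X <- matrix(1, nrow = 6, ncol = 2)
Y <- rep(1, 6)
groups <- c(1, 1, 2, 2, 2, 2)
scaled <- scale_by_group_size(X, Y, groups)
scaled$Y  # c(0.5, 0.5, 0.25, 0.25, 0.25, 0.25)
```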

Value

A list with components X, Y, X.fused and penalty, where X is an n by pK block-diagonal bigmatrix, Y is a rearranged bigvector of length n, and X.fused is a choose(K,2)*p by pK bigmatrix encoding the fusion penalties.

Examples

library(fuser)
set.seed(123)

# Generate simple heterogeneous dataset
k = 4 # number of groups
p = 100 # number of covariates
n.group = 15 # number of samples per group
sigma = 0.05 # observation noise sd
groups = rep(1:k, each=n.group) # group indicators
# sparse linear coefficients
beta = matrix(0, p, k)
nonzero.ind = rbinom(p*k, 1, 0.025/k) # Independent coefficients
nonzero.shared = rbinom(p, 1, 0.025) # shared coefficients
beta[which(nonzero.ind==1)] = rnorm(sum(nonzero.ind), 1, 0.25)
beta[which(nonzero.shared==1),] = rnorm(sum(nonzero.shared), -1, 0.25)

X = lapply(1:k,
           function(k.i) matrix(rnorm(n.group*p),
                                n.group, p)) # covariates
# response: an n.group by k matrix whose column-major order matches `groups`
y = sapply(1:k,
           function(k.i) X[[k.i]] %*% beta[,k.i] +
                           rnorm(n.group, 0, sigma))
X = do.call('rbind', X)

# Pairwise fusion strength hyperparameters (tau(k,k'))
# Same for all pairs in this example
G = matrix(1, k, k)

# Generate block diagonal matrices
transformed.data = generateBlockDiagonalMatrices(X, y, groups, G)