Generate block diagonal matrices to allow for fused L2 optimization with glmnet.
generateBlockDiagonalMatrices(X, Y, groups, G, intercept = FALSE,
  penalty.factors = rep(1, dim(X)[2]), scaling = TRUE)
X                covariates matrix (n by p).
Y                response vector (length n).
groups           vector of group indicators (ideally factors, length n).
G                matrix representing the fusion strengths between pairs of groups (K by K). A zero entry means the corresponding pair of groups is treated as independent.
intercept        whether to include a (per-group) intercept in the model.
penalty.factors  vector of weights for the penalization of each covariate (length p).
scaling          whether to scale each subgroup by its size. See Details for an explanation.
We use the glmnet package to perform fused subgroup regression.
To do this, we reformulate the problem as Y' = X'beta', where Y' is the
concatenation of the responses Y and a vector of zeros, and X' is a matrix
formed by stacking the n by pK block-diagonal matrix X, in which each block
contains the covariates for one subgroup, on top of the choose(K,2)*p by pK
matrix encoding the fusion penalties between pairs of groups. The vector of
parameters beta' of length pK can be rearranged as a p by K matrix giving the
parameters for each subgroup. The lasso penalty on the parameters is handled
by glmnet.
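
The reformulation above can be sketched in base R on a toy problem. This is only an illustration of the layout described in this section, not the package's internal code: the variable names, the fusion weight `gamma`, and the `sqrt(gamma * G[k, l])` weighting of the fusion rows are assumptions made for the example.

```r
# Illustrative sketch only; names and the fusion-row weighting are assumptions.
set.seed(1)
K <- 3; p <- 2; n.k <- 4              # groups, covariates, samples per group
n <- K * n.k
groups <- rep(1:K, each = n.k)
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)
G <- matrix(1, K, K)                  # pairwise fusion strengths
gamma <- 0.5                          # hypothetical global fusion weight

# Block-diagonal part: rows of group k occupy columns (k-1)*p + 1 .. k*p
X.block <- matrix(0, n, p * K)
for (k in 1:K) {
  rows <- which(groups == k)
  cols <- (k - 1) * p + 1:p
  X.block[rows, cols] <- X[rows, ]
}

# Fusion part: one block of p rows per pair (k, l), with +w*I in group k's
# columns and -w*I in group l's columns, so the squared residual of these
# rows equals gamma * G[k, l] * ||beta_k - beta_l||^2
pairs <- t(combn(K, 2))
X.fused <- matrix(0, nrow(pairs) * p, p * K)
for (i in seq_len(nrow(pairs))) {
  k <- pairs[i, 1]; l <- pairs[i, 2]
  rows <- (i - 1) * p + 1:p
  w <- sqrt(gamma * G[k, l])
  X.fused[rows, (k - 1) * p + 1:p] <- w * diag(p)
  X.fused[rows, (l - 1) * p + 1:p] <- -w * diag(p)
}

# Augmented problem Y' = X' beta'
X.aug <- rbind(X.block, X.fused)
Y.aug <- c(Y, rep(0, nrow(X.fused)))
```

Here `X.aug` has n + choose(K,2)*p rows and pK columns, and `Y.aug` is padded with one zero per fusion row, matching the dimensions given above.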
One weakness of the approach described above is that larger subgroups will
have a larger influence on the global parameters lambda and gamma.
To mitigate this, we introduce the scaling parameter. If scaling=TRUE, then
we scale the responses and covariates for each subgroup by the number of
samples in that group.
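
A minimal sketch of per-group scaling on unbalanced toy data follows. The exact factor used by the package is not stated here, so the choice of 1/sqrt(n_k) below is an assumption: with a squared-error loss it divides each group's contribution by its sample count.

```r
# Illustrative sketch; the scaling factor 1/sqrt(n_k) is an assumption.
groups <- c(1, 1, 1, 1, 2, 2)          # unbalanced groups: 4 vs 2 samples
X <- matrix(1:12, 6, 2)
Y <- c(1, 2, 3, 4, 5, 6)

# Group size looked up for every row
n.per.group <- as.numeric(table(groups)[as.character(groups)])
scale.factor <- 1 / sqrt(n.per.group)

# Multiplying a matrix by a length-n vector recycles down the columns,
# so row i of X is multiplied by scale.factor[i]
X.scaled <- X * scale.factor
Y.scaled <- Y * scale.factor
```

After this step, each group's summed squared residual is effectively averaged over its samples, so the larger group no longer dominates the fit purely because of its size.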
A list with components X, Y, X.fused and penalty, where X is an n by pK block-diagonal bigmatrix, Y is a re-arranged bigvector of length n, and X.fused is a choose(K,2)*p by pK bigmatrix encoding the fusion penalties.
set.seed(123)
# Generate simple heterogeneous dataset
k = 4 # number of groups
p = 100 # number of covariates
n.group = 15 # number of samples per group
sigma = 0.05 # observation noise sd
groups = rep(1:k, each=n.group) # group indicators
# sparse linear coefficients
beta = matrix(0, p, k)
nonzero.ind = rbinom(p*k, 1, 0.025/k) # Independent coefficients
nonzero.shared = rbinom(p, 1, 0.025) # shared coefficients
beta[which(nonzero.ind==1)] = rnorm(sum(nonzero.ind), 1, 0.25)
beta[which(nonzero.shared==1),] = rnorm(sum(nonzero.shared), -1, 0.25)
X = lapply(1:k,
           function(k.i) matrix(rnorm(n.group*p),
                                n.group, p)) # covariates
y = sapply(1:k,
           function(k.i) X[[k.i]] %*% beta[,k.i] +
             rnorm(n.group, 0, sigma)) # response
X = do.call('rbind', X)
# Pairwise Fusion strength hyperparameters (tau(k,k'))
# Same for all pairs in this example
G = matrix(1, k, k)
# Generate block diagonal matrices
transformed.data = generateBlockDiagonalMatrices(X, y, groups, G)