hsm: Solves proximal operator of latent group lasso in...


Description

Solves proximal operator of the latent group Lasso appearing in Yan & Bien (2017)

min_β 0.5 * || y - β ||_2^2 + lam * Ω(β; w)

where Ω(β; w) = min_{sum_l v^(l) = β, supp(v^(l)) ⊆ g_l} sum_l w_l * || v^(l) ||_2 is the latent group lasso penalty defined in Jacob et al. (2009). In this problem, β is a length-p parameter vector whose elements are embedded in a directed acyclic graph (DAG). The desired sparsity pattern is a subgraph of the DAG such that if the parameters embedded in a node are set to zero, then all the parameters embedded in its descendant nodes are zeroed out as well. The problem is solved by breaking the DAG down into several path graphs, for each of which the corresponding proximal operator has a closed-form solution, and performing block coordinate descent (BCD) across the path graphs. See Section 4.3 of the paper for more details and explanations.
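As a point of reference only (this helper is not part of the package API), when the DAG consists of a single node containing all of β, the penalty reduces to lam * w_1 * ||β||_2 and the proximal operator is plain group-wise soft-thresholding. A minimal R sketch of that special case, assuming the default weight w_1 = sqrt(p):

# Illustrative only: closed-form prox when all of beta sits in one node.
# Solves argmin_beta 0.5 * ||y - beta||_2^2 + lam * w * ||beta||_2.
prox_single_group <- function(y, lam, w = sqrt(length(y))) {
  nrm <- sqrt(sum(y^2))
  if (nrm <= lam * w) rep(0, length(y)) else (1 - lam * w / nrm) * y
}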

Usage

hsm(y, lam, w = NULL, map, var, assign = NULL, w.assign = NULL,
  get.penalval = FALSE, tol = 1e-08, maxiter = 10000, beta.ma = NULL)

Arguments

y

Length-p vector used in the proximal operator above.

lam

Non-negative tuning parameter that controls the sparsity level.

w

Length-n.nodes vector of positive weights, where w_l gives the weight for group g_l and n.nodes is the number of nodes in the DAG. If this is NULL, w_l = sqrt(|g_l|) is used. A necessary condition is that w_l increases with |g_l|.

map

Matrix of dimension n.edges-by-2, where n.edges is the number of directed edges in the DAG. The first column gives the indices of the nodes that the edges direct from, and the second column gives the indices of the nodes that the corresponding edges direct to. If a node indexed i has no edges connected to it, record the corresponding row as c(i, NA).
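For illustration (mirroring the format used in the Examples below), a DAG with the two edges 1 -> 2 and 1 -> 3 would be encoded as:

# Each row is one directed edge: from-node in column 1, to-node in column 2.
map <- matrix(c(1, 2,
                1, 3), ncol = 2, byrow = TRUE)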

var

Length-n.nodes list for which the lth element contains the indices of variables embedded in the lth node.

assign

Matrix with p columns that gives the assignment of variables to the different path graphs. Each row of assign corresponds to a path graph decomposed from the DAG. If this is NULL, hsm first breaks the DAG down into path graphs and then derives assign from map and var; otherwise, map and var are ignored. Refer to paths for more details.

w.assign

List of length nrow(assign), for which the lth element contains the weights corresponding to the lth row of assign (the lth path graph). For example, if the lth path graph is made up of the nodes indexed {3, 4, 6, 8}, then w.assign[[l]] = c(w_3, w_4, w_6, w_8). If this is NULL, hsm computes w.assign along with assign; otherwise, map and var are ignored. Refer to paths for more details.

get.penalval

If TRUE, the penalty value lam * Ω(β; w) is computed and returned; otherwise NA is returned.

tol

Tolerance level used in the BCD. Convergence is assumed when no parameter in any path graph changes by more than tol between BCD cycles.

maxiter

Upper bound on the number of BCD iterations to perform.

beta.ma

n.paths-by-p matrix of initial beta values for the n.paths path graphs decomposed from the DAG. Do not use unless you know the decomposition of the DAG.

Details

See Section 2.2 of the paper for the problem setup and group structure specifications. See Figure 7 in Section 4.3 for an example of decomposing a DAG into path graphs. See Algorithm 4 in the paper for details of the path-based BCD.
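For intuition only, the path-based BCD can be sketched as below. This is a schematic, not the package's implementation: prox_path is a hypothetical helper standing in for the closed-form proximal operator of a single path graph (derived in the paper), and the matrix of per-path blocks plays a role roughly analogous to beta.ma.

# Schematic sketch of block coordinate descent over the path-graph blocks.
bcd_sketch <- function(y, path.list, prox_path, tol = 1e-8, maxiter = 10000) {
  B <- matrix(0, nrow = length(path.list), ncol = length(y))  # one block per path
  for (ite in seq_len(maxiter)) {
    max.change <- 0
    for (l in seq_along(path.list)) {
      r <- y - colSums(B[-l, , drop = FALSE])   # partial residual for block l
      b.new <- prox_path(r, path.list[[l]])     # closed-form block update
      max.change <- max(max.change, max(abs(b.new - B[l, ])))
      B[l, ] <- b.new
    }
    if (max.change <= tol) break                # no block moved more than tol
  }
  list(beta = colSums(B), ite = ite, beta.ma = B)
}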

Value

Returns an estimate of the solution to the proximal operator of the latent group Lasso. The returned value is an exact solution if the DAG is a directed path graph.

beta

A length-p vector giving the solution to the proximal operator defined above.

ite

Number of cycles of BCD performed.

penalval

Value of the penalty lam * Ω(β; w) if get.penalval is TRUE, otherwise NA.

assign

Value of assign.

w.assign

Value of w.assign.

beta.ma

n.paths-by-p matrix of decomposed beta values for all the decomposed path graphs. The beta values are from the last iteration in hsm.

References

Yan, X. and Bien, J. (2017). Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations. Statist. Sci. 32, no. 4, 531–560. doi:10.1214/17-STS622.

Jacob, L., Obozinski, G. and Vert, J.-P. (2009). Group Lasso with Overlap and Graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), 433–440. ACM, New York.

See Also

hsm.path

paths

Examples

# The following example appears in Figure 7 of Yan & Bien (2017).
# Generate map defining DAG.
map <- matrix(0, ncol=2, nrow=8)
map[1, ] <- c(1, 2)
map[2, ] <- c(2, 7)
map[3, ] <- c(3, 4)
map[4, ] <- c(4, 6)
map[5, ] <- c(6, 7)
map[6, ] <- c(6, 8)
map[7, ] <- c(3, 5)
map[8, ] <- c(5, 6)
# Assume one parameter per node.
# Let parameter and node share the same index.
var <- as.list(1:8)
set.seed(100)
y <- rnorm(8)
result <- hsm(y=y, lam=0.5, map=map, var=var, get.penalval=TRUE)

# Another example in which DAG contains two separate nodes
map <- matrix(0, ncol=2, nrow=2)
map[1, ] <- c(1, NA)
map[2, ] <- c(2, NA)
# Assume ten parameters per node.
var <- list(1:10, 11:20)
set.seed(100)
y <- rnorm(20)
lam <- 0.5
result <- hsm(y=y, lam=lam, map=map, var=var, get.penalval=TRUE)
# The solution is equivalent to performing group-wise soft-thresholdings
beta.st <- c(y[1:10] * max(0, 1 - lam * sqrt(10) / norm(y[1:10], "2")),
          y[11:20] * max(0, 1 - lam * sqrt(10) / norm(y[11:20], "2")))
all.equal(result$beta, beta.st)
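
# Not documented, just an assumed consistency check for illustration:
# the rows of beta.ma decompose beta across path graphs, so their column
# sums should recover beta (up to the BCD tolerance).
all.equal(colSums(result$beta.ma), result$beta)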
