```r
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning    = FALSE,
  message    = FALSE
)
```
The hierNest package implements penalized regression with a hierarchical nested parameterization designed for settings in which observations belong to a two-level group hierarchy — for example, Diagnosis Related Groups (DRGs) nested within Major Diagnostic Categories (MDCs) in electronic health record (EHR) data.
In health systems, patient populations are highly heterogeneous. A patient hospitalized for heart failure has very different clinical risk factors from one admitted for a traumatic injury. Fitting a single pooled model ignores this heterogeneity, while fitting fully separate models per DRG is statistically infeasible when subgroup sample sizes are small and the number of predictors is large.
hierNest resolves this tension by decomposing each covariate's effect into three interpretable, hierarchically nested components:
$$\beta_d = \mu + \eta^{M(d)} + \delta_d$$
where:

- $\mu$ is the overall effect shared by all subgroups,
- $\eta^{M(d)}$ is the deviation specific to MDC $M(d)$, the MDC containing DRG $d$, and
- $\delta_d$ is the residual deviation specific to DRG $d$.
Pairing this reparameterization with structured regularization (lasso or overlapping group lasso) allows the model to borrow strength across related subgroups while remaining flexible enough to capture genuine heterogeneity.
The package provides two penalization methods:
- **hierNest-Lasso** — applies a standard lasso penalty to the reparameterized coefficients, with penalty weights adjusted for subgroup sample size. This is implemented via a modified design matrix that can be passed directly to glmnet. It is fast and serves as a useful baseline.
- **hierNest-OGLasso** — applies an overlapping group lasso penalty that additionally enforces effect hierarchy: an MDC-specific effect can be non-zero only if the corresponding overall effect is non-zero, and a DRG-specific effect can be non-zero only if the corresponding MDC effect is non-zero. This is the recommended method when the hierarchical structure is plausible.
This vignette covers:

- the structure of the `hier_info` input,
- fitting a model with `hierNest()` at a fixed penalty,
- cross-validated tuning with `cv.hierNest()`,
- prediction with `predict_hierNest()`, and
- visualization with the `plot()` method.

## Installation

Install the package from CRAN (when available) or from source:
```r
# From CRAN
install.packages("hierNest")

# From source (development version)
# install.packages("devtools")
devtools::install_github("ZirenJiang/hierNest")
```
Load the package:
```r
library(hierNest)
```
## The `hier_info` matrix

All hierNest functions require a `hier_info` argument that encodes the two-level hierarchy. This is an $n \times 2$ integer matrix where:

- column 1 holds the MDC (group) index of each observation, and
- column 2 holds the DRG (subgroup) index.

Subgroup indices must be globally unique across groups: two DRGs in different MDCs must have different integer codes.
```r
# Illustration: 3 MDCs with 2 DRGs each (6 DRGs total)
#
# MDC 1 -> DRG 1, DRG 2
# MDC 2 -> DRG 3, DRG 4
# MDC 3 -> DRG 5, DRG 6
#
# For an observation in MDC 2 / DRG 4:
# hier_info[i, ] = c(2, 4)
```
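To make the uniqueness requirement concrete, here is a small base-R sketch of one way to build such a matrix from raw labels. The vectors `mdc` and `drg` are illustrative toy data, not part of the package API; indexing the MDC/DRG *combination* guarantees globally unique subgroup codes even when raw DRG labels repeat across MDCs.

```r
# Toy per-observation labels (illustrative only)
mdc <- c("MDC1", "MDC2", "MDC2", "MDC3", "MDC1", "MDC3")
drg <- c("A",    "B",    "C",    "D",    "A",    "E")

# Integer codes for the MDC (group) level
mdc_id <- as.integer(factor(mdc))

# Globally unique DRG codes: index the MDC/DRG combination, so the
# same raw DRG label under two different MDCs maps to distinct integers
drg_id <- as.integer(factor(paste(mdc, drg, sep = ":")))

hier_info <- cbind(mdc_id, drg_id)
head(hier_info)
```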
The package ships with a built-in example dataset that illustrates the required data structure.
```r
data("example_data")

# Dimensions
cat("X dimensions:", dim(example_data$X), "\n")
cat("Y length: ", length(example_data$Y), "\n")
cat("hier_info dimensions:", dim(example_data$hier_info), "\n")

# Peek at hier_info
head(example_data$hier_info)

# Group sizes
table_mdc <- table(example_data$hier_info[, 1])
cat("\nObservations per MDC group:\n")
print(table_mdc)

table_drg <- table(example_data$hier_info[, 2])
cat("\nObservations per DRG subgroup (first 10):\n")
print(head(table_drg, 10))

# Outcome distribution (binary)
cat("\nOutcome distribution:\n")
print(table(example_data$Y))
```
## `hierNest()` — fitting at fixed penalty weights

`hierNest()` fits the full regularization path over $\lambda$ for a single pair of hyperparameters $(\alpha_1, \alpha_2)$. It is useful for exploration or when the tuning parameters have already been selected.
```r
fit <- hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  hier_info = example_data$hier_info,
  family    = "binomial",
  asparse1  = 1,  # weight for MDC-level group penalty
  asparse2  = 1   # weight for DRG-level subgroup penalty
)

# The fit contains a lambda sequence and corresponding coefficient paths
length(fit$lambda)
dim(fit$beta)  # rows = reparameterized coefficients, cols = lambda values
```
Key arguments:
| Argument | Description |
|---|---|
| x | $n \times p$ predictor matrix |
| y | Response (numeric for Gaussian, 0/1 or factor for binomial) |
| hier_info | $n \times 2$ group/subgroup index matrix |
| family | "gaussian" or "binomial" |
| asparse1 | $\alpha_1$: relative weight for the MDC-level overlapping group penalty |
| asparse2 | $\alpha_2$: relative weight for the DRG-level overlapping group penalty |
| nlambda | Length of the $\lambda$ sequence (default 100) |
| standardize | Whether to standardize predictors before fitting (default TRUE) |
| method | For now, only "overlapping" (hierNest-OGLasso, default) |
## `cv.hierNest()` — cross-validated tuning parameter selection

In practice, the penalty hyperparameters $\lambda$, $\alpha_1$, and $\alpha_2$ need to be chosen by cross-validation. `cv.hierNest()` wraps the fitting procedure with cross-validation and returns the optimal parameter combination.
The `cvmethod` argument controls how $\alpha_1$ and $\alpha_2$ are selected:
- `"general"` — uses a single pair of $\alpha_1$ and $\alpha_2$ (supplied as scalars) and runs standard cross-validation over the $\lambda$ sequence only.
- `"grid_search"` — searches a grid of $\alpha_1 \times \alpha_2$ pairs. The ranges are specified via `asparse1` and `asparse2` (each a length-2 vector giving the lower and upper bound), with `asparse1_num` and `asparse2_num` controlling the grid resolution.
- `"user_supply"` — the user provides explicit paired vectors of $(\alpha_1, \alpha_2)$ values to evaluate.

```r
cv.fit <- cv.hierNest(
  x            = example_data$X,
  y            = example_data$Y,
  method       = "overlapping",
  hier_info    = example_data$hier_info,
  family       = "binomial",
  partition    = "subgroup",     # stratify CV folds within subgroups
  cvmethod     = "grid_search",
  asparse1     = c(0.5, 20),     # search alpha_1 over [0.5, 20]
  asparse2     = c(0.05, 0.20),  # search alpha_2 over [0.05, 0.20]
  asparse1_num = 3,              # 3 x 3 = 9 grid points
  asparse2_num = 3,
  nlambda      = 50,             # lambda values per grid point
  pred.loss    = "ROC"           # maximize AUROC
)
```
> **Note on `partition = "subgroup"`:** this stratifies the cross-validation folds so that each fold contains observations from all DRG subgroups (to the extent possible). This avoids degenerate folds where a small subgroup is entirely absent from the training data.
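Conceptually, subgroup-stratified folds can be built by dealing each subgroup's observations out across the folds. The base-R sketch below illustrates that idea on toy data; it mimics the principle behind `partition = "subgroup"`, and is not the package's internal implementation.

```r
set.seed(1)
nfolds <- 5
# Toy subgroup labels: 6 DRGs with very unequal sizes
drg_id <- rep(1:6, times = c(40, 35, 8, 50, 12, 25))

# Within each subgroup, cycle through fold labels and shuffle, so every
# fold sees (almost) every subgroup even when a subgroup is small
foldid <- ave(seq_along(drg_id), drg_id,
              FUN = function(v) sample(rep_len(seq_len(nfolds), length(v))))

table(foldid, drg_id)  # each subgroup is spread across the folds
```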
With `cvmethod = "general"`, a single $(\alpha_1, \alpha_2)$ pair is evaluated:

```r
cv.fit.simple <- cv.hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  method    = "overlapping",
  hier_info = example_data$hier_info,
  family    = "binomial",
  partition = "subgroup",
  cvmethod  = "general",
  asparse1  = 1,
  asparse2  = 0.1,
  nlambda   = 100,
  pred.loss = "ROC"
)
```
With `cvmethod = "user_supply"`, `asparse1` and `asparse2` give paired candidate values:

```r
cv.fit.user <- cv.hierNest(
  x         = example_data$X,
  y         = example_data$Y,
  method    = "overlapping",
  hier_info = example_data$hier_info,
  family    = "binomial",
  partition = "subgroup",
  cvmethod  = "user_supply",
  asparse1  = c(0.5, 1, 5, 10),       # explicit (alpha1, alpha2) pairs
  asparse2  = c(0.05, 0.1, 0.1, 0.2),
  nlambda   = 50
)
```
After running `cv.hierNest()`, several components of the returned object are of interest.
```r
# Optimal tuning parameters selected by cross-validation
cv.fit$lambda.min  # lambda minimising CV loss
cv.fit$min_alpha1  # alpha_1 selected
cv.fit$min_alpha2  # alpha_2 selected

# Number of non-zero coefficients in the reparameterized model
# at the optimal lambda
nnz <- sum(cv.fit$hierNest.fit$beta[,
  which(cv.fit$hierNest.fit$lambda == cv.fit$lambda.min)] != 0)
cat("Non-zero reparameterized coefficients:", nnz, "\n")
```
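The package also provides `predict_hierNest()` for generating predictions on new data. The chunk below is an unevaluated sketch: the argument names `newx` and `hier_info` are assumptions patterned after the fitting interface, so consult `?predict_hierNest` for the authoritative signature.

```r
# Unevaluated sketch: predictions for new observations from the
# CV-selected model. The `newx` / `hier_info` argument names are
# assumptions, not confirmed package API.
new_X    <- example_data$X[1:5, ]
new_hier <- example_data$hier_info[1:5, ]

pred <- predict_hierNest(
  cv.fit,
  newx      = new_X,
  hier_info = new_hier
)
head(pred)
```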
The `plot()` method for `cv.hierNest` objects provides two heatmap visualizations.

Plotting with `type = "coefficients"` shows the estimated overall, MDC-specific, and DRG-specific components for each selected predictor at the cross-validation-optimal $\lambda$.
```r
# Returns a plotly interactive heatmap
p_coef <- plot(cv.fit, type = "coefficients")
p_coef
```
Each row of the heatmap corresponds to one level in the hierarchy (overall mean, then each MDC, then each DRG within that MDC). Each column is a selected predictor. Blue cells indicate a positive contribution; red cells indicate negative. Rows with all zeros are suppressed.
Plotting with `type = "Subgroup effects"` reconstructs the total effect for each DRG subgroup: $\hat{\beta}_d = \hat{\mu} + \hat{\eta}^{M(d)} + \hat{\delta}_d$.
```r
p_sub <- plot(cv.fit, type = "Subgroup effects")
p_sub
```
This visualization is useful for comparing risk profiles across subgroups — subgroups with similar colors for a given predictor share similar risk processes for that covariate.
The statistical framework underlying hierNest is described in detail in:
Jiang, Z., Huo, L., Hou, J., Vaughan-Sarrazin, M., Smith, M. A., & Huling, J. D. (2026+). Heterogeneous readmission prediction with hierarchical effect decomposition and regularization.
For participant $i$ with DRG $D_i = d$, the logistic regression model is:
$$\text{logit}(\Pr(Y_i = 1 \mid X_i, D_i = d)) = X_i^\top \beta_d$$
The hierNest reparameterization decomposes $\beta_d$ as:
$$\beta_d = \mu + \eta^{M(d)} + \delta_d$$
This is encoded via a modified design matrix $X_H$ constructed as the Khatri–Rao (column-wise Kronecker) product of $X$ and the hierarchy indicator matrix $H = [\mathbf{1}_n \;;\; H_M \;;\; H_D]$.
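As an illustration of this construction, the base-R sketch below forms the Khatri–Rao product row by row on toy dimensions. This is a conceptual sketch only; the package stores $X_H$ as a sparse matrix rather than the dense matrix built here.

```r
set.seed(2)
n <- 6; p <- 2
X   <- matrix(rnorm(n * p), n, p)
mdc <- c(1, 1, 2, 2, 3, 3)   # MDC (group) index per observation
drg <- c(1, 2, 3, 4, 5, 6)   # DRG (subgroup) index per observation

H_M <- outer(mdc, sort(unique(mdc)), "==") * 1  # n x M MDC indicators
H_D <- outer(drg, sort(unique(drg)), "==") * 1  # n x D DRG indicators
H   <- cbind(1, H_M, H_D)                       # H = [1_n ; H_M ; H_D]

# Khatri-Rao product: row i of X_H is the Kronecker product of
# X[i, ] and H[i, ], giving one (mu, eta's, delta's) block per predictor
X_H <- t(sapply(seq_len(n), function(i) as.vector(kronecker(X[i, ], H[i, ]))))
dim(X_H)  # n x (p * (1 + M + D))
```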
The overlapping group lasso penalty on the grouped coefficient vector $\theta_j = \{\mu_j, \{\eta_{Mj}\}_{M}, \{\delta_{dj}\}_{d}\}$ for the $j$-th predictor is:
$$P_{\text{og}}(\theta) = \lambda \sum_{j=1}^{p} \left[ \|\theta_j\|_2 + \sum_{M \in \mathcal{M}} \left( \alpha_1 \|\theta_{Mj}\|_2 + \alpha_2 |\eta_{Mj}| \right) \right]$$
This penalty enforces a hierarchical zero pattern: $\mu_j = 0$ implies all $\eta_{Mj} = 0$ and $\delta_{dj} = 0$; $\eta_{Mj} = 0$ implies all $\delta_{dj} = 0$ for $d \in M$.
For method = "overlapping", the majorization-minimization (MM) algorithm described in Algorithm 1 of the paper is used, with sequential strong rules to skip inactive predictor groups and accelerate computation. The design matrix is stored as a sparse matrix to minimise memory requirements.
```r
sessionInfo()
```