knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
knitr::include_graphics("hex-hce.png")
Load the package hce and check the version:
library(hce) packageVersion("hce")
For citing the package run citation("hce") [@hce].
Hierarchical composite endpoints (HCE) are a class of endpoints that combine multiple clinical outcomes into a single composite while preserving the distinct nature of each outcome. In a particular case, outcomes within a fixed follow-up period are ranked by clinical importance, and the patient’s most important outcome is used for analysis. HCEs are typically analyzed using win odds and related win statistics [@gasparyanhierarchical].
A straightforward example of an HCE is an endpoint defined on an ordinal scale assessed at a specific timepoint. For instance, the COVID-19 HCE uses an 8-category ordinal scale to evaluate physical limitations in hospitalized patients with COVID-19 at 15 or 30 days post-treatment. See the COVID-19 and COVID-19b dataset ordinal outcomes as examples [@beigel2020remdesivir].
table(COVID19)
Simple HCEs are implemented in the hce package via the hce class, which inherits from data.frame. These objects must include the ordinal analysis values in the AVAL column and a two-level treatment group in the TRTP column. The hce::summaryWO() function can be used to provide the number of wins, losses, and ties by category. From these counts, one can compute the probability of ties.
COVID19HCE <- hce(GROUP = COVID19$GROUP, TRTP = COVID19$TRTP) SUM0 <- summaryWO(COVID19HCE, ref = "Placebo") SUM <- SUM0$summary SUM$Ptie <- round(SUM$TIE/SUM$TOTAL, 2) SUM
While the win odds and its 95% confidence interval are calculated as follows:
WO <- calcWO(COVID19HCE, ref = "Placebo")[, c("WO", "LCL", "UCL")] round(WO, 2)
A more complex case arises when multiple events per patient are observed. Consider two binary outcomes: death and hospitalization. Each patient can experience up to two events (e.g., hospitalization followed by death), and the HCE must account for the hierarchy between them (typically prioritizing death over hospitalization) when determining wins, losses, and ties.
set.seed(1) n <- 100 dat0 <- data.frame(TRTP = rep(c("A", "P"), each = n), DEATH = c(rbinom(n = n, size = 1, prob = 0.6), rbinom(n = n, size = 1, prob = 0.8)), HOSP = c(rbinom(n = n, size = 1, prob = 0.5), rbinom(n = n, size = 1, prob = 0.8)))
Based on these outcomes, several methods can be used to construct hierarchical composite endpoints. Below are two approaches with their implementations.
Select a single outcome per patient—the highest-priority one. If a patient experienced death, select death (regardless of hospitalization). If no death occurred, select hospitalization (only if it did not lead to subsequent death). Otherwise, select no outcome. This yields three categories: I. Death, II. Hospitalization (without subsequent death), III. No outcome.
This can be implemented as:
dat1 <- dat0 dat1$AVAL <- ifelse(dat1$DEATH == 1, 1, ifelse(dat1$HOSP == 1, 2, 3)) dat1 <- as_hce(dat1) summaryWO(dat1)$summary
Compare patients first on death. The active (control) patient wins if they are alive while the patient in the control (active) group died. If both patients died, it is a tie on death and the comparison proceeds to the next outcome (hospitalization), using the same winner logic. If no winner is determined after evaluating hospitalization, declare a tie (there are no further outcomes to break ties).
In this case, we can also derive a single patient-level categorical variable that encodes the above comparison logic. From worst to best, the categories are: patients with both hospitalization and death; patients with death without prior hospitalization; patients with hospitalization without death; and patients with no events. This can be implemented using the following ordering:
dat2 <- dat0 dat2$ORD <- 2*dat2$DEATH + dat2$HOSP unique(dat2[, c("DEATH", "HOSP", "ORD")])
The multiplicative factor of 2 is used purely for ordering and can be replaced by any monotone transformation. Here, a higher ORD indicates a more prioritized (worse) outcome—i.e., outcomes that would cause the patient to lose in a pairwise comparison. Because the hce package convention is that higher values denote better outcomes, we simply reverse the ordering in ORD to obtain the analysis variable AVAL:
dat2$AVAL <- max(dat2$ORD) - dat2$ORD dat2 <- as_hce(dat2) unique(dat2[, c("DEATH", "HOSP", "ORD", "AVAL")]) summaryWO(dat2)$summary
The number of ties is more than halved, leading to a larger estimated treatment effect. This occurs because, in this case, the second method more efficiently breaks ties by sequentially comparing outcomes (death first, then hospitalization), reducing indeterminate pairings.
Here we provide examples of HCE using in clinical trials from different therapeutic areas. General considerations for creating HCEs can be found in @gasparyan2022design, @khce1.
The kidney HCE defined in @khce2 has the following ordinal outcomes (for the review of the topic see @khce1).
HCE2 <- data.frame(Order = c("I", "II", "III", "IV", "V", "VI", "VII"), Category = c("Death", "Dialysis or kidney transplantation", "Sustained GFR < 15 ml/min per 1.73 m2", "Sustained GFR decline from baseline of >= 57%", "Sustained GFR decline from baseline of >= 50%", "Sustained GFR decline from baseline of >= 40%", "Individual GFR slope") ) HCE2
The dataset KHCE contains data on a kidney HCE outcomes
dat <- KHCE Order <- c("Death (adj)", "Chronic dialysis (adj) >=90 days", "Sustained eGFR<15 (mL/min/1.73 m2)", "Sustained >=57% decline in eGFR", "Sustained >=50% decline in eGFR", "Sustained >=40% decline in eGFR", "eGFR slope") dat$GROUP <- factor(dat$GROUP, levels = Order) table(dat$GROUP, dat$TRTP)
This dataset is derived from ADSL which contains baseline characteristics, ADLB laboratory measurements of kidney function, and ADET for the time-to-event outcomes with their timing. For the detailed derivation see the Technical Appendix in @khce2.
The DARE-19 [@kosiborod2021effects;@kosiborod2021dapagliflozin] trial used an HCE to assess outcomes in patients hospitalized for COVID-19 and treated for 30 days. The COVID-19 HCE is presented below. It combines death, in hospital organ dysfunction events with clinical status at Day 30 for patients alive, still hospitalized but without previous organ dysfunction events, and hospital discharge as the most favorable outcome for patients discharging without organ dysfunction events and being alive at Day 30.
Below a higher category signifies a better outcome. Patients are ranked into one and only one category based on their clinically most severe event. For example, patients experiencing an in-hospital new or worsening organ dysfunction event then dying will be included in the category I.
HCE <- data.frame(Order = c("I", "II", "III", "IV", "V"), Category = c("Death", "More than one new or worsened organ dysfunction events", "One new or worsened organ dysfunction event", "Hospitalized at the end of follow-up (Day 30)", "Discharged from hospital before Day 30") ) HCE
Patients in the category I are compared using the timing of the event, with an earlier event being a worse outcome (are assigned a lower rank). Similarly, in the category III the timing of the event is used for ranking patients within this category. In the category II patients are compared using the number of events with a higher number signifying a worse outcome. Patients in the category IV - hospitalized at the end of follow-up without previous worsening events - are further ranked according to oxygen support requirements at the hospital (IV.1 on high flow oxygen devices, IV.2 requiring supplemental oxygen, IV.3 not requiring supplemental oxygen, with a higher rank being a better outcome). Patients in the category V are compared using the timing of the event, but, the hospital discharge being a favorable outcome, here the earlier event signifies a better outcome than the late event (reverse of the ranking in categories I and III).
In the Heart Failure population (see @kondo2023use) the following HCE was considered
HCE3 <- data.frame(Order = c("I", "II", "III", "IV"), Category = c("Cardiovascular death", "Total (first and recurrent) HF hospitalizations", "Total urgent HF visits", "Improvement/deterioration in KCCQ-TSS")) HCE3
This section describes the function simKHCE(), which simulates a kidney HCE composed of six components, evaluated in the following order:
Kidney Failure Replacement Therapy (KFRT)
Sustained eGFR < 15
Sustained $\geq 57\%$ decline in eGFR
Sustained $\geq 50\%$ decline in eGFR
Sustained $\geq 40\%$ decline in eGFR
Change in eGFR
simKHCE( n, CM_A, CM_P = -4, n0 = n, TTE_A = 1000, TTE_P = TTE_A, fixedfy = 2, Emin = 20, Emax = 100, sigma = NULL, Sigma = 3, m = 10, theta = -0.4605, phi = 0, two_meas = c("no", "base", "postbase", "both") )
The simulation assumes a fixed follow-up interval $[0, T]$, where $T$ (in years) is set by fixedfy (fixed follow-up years). Each patient has $m$ equally spaced visits over this interval. The active arm has sample size $n;$ the control arm has size $n_0$. Measurements are generated from a linear mixed-effects model with a independent random intercepts and slopes:
$$y_{ij}=\alpha_i+{\beta_i + [b^1 - (\beta_i)^-\varphi]x_i}t_j+\varepsilon_{ij}, \ \ t_j=\frac{T}{m}j,\,j=0,1,\cdots,m,\,i=1,\cdots,N=n_0+n.$$
Here, $x_i$ is the active-arm indicator (1 for active, 0 for control).
Baseline eGFR: True baselines are sampled as $\alpha_i\sim {\mathcal U}[E_{\min}, E_{\max}].$
Random slope: For the placebo group, $\beta_i\sim {\mathcal N}(b^0, \Sigma^2),$ where the mean placebo slope $b^0$ is given by CM_P. The negative part is $(\beta_i)^-=\max(-\beta_i, 0).$ For active-arm patients with a negative slope, treatment combines multiplicative and additive effects:
$$\beta_i + [b^1 - (\beta_i)^-\varphi]x_i=\beta_i (1-\varphi)+b^1.$$
For patients with a positive slope, the treatment effect is purely additive:
$$\beta_i + [b^1 - (\beta_i)^-\varphi]x_i=\beta_i +b^1.$$
The additive effect $b^1$ is specified in CM_A, and the proportionality coefficient $\varphi\in[0,1]$ is set by phi. The between-patient slope variability is Sigma (i.e., $\Sigma$).
sigma is provided, it sets the within-patient standard deviation (SD). By default (sigma = NULL), the SD is time-varying and depends on the predicted eGFR:$$\sigma_{ij}=\sqrt{0.67*\textit{predicted eGFR}}=\sqrt{0.67(\alpha_i+{\beta_i + [b^1 - (\beta_i)^-\varphi]x_i}t_j)}.$$
Let $dt=t_j-t_{j-1}=\frac{T}{m}.$ The cumulative hazard is $$\Lambda_i(t)=\sum_{j=1}^{k-1}dt \lambda_{ij}+\lambda_{ik}(t - t_k), \ \ t\in[t_{k-1}, t_k].$$ If $U_i\sim{\mathcal U}[0, 1]$ drawn per patient, the event time $X_i$ satisfies $S(X_i)=\exp(-\Lambda_i(X_i))=U_i,$ yielding, $$X_i = \frac{-\log(U_i)-\sum_{j=1}^{k-1}dt \lambda_{ij}}{\lambda_{ik}}+t_k.$$ Note: a single uniform variate is sampled per patient, not per visit.
The parameters $c_0$ and $c_1$ control treatment-independent and treatment-dependent components of the baseline hazard. Let TTE_P be the placebo time-to-event scaling and TTE_A the active-arm scaling. Then:
By default TTE_A=TTE_P, so $c_1=0$, meaning no treatment effect on KFRT beyond its effect through eGFR. Thus, treatment reduces KFRT risk only by slowing eGFR decline.
Calibration of TTE and theta: TTE_A and theta are set so that the event rate is approximately 1 per patient-year at eGFR = 15 and 0.01 per patient-year at eGFR = 25. This yields TTE_A=1000 and theta=-0.4605.
Additional constraints based on observed data: In the current setup, KFRT risk depends on the true eGFR trajectory. In practice, however, the observed eGFR plays an important role in initiating KFRT. To reflect this, event rates are further adjusted based on observed eGFR: when observed eGFR is above 30, the patient’s event rate is set to a very low value (10E-7); when observed eGFR is at or below 7, the event rate is set to a very high value (10E5).
To model dependent outcomes, several methods are available:
Joint Distribution Modeling Using Copulas: This method employs copulas to model the joint distribution of outcomes, capturing their dependence.
Random Frailty Modeling: This approach captures patient-level dependence between outcomes using a random frailty model.
Conditional Distribution Specification Through Multi-State Modeling: This technique uses multi-state models to describe the conditional distribution of outcomes.
Sklar's theorem [@sklar1959fonctions] shows that multivariate distribution functions can be expressed using a copula and univariate distributions. For a random vector $X^d=(X_1,\cdots,X_d)$ with a multivariate distribution function $H(x_1,\cdots,x_d),$ Sklar's theorem states that there is a copula $C(\cdot)$ such that: $$H(x_1,\cdots,x_d)=C(F_1(x_1),\cdots,F_d(x_d)),$$ where each component $X_j$ has the univariate distribution $F_j.$ A copula is essentially a multivariate distribution function where each univariate marginal distribution is uniform, describing the dependency structure of the multivariate distribution function $H(\cdot)$. To construct the multivariate distribution function, one combines each variable's univariate distributions $F_j$ with the copula.
If $X_j$ has distribution function $F_j,$ then $U_j=F_j(X_j)$ is uniformly distributed, allowing random variables $X_j\sim F_j$ to be simulated by generating uniform random variables $U_j$ and applying the inverse transformation:
$$F_j^{-1}(y)=\inf{x\in {\mathbf R}: \ \ F_j(x)\geq y}, \ \ \inf\varnothing=\infty.$$ Thus, if one has simulated a uniform random vector $U^d=(U_1,\cdots, U_d)$ from the copula $C(\cdot)$, the random vector $X^d=(X_1,\cdots,X_d)$ can be simulated as: $$(X_1,\cdots,X_d)=(F_1^{-1}(U_1),\cdots,F_d^{-1} (U_d)).$$ The main challenge remains in simulating from the given copula.
An Archimedean copula [@nelsen2006introduction] is one where:
$$C(u^d;\varphi)=\varphi(\varphi^{-1}(u_1)+\cdots+\varphi^{-1}(u_d)).$$ The function $\varphi:[0,+\infty]\rightarrow [0,1]$ is a generator - continuous, decreasing, with $\varphi(0)=1$ and $\lim_{t\rightarrow+\infty}\varphi(t)=0.$ When $\varphi(t)=e^{-t^\theta},\ \ \theta>1,$ the copula is called a Gumbel copula.
By Bernstein's theorem, completely monotone Archimedean generators coincide with Laplace-Stieltjes transforms of distribution functions $F,$ determined by $\varphi=LS[F].$ The Marshall-Olkin algorithm [@marshall1988families] for sampling from an Archimedean copula involves:
The vector $U=(U_1,\cdots, U_d)$ is then a random vector from the Archimedean copula with generator $\varphi$. For the Gumbel copula, one needs to use the inverse Laplace-Stieltjes transform of a stable distribution [@nolan2020univariate; @hofert2011nested]: $$F\sim S(1/\theta, 1, \cos^\theta(\pi/(2\theta)), {\mathbf I}_{{\theta=1}},1)$$ modifying the first step to sample from a stable distribution.
Chambers-Mallows-Stuck method [@chambers1976method] efficiently simulates stable variables with:
Defining $\alpha=1/\theta,$ setting $b_{\tan}=\beta\tan\left(\frac{\alpha\pi}{2}\right),$ and $\theta_0=\arctan(b_{\tan})/\alpha,$ with $$C_{\tan}=(1+b_{\tan}^2)^\frac{1}{2\alpha}.$$
Utilizing the transformations: $$Z(\theta) = \frac{\sin(a_0) C_{\tan}}{\cos(\Theta)^\frac{1}{\alpha}}\left(\frac{\cos(a_0-\Theta)}{W}\right)^\frac{1-\alpha}{\alpha}, \ \ a_0=\alpha(\Theta+\theta_0),\,\theta>1.$$ $$Z(1)=\frac{2}{\pi}\left(\pi_\beta\tan(\Theta)-\beta\log\left(\frac{\pi}{2}W\frac{\cos(\Theta)}{\pi_\beta}\right)\right), \ \ \pi_\beta=\frac{\pi}{2}+\beta\Theta.$$ Finally, $\gamma Z + \delta$ has the desired distribution with $\gamma =[\cos(\pi/(2\theta))]^\theta$ and $\delta={\mathbf I}_{{\theta=1}}$ (and one needs to set $\beta=1$).
In the original Chambers-Mallows-Stuck formula, the term $$\frac{1}{[\cos(\alpha\theta_0)\cos(\Theta)]^\frac{1}{\alpha}}$$ is replaced by
$\frac{C_{\tan}}{[\cos(\Theta)]^\frac{1}{\alpha}},$
as suggested by the copula package [@hofert2025copula], which is based on the fact that $C_{\tan} = 1/(\cos(\alpha\theta_0))^{1/\alpha}.$ Indeed, one needs to show that
$$(1+b_{\tan}^2)^\frac{1}{2\alpha}=1/(\cos(\alpha\theta_0))^{1/\alpha},$$ which is equivalent to showing that $$1+\left[\beta\tan\left(\frac{\alpha\pi}{2}\right)\right]^2=\frac{1}{[\cos(\alpha\theta_0)]^2}.$$ And this is true because of the trigonometric identity $$1+[\tan(y)]^2=\frac{1}{[\cos(y)]^2}.$$ Here we have set $y=\alpha\theta_0=\arctan(b_{\tan})=\arctan\left(\beta\tan\left(\frac{\alpha\pi}{2}\right)\right)$ and hence $\tan(y)=\beta\tan\left(\frac{\alpha\pi}{2}\right).$
The function simHCE() provides the implementation above with the argument theta specifying the dependence of outcomes.
Rates_A <- c(10, 20) Rates_P <- c(20, 20) dat1 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1, seed = 1) dat2 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 1.0001, seed = 1) dat3 <- simHCE(n = 2500, TTE_A = Rates_A, TTE_P = Rates_P, CM_A = -3, CM_P = -6, CSD_A = 15, fixedfy = 3, theta = 10, seed = 1) calcWO(dat1) calcWO(dat2) calcWO(dat3)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.