\begin{abstract} Patient reported outcomes including quality of life assessments are increasingly being included as either primary or secondary outcomes in randomized controlled trials. While making the outcomes more relevant for patients it entails a challenge in cases where death or a similar event makes the outcome of interest undefined. A pragmatic -- and much used -- solution is to assign diseased patient with the lowest possible quality of life score. This makes medical sense, but creates a statistical problem since traditional tests such as the t-test or the Wilcoxon test potentially loses high amounts of statistical power. In this paper we propose a novel test that can keep the medically relevant composite outcome but preserve full statistical power. The test is also applicable in other situations where a specific value (say zero days alive outside hospitals) encodes a special meaning. The test is implemented in an R package, and the paper includes a simulation study and an application to a COVID-19 randomized controlled trial. \end{abstract}

\begin{center} \textbf{Keywords:} Quality of Life, Randomized Controlled Trials,\ Truncated Outcomes, Semi-Continuous Distributions, Empirical Likelihood \end{center}

Introduction

In medical intervention research including in particular randomized controlled trials (RCTs) there is a trend towards increased use of patient reported outcomes; Quality of Life scores being one of the most prominent of these. Established statistical practice is to compare these between treatment groups using non-parametric tests such as Mann–Whitney–Wilcoxon rank-sum test (@wilcoxon3001968individual, @mann1947test), henceforth Wilcoxon, since distributions are rarely normal or symmetric \textbf{[TODO: References needed to current statistical practice]}. When all patients are alive and able and willing to provide quality of life scores at the scheduled measurement time this procedure works very well. Not all clinical settings, however, will have all participants alive at time of scheduled assessment of quality of life This is particularly true in trials within intensive care where mortality of approximately 30% is not uncommon. This is the setting that motivated this work. A pragmatic solution which is gaining popularity is to simply give the deceased patients the lowest possible score and then use a Wilcoxon test or similar to compare the groups \textbf{[TODO: References needed]}. This paper demonstrates that this approach can lead to a dramatic loss of statistical power, and we will introduce an alternative statistical test statistic that averts this power loss. Our proposed test procedure is implemented in an R package and made publicly available.

The key-insight in the proposed test is to incorporate that the outcome is actually a two-dimensional outcome of a very special type, and that the constructed combined outcome follows a continuous-singular mixture distribution. This unusual distribution is why one cannot resort to non-parametric Wilcoxon tests since the singular component in the distribution of the combined outcome will get reduced to simple ties. It is noted that the handling of ties in standard statistical software varies and is opaque. However, the handling of ties is not the main reason why the Wilcoxon test suffers power loss. The main reason is that the null hypothesis in these Wilcoxon type tests (stochastic domination) does not handle the empirical fact that treatments might influence mortality and quality of life differently.

As an alternative we propose to model the binary component (i.e., survival) and the continuous part (i.e., actual quality of life) separately but to construct a single test for no treatment effect on either based on a likelihood ratio. We can thus provide a single p-value for the hypothesis of no treatment effect on the extended quality of life where death is given the lowest possible score. To accommodate potential non-normality of the recorded quality of life scores we consider both a parametric and a semi-parametric version of the test where the semi-parametric version is as widely applicable as the Wilcoxon test. Both test procedures will provide effect estimates of mean differences between treatment groups based on the combined outcome along with confidence intervals. This is also an added benefit compared to the Wilcoxon-type testing approach.

It should be noted that while we in this paper exemplify the procedure using quality of life and mortality the method is applicable in any setting where a single value of a combined outcome has a special interpretation compared to an otherwise continuous scale. Another example from intensive care research consider the outcome "days alive and out of hospital within 90 days from randomization". Here in-hospital fatalities will all have the value 0, while everybody else will have outcomes ranging from 0 to 90. Such outcomes are also routinely analyzed using Wilcoxon-type tests. Our proposed test would increase power while also providing a mean effect estimate along with confidence interval. Note also that the developed mathematical method allows straightforwardly for the inclusion of confounding variables or other adjustment factors. The method itself is therefore just as applicable in non-randomized studies or epidemiological studies in general.

The rest of the paper is structured as follows. The next section introduces the mathematical setup as well as our novel test procedure. Section 3 describes the R implementation, and Section 4 presents a simulation study illustrating the substantial power gains. Section 5 re-analyzes a COVID-19 trial from 2021, and section 6 concludes with a discusses. Additional simulation results and mathematical proofs are contained in the supplementary material.

Method {#sec:method}

We consider random variables $(Y, A, R, X)$ where $Y$ is the continuous outcome, $A$ is a binary variable being equal to $1$ if $Y$ is observed and equal to $0$ if $Y$ is unobserved/undefined, $R$ is a binary treatment indicator, and $X$ is a $p$-dimensional vector of additional covariates. The objective is to focus on the bivariate distribution of $(Y \mid A = 1), A \mid R, X$ under the primary null hypothesis of no treatment effect given by $(Y \mid A = 1), A \perp R \mid X$. Even though we model the outcome as a bivariate random variable, a combined outcome is often used in practice. The combined outcome is derived such that it is equal to $Y$ if $A = 1$ (and $Y$ is observed), and otherwise (when $A = 0$) it is equal to some predetermined value. The combined outcome can be written as \begin{align} \widetilde{Y} = Y \mathbbm{1}(A = 1) + \mathcal{E} \mathbbm{1}(A = 0)\label{eq:combinedOutcome} \end{align} where $\mathcal{E}$ is a fixed atom assigned as the outcome value when $Y$ is unobserved, and $\mathbbm{1}$ denotes the indicator function. The semi-continuous distribution of $\widetilde{Y}$ is therefore a probabilistic mixture of a singular distribution at $\mathcal{E}$ and a continuous distribution over the domain of the random variable $Y \mid A = 1$. Thus the statistical challenge can be rephrased as assert whether the treatment affects the distribution of $\widetilde{Y}$.

The conditional expectation of the combined outcome in Equation (\ref{eq:combinedOutcome}) is given by \begin{align} E[\widetilde{Y} \mid R, X] = E[Y \mid R, X, A=1]P(A = 1 \mid R, X) + \mathcal{E}P(A = 0 \mid R, X) \end{align} From this expression it can be seen that a treatment comparison expressed in terms of a contrast of the expectation of $\widetilde{Y}$ can be zero even though the distribution of $(Y \mid A=1), A \mid X$ depends on $R$. To see this, let $E[Y \mid A=1, R = 0, X] = \mu(X)$ and $E[Y \mid A=1, R = 1, X] = \mu(X) + \mu_{\delta}(X)$ and assume without loss of generalization that $\mathcal{E} = 0$. Then the average treatment effect for the combined outcome conditional on baseline covariates is \begin{align} \Delta(X) &= E[\widetilde{Y} \mid R = 1, X] - E[\widetilde{Y} \mid R = 0, X]\label{eq:Delta}\ &= \mu(X)\left[P(A = 1 \mid R=1,X) - P(A = 1 \mid R = 0, X)\right] + \mu_{\delta}(X) P(A = 1 \mid R = 1, X)\nonumber \end{align} If we auspiciously let $\mu_{\delta}(X) = \mu(X)\left[P(A = 1 \mid R = 0, X) P(A = 1 \mid R = 1, X)^{-1} - 1\right]$ then $\Delta(X) = 0$ for all values of $X$ even though $\mu_{\delta}(X) \ne 0$ and $P(A = 1 \mid R = 0, X) \ne P(A = 1 \mid R = 1, X)$. On the other hand, if $\mu_\delta(X) = 0$ and $P(A = 1 \mid R=1,X) = P(A = 1 \mid R = 0, X)$ then $\Delta(X)$ is necessarily equal to zero. This illustrates that a significance test of no treatment effect must have two degrees of freedom. Such a test which we develop in the subsequent sections is conceptually difference than a test for the null hypothesis $\Delta(X) = 0$.

We propose to test the null hypothesis of no treatment effect on the combined outcome by a likelihood ratio test of the joint distribution of $Y_i \mid A_i$ and $A_i$. This has the advantage that it increases the efficiency compared to the Wilcoxon test and it yields a single p-value appropriate for testing a primary outcome in a clinical trial. Let $(Y_i, A_i, R_i, X_i)$ for $i = 1, \ldots, n$ be independent and identically distributed random variables. We can write our model in a general form as \begin{align} Y_i \mid A_i = 1, R_i, X_i &\sim F(Y_i; \mu_i(R_i, X_i), \Psi)\ A_i \mid R_i, X_i &\sim \text{Bernoulli}(A_i; \pi_i(R_i, X_i))
\end{align
} for some mean functions $\pi_i$ and $\mu_i$ and a distribution function $F$ characterizing the distribution of the observed continuous outcomes with possible nuisance parameters $\Psi$. The joint likelihood function for the combined outcome conditional on treatment and baseline covariates is \begin{align} L_n(\mu_i(R_i, X_i), \pi_i(R_i, X_i), \Psi) &= \prod_{i=1}^n f(Y_i; \mu_i(R_i, X_i), \Psi)^{A_i}P(A_i = a_i; \pi_i(R_i, X_i))\ &= \prod_{i=1}^n f(Y_i; \mu_i(R_i, X_i), \Psi)^{A_i} \pi_i(R_i, X_i)^{A_i} (1 - \pi_i(R_i, X_i))^{1-A_i}\nonumber \end{align} We note that when $F$ has an absolutely continuous density function, $f$, the likelihood function can equivalently be written solely in terms of the combined outcome $\widetilde{Y}$ in Equation (\ref{eq:combinedOutcome}) since $f(Y_i; \cdot)^{A_i} = f(\widetilde{Y}i; \cdot)^{\mathbbm{1}(\widetilde{Y}_i \ne \mathcal{E})}$ and $\pi_i(\cdot)^{A_i} = \pi_i(\cdot)^{\mathbbm{1}(\widetilde{Y}_i \ne \mathcal{E})}$ almost surely. An important property of this model is that the likelihood function factorizes into two components as \begin{align} L_n(\mu_i(R_i, X_i), \pi_i(R_i, X_i)) &= L_{1,n}(\mu_i(R_i, X_i), \Psi) L_{2,n}(\pi_i(R_i, X_i)) \end{align} where $L{1,n}(\mu_i(R_i, X_i), \Psi) = \prod_{i=1}^n f(Y_i; \mu_i(R_i, X_i), \Psi)^{A_i}$ and $L_{2,n}(\pi_i(R_i, X_i)) = \prod_{i=1}^n \pi_i(R_i, X_i)^{A_i} (1 - \pi_i(R_i, X_i))^{1-A_i}$. This implies that the parameters for the observed outcomes, $\mu_i(R_i, X_i)$, can be estimated independently of the parameters governing the probability of observing the outcome, $\pi_i(R_i, X_i)$. This factorization is a consequence of the likelihood construction and it does neither assume nor require independence between the value of the outcome and the probability of observing it.

Under the assumption of generalized linear models for $\mu_i(R_i, X_i)$ and $\pi_i(R_i, X_i)$ we may parameterize the conditional mean functions as \begin{equation} \begin{aligned} \E[Y_i \mid A_i = 1, R_i, X_i] &= \mu_0 + \mu_{\delta} R_i + s_\mu(X_i)\label{eq:glm1}\ \logit P(A_i = 1 \mid R_i, X_i) &= \beta_0 + \beta_{\delta} R_i + s_\beta(X_i) \end{aligned} \end{equation} where the treatment effect is quantified by the bivariate contrast $(\mu_\delta, \beta_\delta)$ and $s_\mu, s_\beta\colon\, \mathbb{R}^p \mapsto \mathbb{R}$ are some models for the additional covariates. The parameter $\mu_\delta$ is interpreted as the expected difference among the observed outcomes and $\beta_\delta$ is correspondingly the log odds-ratio of being observed.

To assess the treatment effect on the combined outcome we propose a test statistic based on the likelihood ratio \begin{equation} \begin{aligned} W_n(\mu_0, \mu_\delta, s_\mu, \beta_0, \beta_\delta, s_\beta) &= -2 \log \frac{\sup\limits_{\Psi} L_{1,n}(\mu_0, \mu_\delta, s_\mu, \Psi) L_{2,n}(\beta_0, \beta_\delta, s_\beta)}{\sup\limits_{\mu_0, \mu_\delta, s_\mu, \Psi} L_{1,n}(\mu_0, \mu_\delta, s_\mu, \Psi) \sup\limits_{\beta_0, \beta_\delta, s_\beta} L_{2,n}(\beta_0, \beta_\delta, s_\beta)}\label{eq:lrtgeneral}\ &= W_{1,n}(\mu_0, \mu_\delta, s_\mu) + W_{2,n}(\beta_0, \beta_\delta, s_\beta) \end{aligned} \end{equation} where \begin{align} W_{1,n}(\mu_0, \mu_\delta, s_\mu) &= -2\left(\log \sup\limits_{\Psi} L_{1,n}(\mu_0, \mu_\delta, s_\mu, \Psi) - \log \sup\limits_{\mu_0, \mu_\delta, s_\mu, \Psi} L_{1,n}(\mu_0, \mu_\delta, s_\mu, \Psi)\right)\ W_{2,n}(\beta_0, \beta_\delta, s_\beta) &= -2\left(\log L_{2,n}(\beta_0, \beta_\delta, s_\beta) - \log \sup\limits_{\beta_0, \beta_\delta, s_\beta} L_{2,n}(\beta_0, \beta_\delta, s_\beta) \right) \end{align} are the likelihood ratio test statistics for each of the components. As a test statistic we use the profile likelihood ratio defined as \begin{equation} \begin{aligned} W_n^P(\mu_\delta, \beta_\delta) &= \sup\limits_{\mu_0, s_\mu} W_{1,n}(\mu_0, \mu_\delta, s_\mu) + \sup\limits_{\beta_0, s_\beta} W_{2,n}(\beta_0, \beta_\delta, s_\beta) \label{eq:LRTtotal}\ &= W_{1,n}^P(\mu_\delta) + W_{2,n}^P(\beta_\delta) \end{aligned} \end{equation} which is only a function of the two treatment contrasts. Specifically, for the null-hypothesis of no treatment effect the test statistic $W_n^P(0, 0)$ is used.

In the following we will consider both a parametric and a semi-parametric version of our profile likelihood ratio. The difference between the two are that the parametric test assumes a specific distributional form for $W_{1,n}^P(\mu_\delta)$, e.g., for $Y_i \mid A_i = 1, R_i, X_i$ the continuous part of the model, while the semi-parametric version is based on a profile empirical likelihood ratio and does not require any distributional assumptions.

If we combine the model for the conditional expectation of observed outcomes in Equation (\ref{eq:glm1}) with the assumption of normality we obtain the following sub-model \begin{align} Y_i \mid A_i = 1, R_i, X_i \sim N\left(\mu_0 + \mu_\delta R_i + s_\mu(X_i), \sigma^2\right) \end{align} which gives rise to a parametric form of $L_{1,n}$ and thereby $W_{1,n}^P(\mu_\delta)$. Plugging the latter into Equation (\ref{eq:LRTtotal}) yields our proposed parametric likelihood ratio test. We note any parametric model other than the normal distribution can also be used.

A direct consequence of Wilks' theorem [@wilks1938large] is the large sample distribution of $W_n^P(0, 0)$ under a parametric model for $W_{1,n}^P(\mu_\delta)$ is given in the proposition below.

\begin{proposition} Assume that $P(A_i = 1 \mid R_i = r, X_i) > 0$ for $r = 0,1$ and let $m_j = \sum_{i=1}^n \mathbbm{1}(R_i = j)$. Then the parametric profile likelihood ratio test statistic $W_n^P(0,0)$ stated in Equation (\ref{eq:LRTtotal}) for the null hypothesis of no treatment effect on the combined outcome is asymptotically $\chi^2$ distributed with two degrees of freedom for $m_1, m_2 \rightarrow \infty$ and $m_1/m_2 \rightarrow c$ for some finite, positive constant $c$. \end{proposition}

We note that given specific models for $s_\mu(X_i)$ and $s_\beta(X_i)$ this parametric likelihood ratio test is particularly simple to calculate using standard statistical software. One must then estimate two linear regression models and two logistic regression model both with and without the binary treatment indicator as a covariate and then add their individual log likelihood differences according to Equation (\ref{eq:LRTtotal}). The p-value for the null hypothesis of no treatment effect can then be calculated by the $\chi^2$-distribution with two degrees of freedom. The critical value for rejecting the null hypothesis of no treatment effect at the 5% level is thus equal to $5.99$ and equal to $9.21$ at the 1% level.

A potential drawback of the parametric likelihood ratio test is that it relies on a distributional model for the observed outcomes. In this section we introduce a more flexible approach based on an empirical likelihood ratio test. This approach also utilizes the additive decomposition of the likelihood ratio test statistic in Equation (\ref{eq:LRTtotal}) but substitutes the term $W_{1,n}$ with a term that is free of any distributional assumptions. Combining this with the binomial model for $A \mid R_i, X_i$ in the $W_{2,n}$ leads to a semi-parametric likelihood ratio test for the overall treatment effect.

Let $\mathcal{I}_0 = \left{i = 1,\ldots, n : R_i = 0, A_i = 1\right}$ and $\mathcal{I}_1 = \left{i = 1,\ldots, n : R_i = 1, A_i = 1\right}$ be the sets of indices of the observed outcomes for the two treatment groups. As derived in the supplementary material, the empirical likelihood ratio test statistic as a function of the difference in expected value is given by \begin{align} W_{1,n}^E(\mu_\delta^\ast) &= 2\sup_{\mu}\left(\sum_{i \in \mathcal{I}0}\log \left(1 + \lambda_1 (Y_i - \mu)\right) + \sum{j \in \mathcal{I}1}\log\left(1 + \lambda_2 (Y_j - \mu - \mu\delta^\ast)\right)\right) \end{align} where $\lambda_1$ and $\lambda_2$ are the solutions to the following equations \begin{align} |\mathcal{I}0|^{-1}\sum{i \in \mathcal{I}0} \frac{Y_i - \mu}{1 + \lambda_1 (Y_i - \mu)} = 0, \quad |\mathcal{I}_1|^{-1}\sum{j \in \mathcal{I}1}\frac{Y_j - \mu - \mu\delta^\ast}{1 + \lambda_2 (Y_j - \mu - \beta_\delta^\ast)} = 0 \end{align} We refer to the supplementary material for a derivation of this test statistics.

The semi-parametric likelihood ratio test statistic for testing the null hypothesis of no treatment effect is equal to $W_n^{SP}(0, 0)$ where \begin{align} W_n^{SP}(\mu_\delta^\ast, \alpha_\delta^\ast) = W_{1,n}^E(\mu_\delta^\ast) + W_{2,n}(\alpha_\delta^\ast) \label{eq:LRT_SP_total} \end{align} By the non-parametric Wilk's theorem [@owen1988empirical] it follows that $W_n^{SP}(0, 0) \sim \chi^2_2$ asymptotically.

\begin{proposition} Assume that $P(A_i = 1 \mid R_i = r) > 0$ for $r = 0,1$ and let $m_j = \sum_{i=1}^n 1_{R_i = j}$. Then the semi-parametric profile likelihood ratio test statistic $W_n^{SP}(0,0)$ stated in Equation (\ref{eq:LRT_SP_total}) for the null hypothesis of no treatment effect on the combined outcome is asymptotically $\chi^2$ distributed with two degrees of freedom for $m_1, m_2 \rightarrow \infty$ and $m_1/m_2 \rightarrow c$ for some finite, positive constant $c$. \end{proposition}

We note that when there are no additional covariates to adjust for (i.e., no $X_i$), as will some times be the case in randomized controlled trials, our semi-parametric likelihood ratio test is actually a fully non-parametric method.

Confidence intervals for the treatment effect using either the parametric or semi-parametric test procedure are readily obtained by inverting the likelihood ratio test statistic presented in general in Equation (\ref{eq:LRTtotal}) utilizing the duality between hypothesis testing and confidence intervals. Specifically, an $(1-\alpha)100\%$ confidence region contains all values of the treatment contrast that cannot be rejected by either the parametric or semi-parametric likelihood ratio test at level $\alpha$. This simultaneous bivariate confidence region for the treatment effect is given by the following point set in $\mathbb{R}^2$ \begin{align} \text{CI}^{(\mu_\delta, \beta_\delta)}_{1-\alpha} = \left{(m, b) \in \mathbb{R}^2 : W_n^P(m, b) \leq \chi_2^2(1-\alpha)\right} \end{align} and it will asymptotically contain the true difference in means among the observed outcomes and the true log odds-ratio of being observed simultaneously with probability $(1-\alpha)100\%$. Similarly, univariate confidence intervals for $\mu_\delta$ and $\beta_\delta$ can be calculate by inversion with respect to a $\chi^2$ distribution with one degree of freedom as \begin{align} \text{CI}^{\mu_\delta}{1 - \alpha} = \left{m \in \mathbb{R} : W{1,n}^P(m) \leq \chi_1^2(1-\alpha)\right}, \quad \text{CI}^{\beta_\delta}{1 - \alpha} = \left{b \in \mathbb{R} : W{2,n}^P(b) \leq \chi_1^2(1-\alpha)\right} \end{align} We recommend illustrating the result of the treatment effect and its confidence through the simultaneous confidence region $\text{CI}^{(\mu_\delta, \beta_\delta)}_{1-\alpha}$ as it is capable of conveying the probable magnitudes of the effect of the treatment in both components of the combined outcome simultaneously. Examples will be given below.

Software implementation

To facilitate a straightforward application of our approach we have implemented both the parametric and semi-parametric likelihood ratio tests in the R package \texttt{TruncComp} which is available at the GitHub repository [@TruncCompGitHub]. We illustrate its applicability based on an example data set also available from the package.

The example data set can be loaded by writing \texttt{data("TruncCompExample")} after loading the package. The data set contains two variables, $Y$ and $R$, where $Y$ is the outcome and $R$ is the binary treatment indicator. There are $25$ observations in each treatment group, and truncated observations in $Y$ have been assigned the atom $\mathcal{E} = 0$. Figure \ref{fig:dataExampleHistogram} shows histograms of the outcome for each treatment group. Visually there appears to be a difference between the two groups both in terms of the frequency of the atom at zero and a location shift in the continuous part.

library(TruncComp)
data("TruncCompExample")

```rHistograms of the outcome for the example data stratified by treatment group.", fig.pos="htbp"} par(mfrow=c(1,2), mgp=c(0.5,0.5,0), mar=c(3.5, 2, 1, 0.5)) hist(subset(TruncCompExample, R == 0)$Y, xlab="", xlim=c(0,10), main=expression(Y ~ "|" ~ R == 0), ylim=c(0,14), col="gray70", border="gray90", cex.axis=0.7, cex.main=0.8, cex.lab=0.8, ylab="") hist(subset(TruncCompExample, R == 1)$Y, xlab="", xlim=c(0,10), main=expression(Y ~ "|" ~ R == 1), ylim=c(0,14), col="gray70", border="gray90", cex.axis=0.7, cex.main=0.8, cex.lab=0.8, ylab="")

The observed difference in means for the combined outcome, $\Delta$ in Equation (\ref{eq:Delta}), is `r round(mean(TruncCompExample$Y[TruncCompExample$R == 1])  - mean(TruncCompExample$Y[TruncCompExample$R == 0]), 3)` and both a two-sample t-test and a Wilcoxon rank sum test show highly insignificant effects of the treatment with p-values of `r round(t.test(Y ~ R, data = TruncCompExample)$p.value, 3)` and `r round(wilcox.test(Y ~ R, data = TruncCompExample)$p.value, 3)` respectively. In order to analyse the data using proposed method we use the function \texttt{truncComp} as follows
```r
model <- truncComp(Y ~ R, atom = 0, data = TruncCompExample, method = "SPLRT")

where the formula interface is similar to other regression models implemented in R. The argument \texttt{atom} identifies the value assigned to the truncated outcomes, and \texttt{method} can be \texttt{SPLRT} or \texttt{LRT} for either the semi-parametric or the parametric likelihood ratio test respectively. In this example we have opted for the semi-parametric version to show how easily this additional flexibility is included. We obtain the results of the estimation by a calling the function \texttt{summary} on the model:

summary(model)

The output from the call to \texttt{summary} displays the estimates for the two treatment contrasts corresponding to $\mu_\delta$ and $\exp(\alpha_\delta)$ in Equations (\ref{eq:glm1}) and (\ref{eq:glm2}). These contrasts quantify the difference in means among the observed outcomes and the odds ratio of being observed respectively in accordance with the model specification. Each estimated treatment contrast is accompanied by a 95\% confidence interval, and finally the output displays the joint likelihood ratio test statistic and the associated $p$-value for the joint null hypothesis of no treatment effect.

From the output we see that the semi-parametric likelihood ratio analysis reports an extremely low $p$-value for null hypothesis of no joint treatment effect. This strongly contradicts the conclusions from both the previous t-test and Wilcoxon test. The confidence intervals for the two treatment contrasts indicate that the average value for the observed outcome in the group defined by $R = 1$ is significantly higher than the average value for the observed outcome in the group with $R = 0$.

The confidence intervals can also be obtained by calling the function \texttt{confint} on the model object. This command reports both the marginal confidence intervals for the two treatment contrasts as well as a their simultaneous confidence region. To obtain the simultaneous confidence region we write

confint(model, type = "simultaneous", plot = TRUE, offset = 1.4, resolution = 50)

where \texttt{resolution} is the number of discrete grid points on which the likelihood surface is evaluated for plotting. Figure \ref{fig:simultConfidence} shows a heat-map of the semi-parametric likelihood surface as well as the $95\%$ simultaneous confidence region. The point $(0,0)$ is far outside of the joint confidence region which corresponds to the strong rejection of the joint null hypothesis of no treatment effect.

```rSimultaneous empirical likelihood ratio surface for the two treatment contrasts.", fig.pos="htbp", cache = TRUE} par(mgp=c(1.6,0.6,0), cex.axis=0.8, cex.lab=0.8, bty="n") suppressMessages(confint(model, type="simultaneous", plot=TRUE, offset = 1.4, resolution = 50)) abline(v = 0, lty=2) abline(h = 0, lty=2)

# Simulation study
To illustrate the power benefit and small sample properties of our proposed procedure we consider four different scenario through a simulation study. In all scenarios it is assumed that treatment (denoted $R_i$) is randomized one to one between two groups, and we let the observed combined outcome be given by the product $\widetilde{Y}_i = A_i Y_i$, where $A_i$ is binary and $Y_i$ is continuously distributed. This is equivalent to observing the atom $\mathcal{E} = 0$ when $A_i = 0$ and otherwise $Y_i$ is observed.

Table \ref{tab:simulation} shows the distributions of $A_i \mid R_i$ and $Y_i \mid A_i = 1, R_i$ for each of the four simulation scenarios. Scenario 1 represents the case where the point mass at zero is independent of the treatment but with a continuous component that is shifted between the groups. Scenario 2 is the opposite of scenario 1 where the continuous component is independent of the treatment but the probability of the point mass at zero differs between treatment groups. In scenario 3 we model a treatment effect on both components but in opposite directions of each other, and scenario 4 is similar to scenario 1 but where the continuous component is non-normal distributed.

\begin{table}[htb]
\centering
\begin{tabular}{r|r|r}
Scenario & $A_i \mid R_i$ & $Y_i \mid A_i = 1, R_i$\\ \hline
1 & $\Bernoulli(0.35)$ & $N(3 + 0.5R_i, 1)$\\
2 & $\Bernoulli(0.5 + 0.15R_i)$ & $N(3.5, 1)$\\
3 & $\Bernoulli(0.4 - 0.1R_i)$ & $N(3 + 0.5 R_i, 1)$\\
4 & $\Bernoulli(0.35)$ & $\Beta(1, 1-0.7R_i)$
\end{tabular}
\caption{Power simulation scenarios}
\label{tab:simulation}
\end{table}

To assess the power we vary the sample size between 50 and 350 observations in each group with increments of 25, and for each scenario we perform 25,000 Monte Carlo simulations and estimate the power as the proportion of rejected null hypotheses at the 5\% level. We compare the estimated power for both our parametric and semi-parametric likelihood ratio tests to the two-sample t-test and the Wilcoxon test. Figure \ref{fig:powerCurves} shows the estimated power functions. The results are also presented in table form in the supplementary material.

In all scenarios except number two it is clear that we observe a large power gain compared to both the t-test and the Wilcoxon test.

In setup 1 we observe a large power gain compared to the Wilcoxon test. Here the Wilcoxon tests gets "confused" by the large number of ties in the atom. In setup 2 Wilcoxon test has slightly better power profile. This is to be expected as our novel test is here disadvantaged by being a two-degrees-of-freedom test where the Wilcoxon is only one. It is further observed that Wilcoxon as expected as very little power when the effects on mortality and among survivors are of opposite sign despite the two distributions being markedly different indicating clear treatment effect (setups 3 and 4). In contrast our novel methods has excellent power. In all settings with a normally distributed outcome among survivors the parametric and semi-parametric approaches are similar, but for the heavy tail setup (no. 4) the semi-parametric approach is clearly superior. 



```rSimulated power as a function of sample size for each of the four scenarios.", fig.pos="htbp"}
rm(list=ls())
load("simulations/simulationResults.RData")
cols <- c("firebrick1", "forestgreen", "cornflowerblue", "black")

par(mfrow=c(2,2), bty="n", mgp=c(1.6,0.6,0), mar=c(3,3,2,0))
plot(nSeq, power1[,"Wilcoxon"], ylim=c(0,1), type="l", col = cols[4],
     xlab="# observations pr group", ylab="Power", lwd = 2,
     cex.axis=0.85, cex.lab=1.1)
title("Scenario 1")
lines(nSeq, power1[,"T-test"], col = cols[3], lwd = 2)
lines(nSeq, power1[,"LRT"], col = cols[1], lwd = 2)
lines(nSeq, power1[,"SPLRT"], col = cols[2], lwd = 2)
legend("topleft", c("Parametric LRT", "Semi-parametric LRT", "t-test", "Wilcoxon"),
       lwd = 2, lty=1, col=cols, bty="n", cex=0.75)

plot(nSeq, power2[,"Wilcoxon"], ylim=c(0,1), type="l", col = cols[4],
     xlab="#observations pr group", ylab="Power", lwd = 2,
     cex.axis=0.85, cex.lab=1.1)
title("Scenario 2")
lines(nSeq, power2[,"T-test"], col = cols[3], lwd = 2)
lines(nSeq, power2[,"LRT"], col = cols[1], lwd = 2)
lines(nSeq, power2[,"SPLRT"], col = cols[2], lwd = 2)

plot(nSeq, power3[,"Wilcoxon"], ylim=c(0,1), type="l", col = cols[4],
     xlab="#observations pr group", ylab="Power", lwd = 2,
     cex.axis=0.85, cex.lab=1.1)
title("Scenario 3")
lines(nSeq, power3[,"T-test"], col = cols[3], lwd = 2)
lines(nSeq, power3[,"LRT"], col = cols[1], lwd = 2)
lines(nSeq, power3[,"SPLRT"], col = cols[2], lwd = 2)

plot(nSeq, power4[,"Wilcoxon"], ylim=c(0,1), type="l", col = cols[4],
     xlab="#observations pr group", ylab="Power", lwd = 2,
     cex.axis=0.85, cex.lab=1.1)
title("Scenario 4")
lines(nSeq, power4[,"T-test"], col = cols[3], lwd = 2)
lines(nSeq, power4[,"LRT"], col = cols[1], lwd = 2)
lines(nSeq, power4[,"SPLRT"], col = cols[2], lwd = 2)

In addition to the power simulations we also performed a simulation study for the type I error. We considered two scenarios similar to scenarios 1 and 4 in Table \ref{tab:simulation} under a null hypothesis of no treatment effect. Specifically, we considered $A_i \sim \Bernoulli(\pi)$, $Y_i \mid A_i = 1 \sim N(3,1)$ and $A_i \sim \Bernoulli(\pi)$, $Y_i \mid A_i = 1 \sim \Beta(1, 1)$. We estimated the type I error through Monte Carlo simulation for different sample sizes and $\pi \in \left{0.2, 0.4, 0.6, 0.8\right}$. The results are presented in the supplementary material.

Application

library(TruncComp)
d <- read.csv("./covidData.csv", sep=";")

m_LRT <- truncComp(dawols28 ~ allocation, atom = 0, data = d, method="LRT")
m_SPLRT <- truncComp(dawols28 ~ allocation, atom = 0, data = d, method="SPLRT")
sCI <- suppressMessages(TruncComp:::confint.TruncComp(m_SPLRT, 
                                     type="simultaneous", 
                                     offset = 2, 
                                     resolution = 50, 
                                     plot = FALSE))

The COVID-STEROID 2 trial was a multi-center study comparing daily dose 6 mg (low) of dexamethasone for up to 10 days vs. 12 mg (high) in patients with severe and critical COVID-19. The full description of the study including all results can be found in @10.1001/jama.2021.18295. The primary outcome of the trial was days alive and without out use of life-support in the period from randomization to day 28. In other words, that outcome is an integer between 0 and 28 with a substantial peak at day 0 corresponding to the atom $\mathcal{E}$ coming from the patients who are never taken off life-support systems. The analyses in the main publications (@10.1001/jama.2021.18295) were conducted using the test proposed in this paper. In the JAMA publication the test is being referred to as the Kryger Jensen and Lange test. In this section we re-analyze the sub-group consisting of patients in need of invasive mechanical ventilation at baseline (n = 206). These can generally be taken to be the most severely ill patients.

The distribution of the outcome in each randomization group is presented by histograms in Figure \ref{fig:application} (left and middle panels). In our notation the outcome (days alive without life support) is re-coded such that $A$ takes the value zero if the outcome is zero or one otherwise. $Y$ is said to be the observed number of days without life support.

rHistograms of days alive without life support from the The COVID-STEROID 2 trial stratified by treatment dosis (left, middle) and simultaneous confidence regions based on the semi-parametric likelihood ratio test (right).", fig.pos="htbp"} par(mfrow=c(1,3), mgp=c(2.2,1,0), mar=c(3.2, 3.5, 2, 0)) hist(subset(d, d$allocation)$dawols28, xlim=c(0,30), main="Low dose", ylim=c(0,55), 10, col="gray70", border="gray90", cex.axis=1.2, cex.main=1.5, cex.lab=1.5, ylab="", xlab="Days alive without life support") hist(subset(d, !d$allocation)$dawols28, 10, xlim=c(0,30), main="High dose", ylim = c(0,55), col="gray70", border="gray90", cex.axis=1.2, cex.main=1.5, cex.lab=1.5, ylab="", xlab="Days alive without life support") image(sCI$muDelta, sCI$logORdelta, sCI$surface, col = rev(fields::tim.colors(256)), useRaster = TRUE, xlab="Difference in means", ylab="log OR", cex.axis=1.2, cex.main=1.5, cex.lab=1.5) points(m_SPLRT$muDelta, log(m_SPLRT$alphaDelta), pch = 19, cex = 1.5) contour(sCI$muDelta, sCI$logORdelta, sCI$surface, add = TRUE, levels = stats::qchisq(0.99, 2), lwd = 1, labels = 0.99, lty=1) contour(sCI$muDelta, sCI$logORdelta, sCI$surface, add = TRUE, levels = stats::qchisq(0.95, 2), lwd = 1, labels = 0.95, lty=1) contour(sCI$muDelta, sCI$logORdelta, sCI$surface, add = TRUE, levels = stats::qchisq(0.8, 2), lwd = 1, labels = 0.8, lty=1) contour(sCI$muDelta, sCI$logORdelta, sCI$surface, add = TRUE, levels = stats::qchisq(0.5, 2), lwd = 1, labels = 0.5, lty=1) title("Confidence regions", cex.main=1.5) abline(v = 0, lty=2) abline(h = 0, lty=2)

Comparing the groups by a Wilcoxon test yields a p-value of $0.242$ hence leading to a non-significant conclusion at the pre-specified 5\% level. Using our proposed parametric likelihood ratio test yields a p-value of $0.019$ and therefore leading to a rejection of the null hypothesis. From the histograms it could be questioned whether the continuous part of the distributions are normal distributed. Accordingly, we also apply our proposed semi-parametric version of the likelihood ratio test. This yields a p-value of $0.021$. In conclusion, our proposed method is able to detect the difference between the groups. Further and in contrast to the Wilcoxon test, our method also provides a way to interpret the difference between the groups. This is illustrated in the right panel of Figure \ref{fig:application} showing simultaneous confidence regions for the two effect parameters. In this case it is clear that the primary effect of the intervention is not on the discrete component (i.e., the proportion of patients who never get off life support). Instead, the effect appears to be on the continuous part and is between half a day and seven days.

Discussion

In this paper we have introduced a novel statistical test to assess treatment effect on continuous outcomes where one value has special meaning (e.g., all diseased are assigned lowest possible value). The procedure in potentially much more power-full than the current best-practice which is to use Wilcoxon-type tests. The proposed method includes both a fully parametric approach and a semi-parametric where the latter makes no assumptions on the distribution of the continuous part of the outcome. In all settings this new method not only provides an effect measure but also effect parameters with associated confidence intervals. The test is implemented in an R package available on GitHub.

The simulation study and in particular the real data example demonstrate the efficiency and power gain. It is noted that unlike the Wilcoxon test, our proposed method can easily be extended to include covariates [@owen1991empirical]. It is therefore not only useful in RCT settings but also to epidemiological studies.

\textbf{[TODO: Do we need something about this is very useful when needing to prespecify a test in an RCT?]}

\textbf{[TODO: check verb tense]}

References {-}



aejensen/TruncComp documentation built on Feb. 8, 2023, 3:42 p.m.