Sys.info()[c("sysname","release","version","machine")]
sysname
"Darwin"
release
"22.6.0"
version
"Darwin Kernel Version 22.6.0: Mon Feb 19 19:48:53 PST 2024; root:xnu-8796.141.3.704.6~1/RELEASE_X86_64"
machine
"x86_64"
library(SeqSGPV)
nreps <- 50000  # number of simulation replicates per design scenario
An implementation scientist wishes to test whether a phone app can help adults (18+) in an urban city reduce their sodium intake. At enrollment, a baseline measure of sodium intake will be assessed, and participants will then be randomized 1:1 to receive the phone app or not. The estimand of interest is the mean difference in 3-month sodium intake, $\Delta$, between the two arms for the city's population. Assuming loss to follow-up is unrelated to treatment assignment, the trial will compare the observed 3-month sodium intake, adjusted for baseline, between treatment arms.
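For intuition, here is a toy sketch of such a baseline-adjusted comparison (simulated data and hypothetical variable names; the actual design below uses the package's lmCI model fit):

# Illustrative only: simulated two-arm data with hypothetical variable names
set.seed(42)
n <- 100
dat <- data.frame(arm = rep(0:1, each = n / 2),
                  sodium0mo = rgamma(n, shape = 2, scale = sqrt(0.5)))
dat$sodium3mo <- rgamma(n, shape = 2, scale = sqrt(0.5)) + 0.5 * dat$arm
# Mean difference in 3-month intake, adjusted for baseline intake
fit <- lm(sodium3mo ~ arm + sodium0mo, data = dat)
confint(fit, "arm", level = 0.95)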
In previous studies, the outcome has been observed to be heavily skewed, similar to a dgamma(shape = 2, scale = sqrt(0.5)) distribution, and a greater mean difference reflects greater fidelity in the intervention arm. The outcome standard deviation is 1.
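A quick check of these distributional assumptions:

# Gamma(shape = 2, scale = sqrt(0.5)): mean = shape * scale ~ 1.41,
# sd = sqrt(shape) * scale = 1
shape <- 2; scl <- sqrt(0.5)
c(mean = shape * scl, sd = sqrt(shape) * scl)
curve(dgamma(x, shape = shape, scale = scl), from = 0, to = 5,
      ylab = "density", main = "Assumed outcome distribution")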
Without incorporating scientific relevance, a traditional hypothesis could be: H0: $\Delta \le 0$ versus H1: $\Delta > 0$.
However, effects up to 0.075 are considered practically equivalent to the null, and the minimally scientifically meaningful effect is 0.5. The PRISM is therefore defined by ROE$_{(0.075, 0.50)}$. The investigator wants a Type I error $\le 0.025$ when $\Delta = 0$.
The investigators say the study can afford up to 300 participants, though a maximum of 150 participants would be ideal. They would like to know the design-based average sample size, Type I error, and power across a range of treatment effects. The investigator prefers to monitor for meaningful effects using a 95% confidence interval.
Logistically, the outcome takes 3 months to observe, and the planned accrual is 25 participants a month. The team wishes to monitor outcomes monthly. Hence, there could be up to 3 x 25 = 75 delayed outcomes at the point of evaluation; if accrual is slower, the number of delayed outcomes could be lower.
To inform the wait time, the investigator wants the expected width of the confidence interval to be approximately no more than 1. Assuming normality, they calculate the desired wait time (in observed outcomes) to be:
ciwidth <- 1
# Total sample size for an expected 95% CI width <= ciwidth
# (per-arm n = ceiling((2 * 1.96 * sqrt(2) / ciwidth)^2) = 31)
2 * ceiling((2 * 1.96 * sqrt(2) / ciwidth)^2)
[1] 62
If they were to start after the second month of observed outcomes (i.e., 50 outcomes), the expected CI width would be:
# Expected 95% CI width with 50 total observed outcomes (25 per arm)
2 * 1.96 * sqrt(2 / (50/2))
[1] 1.108743
The study team is satisfied with this expected width and sets the wait time to 50 observed outcomes.
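For reference, the same normal-approximation width can be wrapped in a small helper (a sketch assuming 1:1 allocation and a common standard deviation of 1):

# Expected 95% CI width for a two-sample mean difference
expectedCIWidth <- function(nTotal, sd = 1, z = 1.96) {
  2 * z * sd * sqrt(2 / (nTotal / 2))
}
expectedCIWidth(62)  # approximately 1.00
expectedCIWidth(50)  # approximately 1.11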
To benchmark a maximum sample size, they perform a power calculation for a single-look design:
# Benchmark power for single-look design
power.t.test(power = 0.8, delta=.5, sig.level = 0.025, alternative = "one.sided")
Two-sample t test power calculation
n = 63.76576
delta = 0.5
sd = 1
sig.level = 0.025
power = 0.8
alternative = one.sided
NOTE: n is number in *each* group
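Because the outcome is skewed rather than normal, this benchmark can also be double-checked by simulation (a sketch using the assumed gamma outcome; the empirical power should land near 0.8):

# Simulation-based power check under the skewed outcome
set.seed(2024)
n <- 64  # per-arm n from power.t.test above
rej <- replicate(5000, {
  y0 <- rgamma(n, shape = 2, scale = sqrt(0.5))
  y1 <- rgamma(n, shape = 2, scale = sqrt(0.5)) + 0.5
  t.test(y1, y0, alternative = "greater")$p.value < 0.025
})
mean(rej)  # empirical power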
A SeqSGPV trial design to monitor the PRISM is set with $W=50$, $S=\{1, 25\}$, $A=\{0, 25\}$, and $N=\{150, 225, 300\}$, evaluated under $\{0, 75\}$ delayed outcomes.
# 2-sample trial with skewed outcomes; monitoring assumes normality (95% CIs)
# Hypotheses uninformed by scientific context
#   H0: mu <= 0
#   H1: mu > 0
# PRISM: deltaG1 = 0.075, deltaG2 = 0.50
# Assess outcomes monthly -- 25 participants per month
# Possible delayed outcomes -- 0 or 75
# Maximum sample size -- 150, 225, 300
system.time(PRISM <- SeqSGPV(nreps = nreps,
dataGeneration = rgamma, dataGenArgs = list(n=300,shape=2,scale=.5),
effectGeneration = 0, effectGenArgs=NULL, effectScale = "identity",
allocation = c(1,1),
effectPN = 0,
null = "less",
PRISM = list(deltaL2 = NA, deltaL1 = NA,
deltaG1 = 0.075, deltaG2 = 0.5),
modelFit = lmCI,
modelFitArgs = list(miLevel=.95),
wait = 50,
steps = c(1,25),
affirm = c(0, 25),
lag = c(0, 75),
N = c(150, 225, 300),
printProgress = FALSE))
user system elapsed
895.748 59.487 322.828
# Note: This step is typically done after evaluating operating characteristics
# under the point null. It will be shown again later.
# It is done here so that the Rmd cache retains minimal data
# (after removing the simulated data).
# Obtain design under range of effects
se <- round(seq(-0.1, 0.7, by = 0.05),2)
system.time(PRISMse <- fixedDesignEffects(PRISM, shift = se))
[1] "effect: -0.1"
[1] "effect: -0.05"
[1] "effect: 0"
[1] "effect: 0.05"
[1] "effect: 0.1"
[1] "effect: 0.15"
[1] "effect: 0.2"
[1] "effect: 0.25"
[1] "effect: 0.3"
[1] "effect: 0.35"
[1] "effect: 0.4"
[1] "effect: 0.45"
[1] "effect: 0.5"
[1] "effect: 0.55"
[1] "effect: 0.6"
[1] "effect: 0.65"
[1] "effect: 0.7"
user system elapsed
14374.661 1822.341 5385.047
# This next step is not required but reduces the size of the cache
# saved when re-running the Rmd.
PRISM$mcmcMonitoring <- NULL
Assess the impact of delayed outcomes and the additional Type I error control provided by an affirmation step.
par(mfrow=c(2,2))
# Impact of delayed outcomes, S = 25
plot(PRISM,stat = "lag.rejH0", affirm=0, steps=25)
plot(PRISM,stat = "lag.n", affirm=0, steps=25)
plot(PRISM,stat = "lag.rejH0", affirm=25, steps=25)
plot(PRISM,stat = "lag.n", affirm=25, steps=25)
The affirmation step helps control the Type I error rate. In this example, with $S=25$, Type I error is controlled below 0.025 even with $A=0$. However, if $S=1$, the affirmation step would be needed to control the Type I error rate.
par(mfrow=c(2,2))
# Impact of delayed outcomes, S = 1
plot(PRISM,stat = "lag.rejH0", affirm=0, steps=1)
plot(PRISM,stat = "lag.n", affirm=0, steps=1)
plot(PRISM,stat = "lag.rejH0", affirm=25, steps=1)
plot(PRISM,stat = "lag.n", affirm=25, steps=1)
With $S=1$, the affirmation step keeps the Type I error rate below 0.025.
Under $\Delta=0$, we can examine the empirical CDF (ECDF) of the sample size. Below are the ECDFs for the design with the largest maximum sample size considered ($N=300$), without and with the affirmation step.
par(mfrow=c(1,2))
plot(PRISM$mcmcECDFs$mcmcEndOfStudyEcdfN$W50_S25_A0_L75_N300,las=1,
main = "Sample Size ECDF\nmu = 0, S=25, A=0, L=75, N=300")
plot(PRISM$mcmcECDFs$mcmcEndOfStudyEcdfN$W50_S25_A25_L75_N300,las=1,
main = "Sample Size ECDF\nmu = 0, S=25, A=25, L=75, N=300")
Having established Type I error control, we can further evaluate operating characteristics under a range of plausible effects. To be conservative, the affirmation step of $A=25$ will be used.
This next step was run previously but is shown again here because this is when it would more naturally take place.
# Obtain design under range of effects
se <- round(seq(-0.1, 0.7, by = 0.05),2)
system.time(PRISMse <- fixedDesignEffects(PRISM, shift = se))
par(mfrow=c(2,2))
plot(PRISMse, stat = "lag.rejH0", steps = 25, affirm = 25, N = 150, lag = 75)
plot(PRISMse, stat = "lag.n", steps = 25, affirm = 25, N = 150, lag = 75)
plot(PRISMse, stat = "lag.bias", steps = 25, affirm = 25, N = 150, lag = 75, ylim=c(-.1,.1))
plot(PRISMse, stat = "lag.cover", steps = 25, affirm = 25, N = 150, lag = 75, ylim=c(0.93, 0.97))
par(mfrow=c(2,2))
plot(PRISMse, stat = "lag.stopNotROPE", steps = 25, affirm = 25, N = 150, lag = 75)
plot(PRISMse, stat = "lag.stopNotROME", steps = 25, affirm = 25, N = 150, lag = 75)
plot(PRISMse, stat = "lag.stopInconclusive", steps = 25, affirm = 25, N = 150, lag = 75)
Effects in the ROE are the most likely to end inconclusively, while effects in the ROWPE and ROME have low probability of ending inconclusively. The investigator could use the SGPVs for ROWPE and ROME to suggest whether to investigate the intervention further (see example interpretation 3 below).
The ECDF of sample size for a given treatment effect can be evaluated.
par(mfrow=c(2,2))
plot(PRISMse$`effect1_-0.1`$mcmcECDFs$mcmcEndOfStudyEcdfNLag$W50_S25_A0_L0_N300,las=1,
main = "Sample Size ECDF\nmu = -0.1")
plot(PRISMse$`effect1_0.1`$mcmcECDFs$mcmcEndOfStudyEcdfNLag$W50_S25_A0_L0_N300,las=1,
main = "Sample Size ECDF\nmu = 0.1")
plot(PRISMse$`effect1_0.3`$mcmcECDFs$mcmcEndOfStudyEcdfNLag$W50_S25_A0_L0_N300,las=1,
main = "Sample Size ECDF\nmu = 0.3")
plot(PRISMse$`effect1_0.5`$mcmcECDFs$mcmcEndOfStudyEcdfNLag$W50_S25_A0_L0_N300,las=1,
main = "Sample Size ECDF\nmu = 0.5")
1. The estimated mean difference was 0.54 (95% confidence interval: 0.09, 0.98), which is evidence that the treatment effect is at least practically better than the null hypothesis (p$_{ROWPE}$ = 0); the evidence for being scientifically meaningful is p$_{ROME}$ = 0.54.
2. The estimated mean difference was -0.17 (95% confidence interval: -0.81, 0.48), which is evidence that the treatment effect is not scientifically meaningful (p$_{ROME}$ = 0); the evidence for being practically equivalent or worse than the point null is p$_{ROWPE}$ = 0.69.
3. The estimated mean difference was 0.31 (95% confidence interval: 0.07, 0.56) at the maximum sample size, which is inconclusive for ruling out both practically null effects (p$_{ROWPE}$ = 0.01) and scientifically meaningful effects (p$_{ROME}$ = 0.12). Of the two regions, the interval is more compatible with scientifically meaningful effects than with practically null effects.
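These SGPVs can be read as interval overlaps. A minimal sketch (illustrative only; it omits the small-interval correction of the full SGPV definition, and SeqSGPV computes these values internally), using the interval from interpretation 3:

# SGPV sketch: fraction of the interval estimate overlapping a region
sgpv <- function(ci, region) {
  overlap <- max(0, min(ci[2], region[2]) - max(ci[1], region[1]))
  overlap / (ci[2] - ci[1])
}
ci <- c(0.07, 0.56)       # interval from interpretation 3
sgpv(ci, c(-Inf, 0.075))  # p_ROWPE, approximately 0.01
sgpv(ci, c(0.5, Inf))     # p_ROME, approximately 0.12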
For each conclusion, the following clarification may be provided: Based on simulations with $W=50$, $S=25$, $A=25$, and $N=150$, with a lag of 75 delayed outcomes, the risk of bias is expected to be minimal and the 95% confidence interval has near-correct coverage. Please refer to the figure of simulated design-based bias and coverage.