Use Case 05: Analysis of trials (including methods for analysing spillover)"
In CRTspat: Workflow for Cluster Randomised Trials with Spillover

The CRTanalysis() function is a wrapper for different statistical analysis packages that can be used to analyse either simulated or real trial datasets. It is designed for use in simulation studies of different analytical methods for spatial CRTs by automating the data processing and selecting some appropriate analysis options. It does not replace conventional use of these packages. Real field trials very often entail complications that are not catered for any of the analysis options in CRTanalysis() and it does not aspire to carry out the full analytical workflow for a trial. It can be used as part of a wider workflow. In particular the usual object output by the statistical analysis package constitutes the model_object element within the CRTanalysis object generated by CRTanalysis(). This can be accessed by the usual methods (e.g predict(), summary(), plot()) which may be needed for diagnosing errors, assessing goodness of fit, and for identifying needs for additional analyses.

Statistical Methods

The options that can be specified using the method parameter in the function call are:

method = "T" summarises the outcome at the level of the cluster, and uses 2-sample t-tests to carry out statistical significance tests of the effect, and to compute confidence intervals for the effect size. The t.test function in the stats package is used.
method = "GEE" uses Generalised Estimating Equations to estimate the efficacy in a model with iid random effects for the clusters. An estimate of the intracluster correlation (ICC) is also provided. This uses calls to the gee package.
method = "LME4" fits linear (for continuous data) or generalized linear (for counts and proportions) mixed models with iid random effects for clusters in lme4.
method = "MCMC" uses Markov chain Monte Carlo simulation in package jagsUI, which calls r-JAGS.
method = "INLA" uses approximate Bayesian inference via the R-INLA package. This provides functionality for geostatistical analysis, which can be used for geographical mapping of model outputs (as illustrated in . INLA spatial analysis requires a prediction mesh. This can be generated using CRTspat::new_mesh(). This can be computationally expensive, so it is recommended to compute the mesh just once for each dataset.

All these analysis methods can be used to carry out a simple comparision of outcomes between trial arms. Each offers different additional functionality, and has its own limitations (see Table 5.1). Some of these limitations are specific to the options offered within CRTanalysis(), which does not embrace the full range of options of the packages that are 'wrapped'. These are specified using the method argument of the function.

Table 5.1. Available statistical methods

| method | Package | What the CRTanalysis() implementation offers |Limitations (as implemented) | |----------|---------|------------------------------------------------|-----------------------------| | T| t.test | P-values and confidence intervals for efficacy based on comparison of cluster means | No analysis of spillover or degree of clustering | | GEE | gee | Interval estimates for efficacy | No analysis of spillover or degree of clustering | | LME4| lme4 | Analysis of spillover | No geostatistical analysis | | INLA | INLA | Analysis of spillover, geostatistical analysis and spatially structured outputs | Computationally intensive | | MCMC | jagsUI | Interval estimates for spillover parameters | Identifiability issues and slow convergence are possible |

For the analysis of proportions, the outcome in the control arm is estimated as: $\hat{p}{C} = \frac{1}{1 + exp(-\beta_1)}$, in the intervention arm as $\hat{p}{I} = \frac{1}{1 + exp(-\beta_1-\beta_2)}$, and the efficacy is estimated as $\tilde{E}{s} = 1- \frac{\tilde{p}{I}}{\tilde{p}_{C}}$ where $\beta_1$ is the intercept term and $\beta_2$ the incremental effect associated with the intervention.

summary("<analysis>"") is used to view the key results of the trial. To display the output from the statistical procedure that is called, try <analysis>$model_object or summary("<analysis>$model_object").

library(CRTspat)
example <- readdata("exampleCRT.txt")
analysisT <- CRTanalysis(example, method = "T")
summary(analysisT)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  T 
## Link function:  logit 
## Model formula:  arm + (1 | cluster) 
## No modelling of spillover 
## Estimates:       Control:  0.364  (95% CL:  0.286 0.451 )
##             Intervention:  0.21  (95% CL:  0.147 0.292 )
##                 Efficacy:  0.423  (95% CL:  0.208 0.727 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
##  
## P-value (2-sided):  0.006879064

analysisT$model_object

## 
##  Two Sample t-test
## 
## data:  lp by arm
## t = 2.9818, df = 22, p-value = 0.006879
## alternative hypothesis: true difference in means between group control and group intervention is not equal to 0
## 95 percent confidence interval:
##  0.2332638 1.2989425
## sample estimates:
##      mean in group control mean in group intervention 
##                 -0.5561662                 -1.3222694

Assessing model fit

The model = "LME4" option outputs the deviance of the model and the Akaike information criterion (AIC), which can be used to select the best fitting model. The deviance information criterion (DIC) and Bayesian information criterion (BIC) perform the same role for the Bayesian methods ("INLA", and "MCMC"). The comparison of results with cfunc = "X" and cfunc = "Z" is used to assess whether the intervention effect is likely to be due to chance. With method = "T", cfunc = "X" provides a significance test of the intervention effect directly. The models with spillover (see below) can be compared by that with cfunc = "X" to evaluate whether spillover has led to an important bias.

Spillover

CRTanalysis() provides options for analysing spillover effects either as function of a Euclidean distance or as a function of a surround measure:

Models that do not consider spillover

Models that do not consider spillover can be fitted using options Z and X. These are included both to allow conventional analyses (see above), and also to enable model selection using and likelihood ratio tests, the Akaike information criterion (AIC), deviance information criterion (DIC) or Bayesian information criterion (BIC) .

Spillover as a function of distance

These methods require a measure of distance from the boundary between the trial arms, with locations in the control arm assigned negative values, and those in the intervention arm assigned positive values. The functional forms for this relationship is specified by the value of cfunc (Table 5.2).

Table 5.2. Available spillover functions

| cfunc | Description | Formula for $P\left( d \right)$ | Compatible method(s) | |---------|------------------|--------------------------|--------------------------| | Z| No intervention effect | $P\left( d \right) = \ 0\ $ | GEE LME4 INLA MCMC | | X| Simple intervention effect | $\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ 0 \ P\left( d \right) = \ 1\ for\ d\ > \ 0 \ \end{matrix}\ $ | T GEE LME4 INLA MCMC | | L| inverse logistic (sigmoid)| $P\left( d \right) = \ \frac{1}{\left( 1\ + \ exp\left( - d/S \right) \right)}$ | LME4 INLA MCMC | | P| inverse probit (error function) | $P\left( d \right) = 1\ +\ erf\left(\frac{d}{S\sqrt2}\right)$ | LME4 INLA MCMC | | S| piecewise linear | $\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ - S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ P\left( d \right) = \ \left(S/2\ + \ d \right)/S\ for\ - S/2 < d\ < \ S/2\ P\left( d \right) = \ 1\ for\ d\ > \ S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \end{matrix}\ $ | LME4 INLA MCMC | | R| rescaled linear | $P\left( d \right) =\frac{d\ -\ min(d)}{max(d)\ -\ min(d)}$ | LME4 INLA MCMC |

cfunc options P, L and S lead to non-linear models in which the spillover scale parameter (S) must be estimated. This is done by selecting scale_par using a one-dimensional optimisation of the goodness of fit of the model in stats::optimize().

The different values for cfunc lead to the fitted curves shown in Figure 5.1. The light blue shaded part of the plot corresponds to the spillover interval in those cases where this is estimated.

analysisLME4_Z <- CRTanalysis(example, method = "LME4", cfunc = "Z")
summary(analysisLME4_Z)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Model formula:  (1 | cluster) 
## No comparison of arms 
## Estimates:       Control:  0.285  (95% CL:  NA )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1387.609 
## AIC     :  1391.609

analysisLME4_X <- CRTanalysis(example, method = "LME4", cfunc = "X")
summary(analysisLME4_X)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Model formula:  arm + (1 | cluster) 
## No modelling of spillover 
## Estimates:       Control:  0.366  (95% CL:  0.291 0.448 )
##             Intervention:  0.216  (95% CL:  0.162 0.282 )
##                 Efficacy:  0.41  (95% CL:  0.164 0.583 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1379.898 
## AIC     :  1385.898

analysisLME4_P <- CRTanalysis(example, method = "LME4", cfunc = "P")
summary(analysisLME4_P)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.45
## Model formula:  pvar + (1 | cluster) 
## Error function model for spillover
## Estimates:       Control:  0.417  (95% CL:  0.33 0.508 )
##             Intervention:  0.186  (95% CL:  0.136 0.251 )
##                 Efficacy:  0.552  (95% CL:  0.323 0.702 )
## Spillover interval(km):      4.22  (95% CL:  4.2 4.23 )
## % locations contaminated: 91.6  (95% CL:  90.6 92 %)
## Total effect            : 0.229  (95% CL:  0.113 0.344 )
## Ipsilateral Spillover   : 0.0233  (95% CL:  0.0127 0.0322 )
## Contralateral Spillover : 0.0415  (95% CL:  0.0191 0.0646 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1374.215 
## AIC     :  1382.215 including penalty for the spillover scale parameter

analysisLME4_L <- CRTanalysis(example, method = "LME4", cfunc = "L")
summary(analysisLME4_L)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.249
## Model formula:  pvar + (1 | cluster) 
## Sigmoid (logistic) function for spillover
## Estimates:       Control:  0.416  (95% CL:  0.33 0.507 )
##             Intervention:  0.186  (95% CL:  0.135 0.249 )
##                 Efficacy:  0.551  (95% CL:  0.328 0.701 )
## Spillover interval(km):      4.26  (95% CL:  4.24 4.28 )
## % locations contaminated: 92.7  (95% CL:  92.2 93 %)
## Total effect            : 0.229  (95% CL:  0.114 0.342 )
## Ipsilateral Spillover   : 0.0219  (95% CL:  0.012 0.0303 )
## Contralateral Spillover : 0.0387  (95% CL:  0.018 0.0606 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1374.201 
## AIC     :  1382.201 including penalty for the spillover scale parameter

analysisLME4_S <- CRTanalysis(example, method = "LME4", cfunc = "S")
summary(analysisLME4_S)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 1.674
## Model formula:  pvar + (1 | cluster) 
## Piecewise linear function for spillover
## Estimates:       Control:  0.423  (95% CL:  0.334 0.518 )
##             Intervention:  0.185  (95% CL:  0.135 0.247 )
##                 Efficacy:  0.561  (95% CL:  0.344 0.71 )
## Spillover interval(km):      4.1  (95% CL:  4.1 4.11 )
## % locations contaminated: 86.6  (95% CL:  86.6 87.1 %)
## Total effect            : 0.237  (95% CL:  0.122 0.354 )
## Ipsilateral Spillover   : 0.0289  (95% CL:  0.0162 0.0403 )
## Contralateral Spillover : 0.0521  (95% CL:  0.0253 0.0813 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1374.094 
## AIC     :  1382.094 including penalty for the spillover scale parameter

analysisLME4_R <- CRTanalysis(example, method = "LME4", cfunc = "R")
summary(analysisLME4_R)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Signed distance to other arm (km)
## No non-linear parameter. 1
## Model formula:  pvar + (1 | cluster) 
## Rescaled linear function for spillover
## Estimates:       Control:  0.584  (95% CL:  0.383 0.759 )
##             Intervention:  0.116  (95% CL:  0.059 0.216 )
##                 Efficacy:  0.8  (95% CL:  0.466 0.919 )
## Spillover interval(km):      6.64  (95% CL:  6.61 6.65 )
## % locations contaminated: 99.8  (95% CL:  99.8 99.8 %)
## Total effect            : 0.465  (95% CL:  0.181 0.691 )
## Ipsilateral Spillover   : 0.117  (95% CL:  0.0559 0.157 )
## Contralateral Spillover : 0.236  (95% CL:  0.0831 0.367 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1378.711 
## AIC     :  1384.711

p0 <- plotCRT(analysisLME4_Z, map = FALSE)
p1 <- plotCRT(analysisLME4_X, map = FALSE)
p2 <- plotCRT(analysisLME4_P, map = FALSE)
p3 <- plotCRT(analysisLME4_L, map = FALSE)
p4 <- plotCRT(analysisLME4_S, map = FALSE)
p5 <- plotCRT(analysisLME4_R, map = FALSE)
library(cowplot)
plot_grid(p0, p1, p2, p3, p4, p5, labels = c('Z', 'X', 'P', 'L', 'S', 'R'), label_size = 10, ncol = 2)

Fig 5.1 Fitted curves for the example dataset with different options for `cfunc`

The piecewise linear spillover function, cfunc = "S", is only linear on the scale of the linear predictor. When used in a logistic model, as here, the transformation via the inverse of the link function leads to a slightly curved plot (Figure 5.1S). The rescaled linear function, cfunc = "R", is provided as a comparator and for use with distance values other than distance = "nearestDiscord" see below (it should not be used to estimate the spillover interval).

The full set of different cfunc options are available for each of model options "LME4", "INLA", and "MCMC". The performance of all these different models has not yet been thoroughly investigated. The analyses of Multerer et al. (2021b) found that that a model equivalent to method = "MCMC", cfunc = "L" gave estimates of efficacy with low bias, even in simulations with considerable spillover.

Spillover as a function of surround

Spillover can also be analysed by assuming the effect size to be a function of the number of intervened locations in the surroundings of the location Anaya-Izquierdo & Alexander(2021). Several different surround functions are available. These are specified by the distance parameter (Table 5.3).

Table 5.3. Available surround functions

| distance | Description | Details | |----------------|------------------|--------------------------------------------------------------| |nearestDiscord| Distance to nearest discordant location | The default. This is used for analyses by distance (see above) | |hdep| Tukey half-depth | Algorithm of Rousseeuw & Ruts(1996) | |sdep| Simplicial depth| Algorithm of Rousseeuw & Ruts(1996) | |disc| disc | The number of intervened locations within the specified radius (excluding the location itself) as described by Anaya-Izquierdo & Alexander(2021) | |kern| Sum of kernels | The sum of normal kernels |

The compute_distance() function is provided to compute these quantities, so that they can be described, compared, and analysed independently of CRTanalysis(). Note that the values of the surround calculated by compute_distance() are scaled to avoid correlation with the spatial density of the points (see documentation) and so are not equivalent to the quantities reported in the original publications.

Users can also devise other measures of surround or distance, add them to a trial data frame and specify them using distance. CRTanalysis() computes the minimum value for the specified field

examples <- compute_distance(example, distance = "hdep")
ps1 <- plotCRT(examples, distance = "hdep", legend.position = c(0.6, 0.8))
ps2 <- plotCRT(examples, distance = "sdep")
examples <- compute_distance(examples, distance = "disc", scale_par = 0.5)
ps3 <- plotCRT(examples, distance = "disc")
examples <- compute_distance(examples, distance = "kern", scale_par = 0.5)
ps4 <- plotCRT(examples, distance = "kern")
plot_grid(ps1, ps2, ps3, ps4, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)

Fig 5.2 Stacked bar plots for different surrounds

If distance is assigned a value of either hdep, sdep, then cfunc = "R" is used by default and the overall effect size is computed by comparing the fitted values of the model for a surround value of zero with that of the maximum of the surround in the data. If distance = "disc" or distance = "kern" and scale_par is assigned a value, then cfunc = "R" is also used. If cfunc = "E" is specified then an escape function is fitted with the scale parameter estimated in the same way as in the scale parameter in other models (see above Table 5.2).

examples_hdep <- CRTanalysis(examples, method = "LME4", distance = "hdep", cfunc = 'R')
summary(examples_hdep)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Tukey half-depth 
## No non-linear parameter. 1
## Model formula:  pvar + (1 | cluster) 
## Rescaled linear function for spillover
## Estimates:       Control:  0.382  (95% CL:  0.293 0.477 )
##             Intervention:  0.208  (95% CL:  0.151 0.281 )
##                 Efficacy:  0.451  (95% CL:  0.168 0.642 )
## Spillover interval(km):      0.978  (95% CL:  0.976 0.98 )
## % locations contaminated: 55  (95% CL:  55 55 %)
## Total effect            : 0.172  (95% CL:  0.0526 0.292 )
## Ipsilateral Spillover   : 0.0313  (95% CL:  0.0101 0.0511 )
## Contralateral Spillover : 0.0443  (95% CL:  0.0128 0.0781 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1379.89 
## AIC     :  1385.89

ps4 <- plotCRT(examples_hdep,legend.position = c(0.8, 0.8))
examples_sdep <- CRTanalysis(examples, method = "LME4", distance = "sdep", cfunc = 'R')
summary(examples_sdep)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: Simplicial depth 
## No non-linear parameter. 1
## Model formula:  pvar + (1 | cluster) 
## Rescaled linear function for spillover
## Estimates:       Control:  0.394  (95% CL:  0.307 0.488 )
##             Intervention:  0.198  (95% CL:  0.144 0.265 )
##                 Efficacy:  0.496  (95% CL:  0.248 0.663 )
## Spillover interval(km):      0.978  (95% CL:  0.976 0.98 )
## % locations contaminated: 52.4  (95% CL:  52.2 52.4 %)
## Total effect            : 0.195  (95% CL:  0.0808 0.308 )
## Ipsilateral Spillover   : 0.0301  (95% CL:  0.0131 0.046 )
## Contralateral Spillover : 0.0436  (95% CL:  0.0171 0.0713 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1376.417 
## AIC     :  1382.417

ps5 <- plotCRT(examples_sdep)
examples_disc <- CRTanalysis(examples, method = "LME4", distance = "disc", cfunc = 'R', scale_par = 0.15)
summary(examples_disc)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: disc of radius 0.15 km
## Precalculated scale parameter: 0.15
## Model formula:  pvar + (1 | cluster) 
## Rescaled linear function for spillover
## Estimates:       Control:  0.386  (95% CL:  0.312 0.47 )
##             Intervention:  0.2  (95% CL:  0.149 0.262 )
##                 Efficacy:  0.481  (95% CL:  0.27 0.635 )
## Spillover interval(km):      0.978  (95% CL:  0.976 0.98 )
## % locations contaminated: 8.89  (95% CL:  8.89 8.89 %)
## Total effect            : 0.185  (95% CL:  0.0903 0.283 )
## Ipsilateral Spillover   : 0.00456  (95% CL:  0.00237 0.00662 )
## Contralateral Spillover : 0.00574  (95% CL:  0.00266 0.00905 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1374.274 
## AIC     :  1380.274

ps6 <- plotCRT(examples_disc)
examples_kern <- CRTanalysis(examples, method = "LME4", distance = "kern", cfunc = 'R', scale_par = 0.15)
summary(examples_kern)

## 
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method:  LME4 
## Link function:  logit 
## Measure of distance or surround: kern with kernel s.d. 0.15 km
## Precalculated scale parameter: 0.15
## Model formula:  pvar + (1 | cluster) 
## Rescaled linear function for spillover
## Estimates:       Control:  0.406  (95% CL:  0.328 0.49 )
##             Intervention:  0.184  (95% CL:  0.136 0.246 )
##                 Efficacy:  0.545  (95% CL:  0.341 0.684 )
## Spillover interval(km):      0.979  (95% CL:  0.977 0.98 )
## % locations contaminated: 50.8  (95% CL:  50.6 50.9 %)
## Total effect            : 0.22  (95% CL:  0.12 0.321 )
## Ipsilateral Spillover   : 0.011  (95% CL:  0.0065 0.0152 )
## Contralateral Spillover : 0.0135  (95% CL:  0.0069 0.0204 )
## Coefficient of variation:  48.4 %  (95% CL:  36 75.9 )
## deviance:  1369.677 
## AIC     :  1375.677

ps7 <- plotCRT(examples_kern)
plot_grid(ps4, ps5, ps6, ps7, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)

Fig 5.3 Fitted curves for the example dataset with different surrounds

Geostatistical models and mapping results

To carry out a geostatistical analysis with method = "INLA" a prediction mesh is needed. By default a very low resolution mesh is created (creating a high resolution mesh is computationally expensive). To create a 100m INLA mesh for <MyTrial>, use:

mesh <- new_mesh(trial = <MyTrial> , pixel = 0.1)