# CopulaDTA Package Vignette" In CopulaDTA: Copula Based Bivariate Beta-Binomial Model for Diagnostic Test Accuracy Studies

The current statistical procedures implemented in statistical software packages for pooling of diagnostic test accuracy data include hSROC [@hsroc] regression and the bivariate random-effects meta-analysis model (BRMA) ([@Reitsma], [@Arends], [@Chu], [@Rileya]). However, these models do not report the overall mean but rather the mean for a central study with random-effect equal to zero and have difficulties estimating the correlation between sensitivity and specificity when the number of studies in the meta-analysis is small and/or when the between-study variance is relatively large [@Rileyb].

This tutorial on advanced statistical methods for meta-analysis of diagnostic accuracy studies discusses and demonstrates Bayesian modeling using CopulaDTA [@copuladta] package in R [@R] to fit different models to obtain the meta-analytic parameter estimates. The focus is on the joint modelling of sensitivity and specificity using copula based bivariate beta distribution. Essentially,

• we present the Bayesian approach which offers flexibility and ability to perform complex statistical modelling even with small data sets and
• include covariate information, and
• provide an easy to use code.

The statistical methods are illustrated by re-analysing data of two published meta-analyses.

Modelling sensitivity and specificity using the bivariate beta distribution provides marginal as well as study-specific parameter estimates as opposed to using bivariate normal distribution (e.g., in BRMA) which only yields study-specific parameter estimates. Moreover, copula based models offer greater flexibility in modelling different correlation structures in contrast to the normal distribution which allows for only one correlation structure.

## Introduction

In a systematic review of diagnostic test accuracy, the statistical analysis section aims at estimating the average (across studies) sensitivity and specificity of a test and the variability thereof, among other measures. There tends to be a negative correlation between sensitivity and specificity, which postulates the need for correlated data models. The analysis is statistically challenging because the user

• deals with two summary statistics,
• has to account for correlation between sensitivity and specificity,
• has to account for heterogeneity in sensitivity and specificity across the studies and
• should be allowed to incorporate covariates.

Currently, the HSROC [@hsroc] regression or the bivariate random-effects meta-analysis model (BRMA) ([@Reitsma], [@Arends], [@Chu], [@Rileya]) are recommended for pooling of diagnostic test accuracy data. These models fit a bivariate normal distribution which allows for only one correlation structure to the logit transformed sensitivity and specificity. The resulting distribution has no closed form and therefore the mean sensitivity and specificity is only estimated after numerical integration or other approximation methods, an extra step which is rarely taken.

Within the maximum likelihood estimation methods, the BRMA and HSROC models have difficulties in convergence and estimating the correlation parameter when the number of studies in the meta-analysis are small and/or when the between-study variances are relatively large [@Takwoingi]. When the correlation is close to the boundary of its parameter space, the between study variance estimates from the BRMA are upwardly biased as they are inflated to compensate for the range restriction on the correlation parameter [@Rileyb]. This occurs because the maximum likelihood estimator truncates the between-study covariance matrix on the boundary of its parameter space, and this often occurs when the within-study variation is relatively large or the number of studies is small [@Rileya].

The BRMA and HSROC assume that the transformed data is approximately normal with constant variance, however for sensitivity and specificity and proportions in generals, the mean and variance depend on the underlying probability. Therefore, any factor affecting the probability will change the mean and the variance. This implies that the in models where the predictors affect the mean but assume a constant variance will not be adequate.

Joint modelling of study specific sensitivity and specificity using existing or copula based bivariate beta distributions overcomes the above mentioned difficulties. Since both sensitivity and specificity take values in the interval space (0, 1), it is a more natural choice to use a beta distribution to describe their distribution across studies, without the need for any transformation. The beta distribution is conjugate to the binomial distribution and therefore it is easy to integrate out the random-effects analytically giving rise to the beta-binomial marginal distributions. Moreover no further integration is needed to obtain the meta-analytically pooled sensitivity and specificity. Previously, [@Cong] fitted separate beta-binomial models to the number of true positives and the number of false positives. While the model ignores correlation between sensitivity and specificity, [@Cong] reported that the model estimates are comparable to those from the SROC model [@Moses], the predecessor of the HSROC model.

Ignoring the correlation would have negligible influence on the meta-analysis results when the within-study variability is large relative to the between-study variability [@Rileyc]. By utilising the correlation, we allow borrowing strength across sensitivities and specificities resulting in smaller standard errors. The use of copula based mixed models within the frequentist framework for meta-analysis of diagnostic test accuracy was recently introduced by [@Nikolo] who evaluated the joint density numerically.

This tutorial, presents and demonstrates hierarchical mixed models for meta-analysis of diagnostic accuracy studies. In the first level of the hierarchy, given sensitivity and specificity for each study, two binomial distributions are used to describe the variation in the number of true positives and true negatives among the diseased and healthy individuals, respectively. In the second level, we model the unobserved sensitivities and specificities using a bivariate distribution. While hierarchical models are used, the focus of meta-analysis is on the pooled average across studies and rarely on a given study estimate.

The methods are demonstrated using datasets from two previously published meta-analyses:

• on diagnostic accuracy of telomerase in urine as a tumour marker for the diagnosis of primary bladder cancer since it is a problematic dataset that has convergence issues caused by the correlation parameter being estimated to be -1 and has no covariate [@Glas] and
• on the comparison of the sensitivity and specificity of human papillomavirus testing (using the HC2 assay) versus repeat cytology to triage women with minor cytological cervical lesions to detect underlying cervical precancer[@Arbyn]. The second dataset is used to demonstrate meta-regression with one covariate which can be naturally extended to include several covariates.

## Statistical methods for meta-analysis

### Definition of copula function

A bivariate copula function describes the dependence structure between two random variables. Two random variables $X_1$ and $X_2$ are joined by a copula function C if their joint cumulative distribution function can be written as $$F(x_1, x_2) = C(F_1 (x_1), F_2 (x_2 )), -\infty \le x_1, x_2 \le +\infty$$

According to the theorem of [@Skylar], there exists for every bivariate (multivariate in extension) distribution a copula representation C which is unique for continuous random variables. If the joint cumulative distribution function and the two marginals are known, then the copula function can be written as $$C(u, ~v) = F(F_1^{-1} (u), ~F_2^{-1} (v)),~ 0 \le~ u, ~v ~\le~ 1$$

A 2-dimensional copula is in fact simply a 2-dimensional cumulative function restricted to the unit square with standard uniform marginals. A comprehensive overview of copulas and their mathematical properties can be found in [@Nelsen]. To obtain the joint probability density, the joint cumulative distribution $F(x_1, x_2)$ should be differentiated to yield $$f(x_1, ~x_2) = f_1(x_1) ~f_2(x_2 ) ~c(F_1(x_1), ~F_2 (x_2))$$ where $f_1$ and $f_2$ denote the marginal density functions and c the copula density function corresponding to the copula cumulative distribution function C. Therefore from Equation~\ref{eq:3}, a bivariate probability density can be expressed using the marginal and the copula density, given that the copula function is absolutely continuous and twice differentiable.

When the functional form of the marginal and the joint densities are known, the copula density can be derived as follows $$c(F_1(x_1), ~F_2(x_2)) = \frac{f(x_1, ~x_2)}{f_1 (x_1 )~ f_2 (x_2 )}$$

While our interest does not lie in finding the copula function, the equations above serve to show how one can move from the copula function to the bivariate density or vice-versa, given that the marginal densities are known. The decompositions allow for constructions of other and possible better models for the variables than would be possible if we limited ourselves to only existing standard bivariate distributions.

We finish this section by mentioning an important implication when Sklar's theorem is extended to a meta-regression setting with covariates. According to [@Patton], it is important that the conditioning variable remains the same for both marginal distributions and the copula, as otherwise the joint distribution might not be properly defined. This implies that covariate information should be introduced in both the marginals and the association parameters of the model.

### The hierarchical model

Since there are two sources of heterogeneity in the data, the within- and between-study variability, the parameters involved in a meta-analysis of diagnostic accuracy studies vary at two levels. For each study $i$, $i = 1, ..., n$, let $Y_{i}~=~(Y_{i1},~ Y_{i2})$ denote the true positives and true negatives, $N_{i}~=~( N_{i1},~ N_{i2})$ the diseased and healthy individuals respectively, and $\pi_{i}~ =~ (\pi_{i1},~ \pi_{i2})$ represent the unobserved' sensitivity and specificity respectively.

Given study-specific sensitivity and specificity, two separate binomial distributions describe the distribution of true positives and true negatives among the diseased and the healthy individuals as follows $$Y_{ij}~ |~ \pi_{ij}, ~\textbf{x}i~ \sim~ bin(\pi{ij},~ N_{ij}), i~=~1, ~\dots ~n, ~j~=~1, ~2$$

where $\textbf{x}i$ generically denotes one or more covariates, possibly affecting $\pi{ij}$. Equation ~\ref{eq:5} forms the higher level of the hierarchy and models the within-study variability. The second level of the hierarchy aims to model the between study variability of sensitivity and specificity while accounting for the inherent negative correlation thereof, with a bivariate distribution as follows $$\begin{pmatrix} g(\pi_{i1})\ g(\pi_{i2}) \end{pmatrix} \sim f(g(\pi_{i1}),~ g(\pi_{i2}))~ =~ f(g(\pi_{i1})) ~f(g(\pi_{i2})) ~c(F_1(g(\pi_{i1})),~ F_2(g(\pi_{i2}))),$$
where $g(.)$ denotes a transformation that might be used to modify the (0, 1) range to the whole real line. While it is critical to ensure that the studies included in the meta-analysis satisfy the specified entry criterion, there are study specific characteristics like different test thresholds and other unobserved differences that give rise to the second source of variability, the between-study variability. It is indeed the difference in the test thresholds between the studies that gives rise to the correlation between sensitivity and specificity. Including study level covariates allows us to model part of the between-study variability. The covariate information can and should be used to model the mean as well as the correlation between sensitivity and specificity.

In the next section we give more details on different bivariate distributions $f(g(\pi_{i1}),~g(\pi_{i2}))$ constructed using the logit or identity link function $g(.)$, different marginal densities and/or different copula densities $c$. We discuss their implications and demonstrate their application in meta-analysis of diagnostic accuracy studies. An overview of suitable parametric families of copula for mixed models for diagnostic test accuracy studies was recently given by Nikoloupolous (2015). Here, we consider five copula functions which can model negative correlation.

#### Bivariate Gaussian copula

Given the density and the distribution function of the univariate and bivariate standard normal distribution with correlation parameter $\rho \in (-1, 1)$, the bivariate Gaussian copula function and density is expressed [@Meyer] as $$C(u, ~v, ~\rho) = \Phi_2(\Phi^{-1}(u),~ \Phi^{-1}(v),~ \rho),$$ $$c(u, ~v, ~\rho) =~ \frac{1}{\sqrt{1~-~\rho^2}} ~exp\bigg(\frac{2~\rho~\Phi^{-1}(u) ~\Phi^{-1}(v) - \rho^2~ (\Phi^{-1}(u)^2 + \Phi^{-1}(v)^2)}{2~(1 - \rho^2)}\bigg )$$

The logit transformation is often used in binary logistic regression to relate the probability of "success" (coded as 1, failure as 0) of the binary response variable with the linear predictor model that theoretically can take values over the whole real line. In diagnostic test accuracy studies, the unobserved' sensitivities and specificities can range from 0 to 1 whereas their logits = $log ( \frac{\pi_{ij}}{1~-~ \pi_{ij}})$ can take any real value allowing to use the normal distribution as follows $$logit (\pi_{ij}) ~\sim~ N(\mu_j, ~\sigma_j) ~<=> ~logit (\pi_{ij}) ~=~ \mu_j ~+~ \varepsilon_{ij},$$ where, $\mu_j$ is a vector of the mean sensitivity and specificity for a study with zero random effects, and $\varepsilon_{i}$ is a vector of random effects associated with study $i$. Now $u$ is the normal distribution function of $logit(\pi_{i1}$) with parameters $\mu_1$ and $\sigma_1$, \textit{v} is the normal distribution function of $logit(\pi_{i2})$ with parameters $\mu_2$ and $\sigma_2$, $\Phi_2$ is the distribution function of a bivariate standard normal distribution with correlation parameter $\rho \in (-1, ~1)$ and $\Phi^{-1}$ is the quantile of the standard normal distribution. In terms of $\rho$, Kendall's tau is expressed as ($\frac{2}{\pi}$)arcsin$(\rho)$.

With simple algebra the copula density with normal marginal distributions simplifies to $$c(u, ~v, ~\rho) = \frac{1}{\sqrt{1 - \rho^2}} ~exp\bigg(\frac{1}{2 ~(1 ~-~ \rho^2)}~ \bigg( \frac{2~\rho~(x - \mu_1)~(y - \mu_2)}{\sigma_1 ~\sigma_2} ~-~ \rho^2 ~\bigg (\frac{{x ~-~ \mu_1}^2}{\sigma_1} ~+~ \frac{{y ~-~ \mu_2}^2}{\sigma_2}\bigg)\bigg ) \bigg ).$$

The product of the copula density above, the normal marginal of $logit(\pi_{i1}$) and $logit(\pi_{i2}$) form a bivariate normal distribution which characterize the model by [@Reitsma], [@Arends], [@Chu], and [@Rileyb], the so-called bivariate random-effects meta-analysis (BRMA) model, recommended as the appropriate method for meta-analysis of diagnostic accuracy studies. Study level covariate information explaining heterogeneity is introduced through the parameters of the marginal and the copula as follows $$\boldsymbol{\mu}j = \textbf{X}_j\textbf{B}_j^{\top}.$$


## The intercept only model

The CopulaDTA package has five different correlation structures that result to five different bivariate beta-binomial distributions to fit to the data. The correlation structure is specified by indicating copula ="gauss" or "fgm" or "c90" or "270" or "frank" in the cdtamodel function.

gauss.1 <- cdtamodel("gauss")


The code for the fitted model and the model options are displayed as follows

print(gauss.1)
str(gauss.1)


The Gaussian copula bivariate beta-binomial distribution is fitted to the telomerase data with the following code

fitgauss.1 <- fit(
gauss.1,
data = telomerase,
SID = "ID",
iter = 28000,
warmup = 1000,
thin = 30,
seed = 3)


By default, chains = 3 and cores = 3 and need not be specified unless otherwise. From the code above, 28000 samples are drawn from each of the 3 chains, the first 1000 samples are discarded and thereafter every 30$^{th}$ draw kept such that each chain has 900 post-warm-up draws making a total of 2700 post-warm-up draws. The seed value, seed = 3, specifies a random number generator to allow reproducibility of the results and cores = 3 allows for parallel-processing of the chains by using 3 cores, one core for each chain. They were no initial values specified and in that case, the program randomly generates random values satisfying the parameter constraints.

The trace plots below show satisfactory mixing of the chains and convergence.

traceplot(fitgauss.1)


Next, obtain the model summary estimates as follows

print(fitgauss.1, digits = 4)


From the output above, n_eff and Rhat both confirm proper mixing of the chains with little autocorrelation. The meta-analytic sensitivity MUse[1] and specificity MUsp[1] is 0.7568 [0.6904, 0.8166] and 0.7983 [0.6171, 0.9064] respectively. The between-study variability in sensitivity and specificity is 0.0062 [0, 0.0195] and 0.0048 [0.0136, 0.1220], respectively. The Kendall's tau correlation between sensitivity and specificity is estimated to be -0.8202 [-0.9861, -0.3333].

The command below produces a series of forest plots.

plot(fitgauss.1)


$G1 is a plot of the study-specific sensitivity and specificity (magenta points) and their corresponding 95 \% exact confidence intervals (black lines). $G2 is a plot of the posterior study-specific (blue points) and marginal(blue diamond) sensitivity and specificity and their corresponding 95 \% credible intervals (black lines).

$G3 is a plot of the posterior study-specific (blues stars), and marginal(blue diamond) sensitivity and specificity and their corresponding 95 \% credible intervals (black lines). The study-specific sensitivity and specificity (magenta points) and their corresponding 95 \% exact confidence intervals (thick grey lines) are also presented. As observed in plots above, the posterior study-specific sensitivity and specificity are less extreme and variable than the "observed" study-specific sensitivity and specificity. In other words, there is "shrinkage" towards the overall mean sensitivity and specificity as studies borrow strength from each other in the following manner: the posterior study-specific estimates depends on the global estimate and thus also on all other the studies. The other four copula based bivariate beta distributions are fitted as follows  {r, eval=FALSE} c90.1 <- cdtamodel(copula = "c90") fitc90.1 <- fit(c90.1, data=telomerase, SID="ID", iter=28000, warmup=1000, thin=30, seed=718117096) c270.1 <- cdtamodel(copula = "c270") fitc270.1 <- fit(c270.1, data=telomerase, SID="ID", iter=19000, warmup=1000, thin=20, seed=3) fgm.1 <- cdtamodel(copula = "fgm") fitfgm.1 <- fit(fgm.1, data=telomerase, SID="ID", iter=19000, warmup=1000, thin=20, seed=3) frank.1 <- cdtamodel(copula = "frank") fitfrank.1 <- fit(frank.1, data=telomerase, SID="ID", iter=19000, warmup=1000, thin=20, seed=1959300748) For comparison purpose, the current recommended model; the BRMA, which uses normal marginals is also fitted to the data though it is not part of the CopulaDTA package. The model is first expressed in Stan modelling language in the code below and is stored within R environment as character string named BRMA1. r BRMA1 <- " data{ int<lower = 0> N; int<lower = 0> tp[N]; int<lower = 0> dis[N]; int<lower = 0> tn[N]; int<lower = 0> nondis[N]; } parameters{ real etarho; vector[2] mul; vector<lower = 0>[2] sigma; vector[2] logitp[N]; vector[2] logitphat[N]; } transformed parameters{ vector[N] p[2]; vector[N] phat[2]; real MU[2]; vector[2] mu; real rho; real ktau; matrix[2,2] Sigma; rho = tanh(etarho); ktau = (2/pi())*asin(rho); for (a in 1:2){ for (b in 1:N){ p[a][b] = inv_logit(logitp[b][a]); phat[a][b] = inv_logit(logitphat[b][a]); } mu[a] = inv_logit(mul[a]); } MU[1] = mean(phat[1]); MU[2] = mean(phat[2]); Sigma[1, 1] = sigma[1]^2; Sigma[1, 2] = sigma[1]*sigma[2]*rho; Sigma[2, 1] = sigma[1]*sigma[2]*rho; Sigma[2, 2] = sigma[2]^2; } model{ etarho ~ normal(0, 10); mul ~ normal(0, 10); sigma ~ cauchy(0, 2.5); for (i in 1:N){ logitp[i] ~ multi_normal(mul, Sigma); logitphat[i] ~ multi_normal(mul, Sigma); } tp ~ binomial(dis,p[1]); tn ~ binomial(nondis, p[2]); } generated quantities{ vector[N*2] loglik; for (i in 1:N){ loglik[i] = binomial_lpmf(tp[i]| dis[i], p[1][i]); } for (i in (N+1):(2*N)){ loglik[i] = binomial_lpmf(tn[i-N]| nondis[i-N], p[2][i-N]); } } "  Next, prepare the data by creating a list as follows datalist = list( tp = telomerase$TP,
dis = telomerase$TP + telomerase$FN,
tn = telomerase$TN, nondis = telomerase$TN + telomerase$FP, N = 10)  In the data block the dimensions and names of variables in the dataset are specified, here Ns indicate the number of studies in the dataset. The parameters block introduces the unknown parameters to be estimated. These are etarho; a scalar representing the Fisher's transformed form of the association parameter$\rho$, mul;a$2 \times 1$vector representing the mean of sensitivity and specificity on the logit scale for a central study where the random-effect is zero, sigma; a$2 \times 1$vector representing the between study standard deviation of sensitivity and specificity on the logit scale, logitp; a$Ns \times 2$array of study-specific sensitivity in the first column and specificity in the second column on logit scale, and logitphat; a$Ns \times 2$array of predicted sensitivity in the first column and predicted specificity in the second column on logit scale. The parameters are further transformed in the transformed parameters block. Here, p is a$2 \times Ns$array of sensitivity in the first column and specificity in the second column after inverse logit transformation of logitp, and phat is a$2 \times Ns$array of predicted sensitivity in the first column and predicted specificity in the second column after inverse logit transformation of logitphat to be used in computing the meta-analytic sensitivity and specificity. mu is a$2 \times 1$vector representing the mean of sensitivity and specificity for a certain study with a random effect equal to 0, MU is a$2 \times 1$vector containing the meta-analytic sensitivity and specificity, Sigma; a$2 $\times$ 2$matrix representing the variance-covarince matrix of sensitivity and specificity on the logit scale, rho and ktau are scalars representing the Pearson's and Kendall's tau correlation respectively. The prior distributions for the all parameters and data likelihood are defined in the model block. Finally, in the generated quantities block, loglik is a$(2Ns) \times 1$vector of the log likelihood needed to compute the WAIC. Next, call the function stan from the rstan package to translate the code into C++, compile the code and draw samples from the posterior distribution as follows  {r, eval=FALSE} brma.1 <- stan(model_code = BRMA1, data = datalist, chains = 3, iter = 5000, warmup = 1000, thin = 10, seed = 3, cores = 3) The parameter estimates are extracted and the chain convergence and autocorrelation examined further with the following code r print(brma.1, pars=c('MU', 'mu', 'rho', "Sigma"), digits=4, prob=c(0.025, 0.5, 0.975))  The meta-analytic sensitivity (MU[1]) and specificity (MU[2]) and 95\% credible intervals are 0.7525[0.6323, 0.8415] and 0.7908[0.5273, 0.9539] respectively. This differs from what the authors published (0.75[0.66, 0.74] and 0.86[0.71, 0.94]) in two ways. The authors fitted the standard bivariate normal distribution to the logit transformed sensitivity and specificity values across the studies allowing for heterogeneity between the studies and disregarded the higher level of the hierarchical model. Because of this the authors had to use a continuity correction of 0.5 since the seventh study had "observed" specificity equal to 1, a problem not encountered in the hierarchical model. Secondly the authors do not report the meta-analytic values but rather report the mean sensitivity (mu[1]) and specificity (mu[2]) for a particular, hypothetical study with random-effect equal to zero, which in our case is 0.7668[0.6869, 0.8369] and 0.8937[0.6943, 0.9825] respectively and is comparable to what the authors reported. This discrepancy between MU and mu will indeed increase with increase in the between study variability. Here, (MU[1]) and (mu[1]) are similar because the between-study variability in sensitivity in the logit scale (Sigma[1,1]) is small 0.333 [0.0579, 0.9851]. (MU[2]) is subtantially smaller than (mu[2]) as a result of the subtantial between-study heterogeneity in specificity in the logit scale (Sigma[2,2]) = 5.6827 [1.4720, 16.9031]. The figure below shows satisfactory chain mixing with little autocorrelation for most of the fitted models except the Clayton copula models. f1.1 <- traceplot(fitgauss.1) f1.2 <- traceplot(fitc90.1) f1.3 <- traceplot(fitc270.1) f1.4 <- traceplot(fitfgm.1) f1.5 <- traceplot(fitfrank.1) #draws <- as.array(brma.1, pars=c('MU')) f1.6 <- rstan::stan_trace(brma.1, pars=c('MU'))  library(Rmisc) multiplot(f1.1, f1.2, f1.3, f1.4, f1.5, f1.6, cols=2)  The mean sensitivity and specificity as estimated by all the fitted distributions are in the table below. brma.summary1 <- data.frame(Parameter=c("Sensitivity", "Specificity", "Correlation", "Var(lSens)", "Sigma[1,2]", "Sigma[2,1]","Var(lSpec)"), summary(brma.1, pars=c('MU', 'ktau', 'Sigma'))$summary[,c(1, 4, 6, 8:10)])

brma.summary1 <- brma.summary1[-c(5,6),]

names(brma.summary1)[2:5] <- c("Mean", "Lower", "Median", "Upper")

library(loo)

Table1 <- cbind(Model=rep(c("Gaussian", "C90", "C270", "FGM", "Frank", "BRMA"), each=5),
rbind(summary(fitgauss.1)$Summary, summary(fitc90.1)$Summary,
summary(fitc270.1)$Summary, summary(fitfgm.1)$Summary,
summary(fitfrank.1)$Summary, brma.summary1), WAIC = t(data.frame(rep(c(summary(fitgauss.1)$WAIC[1],
summary(fitc90.1)$WAIC[1], summary(fitc270.1)$WAIC[1],
summary(fitfgm.1)$WAIC[1], summary(fitfrank.1)$WAIC[1],
loo::waic(extract_log_lik(brma.1, parameter_name="loglik"))[3]), each=5))))

rownames(Table1) <- NULL

print(Table1, digits=4)


The results are presented graphically as follows

g1 <- ggplot(Table1[Table1$Parameter %in% c("Sensitivity", "Specificity"),], aes(x=Model, y= Mean)) + geom_point(size=3) + theme_bw() + coord_flip() + facet_grid(~ Parameter, switch="x") + geom_errorbar(aes(ymin=Lower, ymax=Upper), size=.75, width=0.15) + theme(axis.text.x = element_text(size=13, colour='black'), axis.text.y = element_text(size=13, colour='black'), axis.title.x = element_text(size=13, colour='black'), strip.text = element_text(size = 13, colour='black'), axis.title.y= element_text(size=13, angle=0, colour='black'), strip.text.y = element_text(size = 13, colour='black'), strip.text.x = element_text(size = 13, colour='black'), plot.background = element_rect(fill = "white", colour='white'), panel.grid.major = element_blank(), panel.background = element_blank(), strip.background = element_blank(), axis.line.x = element_line(color = 'black'), axis.line.y = element_line(color = 'black')) + scale_y_continuous(name="Mean [95% equal-tailed credible intervals]", limits=c(0.45,1.1), breaks=c(0.5, 0.75, 1), labels = c("0.5", "0.75", "1")) g1  ### Model comparison Table1 above showed that the correlation as estimated by the BRMA model and the Gaussian copula bivariate beta are more extreme but comparable to the estimates from the Frank,$90^{\circ}$- and$270^{\circ}$- Clayton copula. On the other extreme is the estimate from the model FGM copula bivariate beta and this is due to the constraints on the association parameter in the FGM copula where values lie within |2/9|. In the figure above g1, the marginal mean sensitivity and specificity from the five bivariate beta distributions are comparable with subtle differences in the 95 percent credible intervals despite differences in the correlation structure. Glas et al. (2003) and Riley et al. (2007a) estimated the Pearson's correlation parameter in the BRMA model$\rho$as -1 within the frequentist framework. Using maximum likelihood estimation, Riley et al. (2007b) showed that the between-study correlation from the BRMA is often estimated as +/-1. Without estimation difficulties, the table above shows an estimated Pearson's correlation of -0.8224[-0.9824, -0.3804]. This is because Bayesian methods are not influenced by sample size and therefore able to handle cases of small sample sizes with less issues. Essentially, all the six models are equivalent in the first level of hierarchy and differ in specifying the prior distributions for the "study-specific" sensitivity and specificity. As thus, the models should have the same number of parameters in which case it makes sense then to compare the log predictive densities. Upon inspection, the log predictive densities from the five copula-based models are practically equivalent (min=-38.77, max=-37.89) but the effective number of parameters differed a bit (min=7.25, max=9.92). The BRMA had 6 effective parameters but a lower log predictive density of -43.4. The last column of table above indicates that the BRMA fits the data best based on the WAIC despite its lower predictive density. ## Meta-regression The ascus dataset has Test} as a covariate. The covariate is used as it is of interest to study its effect on the joint distribution of sensitivity and specificity (including the correlation). The following code fits the copula based bivariate beta-binomial distribution to the data ascus data. fgm.2 <- cdtamodel(copula = "fgm", modelargs = list(formula.se = StudyID ~ Test + 0)) fitfgm.2 <- fit(fgm.2, data=ascus, SID="StudyID", iter=19000, warmup=1000, thin=20, seed=3) gauss.2 <- cdtamodel(copula = "gauss", modelargs = list(formula.se = StudyID ~ Test + 0)) fitgauss.2 <- fit(gauss.2, data=ascus, SID="StudyID", iter=19000, warmup=1000, thin=20, seed=3) c90.2 <- cdtamodel(copula = "c90", modelargs = list(formula.se = StudyID ~ Test + 0)) fitc90.2 <- fit(c90.2, data=ascus, SID="StudyID", iter=19000, warmup=1000, thin=20, seed=3) c270.2 <- cdtamodel(copula = "c270", modelargs = list(formula.se = StudyID ~ Test + 0)) fitc270.2 <- fit(c270.2, data=ascus, SID="StudyID", iter=19000, warmup=1000, thin=20, seed=3) frank.2 <- cdtamodel(copula = "frank", modelargs = list(formula.se = StudyID ~ Test + 0)) fitfrank.2 <- fit(frank.2, data=ascus, SID="StudyID", iter=19000, warmup=1000, thin=20, seed=3)  The BRMA is fitted by the code below BRMA2 <- " data{ int<lower=0> N; int<lower=0> tp[N]; int<lower=0> dis[N]; int<lower=0> tn[N]; int<lower=0> nondis[N]; matrix<lower=0>[N,2] x; } parameters{ real etarho; // g(rho) matrix[2,2] betamul; vector<lower=0>[2] sigma; vector[2] logitp[N]; vector[2] logitphat[N]; } transformed parameters{ row_vector[2] mul_i[N]; vector[N] p[2]; vector[N] phat[2]; //expit transformation matrix[2,2] mu; real rho; //pearsons correlation real ktau; //Kendall's tau matrix[2,2] Sigma; //Variance-covariance matrix matrix[2,2] MU; row_vector[2] RR; rho = tanh(etarho); //fisher z back transformation ktau = (2/pi())*asin(rho); for (j in 1:2){ for (k in 1:2){ mu[j, k] = inv_logit(betamul[j, k]); } } for (a in 1:N){ mul_i[a] = row(x*betamul,a); for (b in 1:2){ p[b][a] = inv_logit(logitp[a][b]); phat[b][a] = inv_logit(logitphat[a][b]); } } MU[1,1] = sum(col(x, 1).*phat[1])/sum(col(x, 1)); //sensitivity HC2 MU[1,2] = sum(col(x, 1).*phat[2])/sum(col(x, 1)); //specificity HC2 MU[2,1] = sum(col(x, 2).*phat[1])/sum(col(x, 2)); //sensitivity RepC MU[2,2] = sum(col(x, 2).*phat[2])/sum(col(x, 2)); //specificity RepC RR = row(MU, 2)./row(MU, 1); //RepC vs HC2 Sigma[1, 1] = sigma[1]^2; Sigma[1, 2] = sigma[1]*sigma[2]*rho; Sigma[2, 1] = sigma[1]*sigma[2]*rho; Sigma[2, 2] = sigma[2]^2; } model{ #Priors etarho ~ normal(0, 10); sigma ~ cauchy(0, 2.5); for (i in 1:2){ betamul[i] ~ normal(0, 10); } for (l in 1:N){ logitp[l] ~ multi_normal(mul_i[l], Sigma); logitphat[l] ~ multi_normal(mul_i[l], Sigma); } //Likelihood tp ~ binomial(dis,p[1]); tn ~ binomial(nondis, p[2]); } generated quantities{ vector[N*2] loglik; for (i in 1:N){ loglik[i] = binomial_lpmf(tp[i]| dis[i], p[1][i]); } for (i in (N+1):(2*N)){ loglik[i] = binomial_lpmf(tn[i-N]| nondis[i-N], p[2][i-N]); } } " datalist2 <- list( tp = ascus$TP,
dis = ascus$TP + ascus$FN,
tn = ascus$TN, nondis = ascus$TN + ascus$FP, N = 20, x = structure(.Data=c(2-as.numeric(as.factor(ascus$Test)),
rev(2-as.numeric(as.factor(ascus$Test)))), .Dim=c(20, 2))) brma.2 <- stan(model_code = BRMA2, data = datalist2, chains = 3, iter = 5000, warmup = 1000, thin = 10, seed = 3, cores=3)  The figure below shows the trace plots for all the six models fitted to the ascus data where all parameters, including the correlation parameter(except the BRMA) are modeled as a function of the covariate. There is proper chains mixing and convergence except for the case of the Clayton copula based bivariate beta. f2.1 <- traceplot(fitgauss.2) f2.2 <- traceplot(fitc90.2) f2.3 <- traceplot(fitc270.2) f2.4 <- traceplot(fitfgm.2) f2.5 <- traceplot(fitfrank.2) #draws <- as.array(brma.2, pars=c('MU')) f2.6 <- rstan::stan_trace(brma.2, pars=c('MU'))  multiplot(f2.1, f2.2, f2.3, f2.4, f2.5, f2.6, cols=2)  The n\_eff in table below indicate substantial autocorrelation in sampling the correlation parameters except in the Gaussian',FGM' and Frank' models. From the copula based bivariate beta distributions, it is apparent that the correlation between sensitivity and specificity in HC2 and repeat cytology is different. brma.summary2 <- data.frame(Parameter=c(rep(c("Sensitivity", "Specificity"), times=2), "RSE", "RSP", "Var(lSens)", "Cov(lSens_Spec)", "Cov(lSens_Spec)", "Var(lSpec)", "Correlation"), Test=c(rep(c("HC2", "Repc"), times=2), rep("Repc", 2), rep("Both", 5)), summary(brma.2, pars=c('MU', "RR", 'Sigma', 'ktau'))$summary[,c(1, 4, 6, 8:10)])

names(brma.summary2)[3:6] <- c("Mean", "Lower", "Median", "Upper")

Table2 <- cbind(Model=rep(c("Gaussian", "C90", "C270", "FGM", "Frank"), each=14),
Test=rep(c("HC2", "Repc"), length.out=70),
rbind(summary(fitgauss.2)$Summary, summary(fitc90.2)$Summary,
summary(fitc270.2)$Summary, summary(fitfgm.2)$Summary,
summary(fitfrank.2)$Summary), WAIC = t(data.frame(rep(c(summary(fitgauss.2)$WAIC[1],
summary(fitc90.2)$WAIC[1], summary(fitc270.2)$WAIC[1],
summary(fitfgm.2)$WAIC[1], summary(fitfrank.2)$WAIC[1]), each=14))))

Table2$Parameter <- rep(rep(c("Sensitivity", "Specificity", "RSE", "RSP", "Correlation", "Var(Sens)", "Var(Spec)"), each=2), times=5) Table2 <- rbind(Table2, cbind(Model=rep("BRMA", 11), brma.summary2, WAIC=t(data.frame(rep(loo::waic(extract_log_lik(brma.2, parameter_name="loglik"))[3], 11))))) rownames(Table2) <- NULL print(Table2[Table2$Parameter=="Correlation",], digits=4)


The Clayton90 model has the lowest WAIC even though sampling from the posterior distribution was difficult as seen in their trace plots in figure abover and the n_eff and Rhat in the table above. The difficulty in sampling from the posterior could be signalling over-parameterisation of the correlation structure. It would thus be interesting to re-fit the models using only one correlation parameter and compare the models. WAIC is known to fail in certain settings and this examples shows that it is crucial to check the adequacy of the fit and plausibility of the model and not blindly rely on an information criterion to select the best fit to the data.

From the posterior relative sensitivity and specificity plotted below, all the models that converged generally agree that repeat cytology was less sensitive than HC2 without significant loss in specificity.

g2 <- ggplot(Table2[Table2$Parameter %in% c("RSE", "RSP") & (Table2$Test == "Repc") ,],
aes(x=Model, y= Mean, ymax=1.5)) +
geom_point(size=3) +
theme_bw() +
coord_flip() +
facet_grid(~ Parameter, switch="x") +
geom_errorbar(aes(ymin=Lower, ymax=Upper),size=.75, width=0.15) +
geom_hline(aes(yintercept = 1),
linetype = "longdash",
colour="blue") +
theme(axis.text.x = element_text(size=13, colour='black'),
axis.text.y = element_text(size=13, colour='black'),
axis.title.x = element_text(size=13, colour='black'),
strip.text = element_text(size = 13, colour='black'),
axis.title.y= element_text(size=13, angle=0, colour='black'),
strip.text.y = element_text(size = 13, colour='black'),
strip.text.x = element_text(size = 13, colour='black'),
panel.grid.major = element_blank(),
panel.background = element_blank(),
strip.background = element_blank(),
plot.background = element_rect(fill = "white", colour='white'),
axis.line.x = element_line(color = 'black'),
axis.line.y = element_line(color = 'black')) +
scale_y_continuous(name="Mean [95% equal-tailed credible intervals]",
limits=c(0.5, 2))
g2


## Discussion

Copula-based models offer great flexibility and ease but their use is not without caution. While the copulas used in this paper are attractive as they are mathematically tractable, [@Mikosch] and [@Genest] noted that it might be difficult to estimate copulas from data. Furthermore, the concepts behind copula models is slightly more complex and therefore require statistical expertise to understand and program them as they are not yet available as standard procedure/programs in statistical software.

In this paper, several advanced statistical models for meta-analysis of diagnostic accuracy studies were briefly discussed. The use of the R package CopulaDTA within the flexible Stan` interface was demonstrated and shows how complex models can be implemented in a convenient way.

In most practical situations, the marginal mean structure is of primary interest and the correlation structure is treated a nuisance making the choice of copula less critical. Nonetheless, an appropriate correlation structure is critical in the interpretation of the random variation in the data as well as obtaining valid model-based inference for the mean structure.

When the model for the mean is correct but the true distribution is misspecified, the estimates of the model parameters will be consistent but the standard errors will be incorrect Agresti (2002). Nonetheless, the bivariate beta distribution has the advantage to allow direct joint modelling of sensitivity and specificity, without the need of any transformation, and consequently providing estimates with the appropriate meta-analytic interpretation but with the disadvantage of being more computationally intensive for some of the copula functions.

[@Leeflang] showed that the sensitivity and specificity often vary with disease prevalence. The models presented above can easily be extended and implemented to jointly model prevalence, sensitivity and specificity using tri-variate copulas.

There were some differences between the models in estimating the meta-analytic sensitivity and specificity and the correlation. Therefore, further research is necessary to investigate the effect of certain parameters, such as the number of studies, sample sizes and misspecification of the joint distribution on the meta-analytic estimates.

## Conclusion

The proposed Bayesian joint model using copulas to construct bivariate beta distributions, provides estimates with both the appropriate marginal as well as conditional interpretation, as opposed to the typical BRMA model which estimates sensitivity and specificity for specific studies with a particular value for the random-effects. Furthermore, the models do not have estimation difficulties with small sample sizes or large between-study variance because: i the between-study variances are not constant but depends on the underlying means and ii Bayesian methods are less influenced by small samples sizes.

The fitted models generally agree that the mean specificity was slightly lower than what Glas et al. (2003) reported and based on this we conclude that telomerase was not sensitive and specific enough to be recommended for daily use.

In the ASCUS triage data, conclusion based on the fitted models is in line with what the authors conclude: that HC2 was considerably more sensitive but sligthly and non-significantly less specific than repeat cytology to triage women with an equivocal Pap smear to diagnose cervical precancer.

While the BRMA had the lowest WAIC for both datasets, we still recommend modelling of sensitivity and specificity using bivariate beta distributions as they easily and directly provide meta-analytic estimates.