Suppose we are able to summarize the current state of scientific knowledge for the mean/proportion of each of several endpoints for a particular treatment using a distribution. The distribution for each endpoint may come from the elicitation of prior information from experts, some initial dataset or some combination of different sources of information. In many cases it will be difficult to arrive at a joint distribution. Only the marginal distribution of each endpoint will be calculated easily. This is particularly true when using externally published data where only marginal effects are known, and no patient level data is available. In general assuming independence for the distribution of the mean for each endpoint is not appropriate, as we would expect correlations in the distribution given that many endpoints are correlated. The method described here may be used to simulate from the approximate joint distribution given the marginal distribution and an individual level data set. The correlation structure within the individual level data is used to impute the joint distribution. The method also provides a way to simulate virtual trial data based on the marginal.
As an example, we simulate the following simple dataset with a continuous and two binary variables.
library(tboot) set.seed(2020) quant1 <- rnorm(200) + 1 bin1 <- ifelse( (.5*quant1 + .5*rnorm(200)) > .5, 1, 0) bin2 <- ifelse( (.5*quant1 + .5*rnorm(200)) > .5, 1, 0) simData <- data.frame(quant1, bin1, bin2) head(simData)
First, we create a list with simulations from the marginal distribution of each variable for two different treatments (active treatment and placebo).
marginal_active <- list(quant1=rnorm(5000, mean=.5, sd=.2), bin1=rbeta(5000, shape1 = 50,shape2=50), bin2=rbeta(5000, shape1 = 60,shape2=40)) marginal_pbo <- list(quant1=rnorm(5000, mean=.2, sd=.2), bin1=rbeta(5000, shape1 = 20,shape2=80), bin2=rbeta(5000, shape1 = 30,shape2=70))
We next need to use 'tweights_bmr' to calculate the correlation matrix from the data and get set for marginal reconstruction. The calculation uses a call to the 'tweights' function.
bmr_active <- tweights_bmr(dataset = simData, marginal = marginal_active) bmr_pbo <- tweights_bmr(dataset = simData, marginal = marginal_pbo)
To simulate from the posterior we use 'post_bmr':
samples <- rbind(data.frame(trt="active", post_bmr(nsims=1e3, bmr_active)), data.frame(trt="pbo", post_bmr(nsims=1e3, bmr_pbo))) head(samples)
The posterior samples show a correlations structure.
pairs(samples[,-1], col=ifelse(samples=="active","red","blue"), pch='.', cex=.5)
Marginally the posterior samples are equivalent to the simulations used as input (i.e., in the 'marginal' parameter).
library(ggplot2) pltdta=do.call(rbind, lapply(c("quant1","bin1", "bin2"), function(nm) { rbind(data.frame(type="BMR", var=nm, trt=samples$trt, val=samples[[nm]]), data.frame(type="marginal", var=nm, trt="active", val=marginal_active[[nm]]), data.frame(type="marginal", var=nm, trt="pbo", val=marginal_pbo[[nm]])) })) ggplot(pltdta, aes(fill=type, x=val)) + geom_density(alpha=.3) + facet_grid(var~trt, scales = "free")
To simulate a random trial dataset using the parameters from a single draw of 'post_bmr' we use the tboot_bmr function. For example, to simulate 100 patients on active treatment:
active_sample=tboot_bmr(nrow=100, weights_bmr=bmr_active) head(active_sample)
The underlying parameter mean for the simulation is an attribute:
attr(active_sample, "post_bmr")
A more interesting example would be to simulate and analyze trial data. For example:
#Manage any errors by assuming the pvalue failed to reach statistical #significance (i.e. pvalue is 1) but keep track of such errors. errorTrackGlobal=list() manageError=function(expr) { tryCatch(eval(quote(expr)), error=function(e){ errorTrackGlobal[[length(errorTrackGlobal)+1]] <<- e$message return(1) }) } #create function to simulate and analyze one virtual trial sim_and_analyze=function() { active_sample=tboot_bmr(100, bmr_active) pbo_sample=tboot_bmr(100, bmr_pbo) data.frame( p_quant1=manageError(t.test(active_sample$quant1,pbo_sample$quant1)$p.value), p_bin1=manageError(fisher.test(active_sample$bin1,pbo_sample$bin1)$p.value), p_bin2=manageError(fisher.test(active_sample$bin2,pbo_sample$bin2)$p.value) ) } #Simulate Pvalues p_sim=do.call(rbind, replicate(100, sim_and_analyze(), simplify = FALSE)) head(errorTrackGlobal) head(p_sim)
The pvalue matrix above may be analyzed, for example, using the gMCP package if multiple testing adjustments are needed.
To describe the algorithm, we use the following notation:
The algorithm for 'tweights_bmr()' takes $y_{k}$ and $X$ as input and proceeds as follows:
The algorithm for 'post_bmr()' takes the output from 'tweights_bmr()' and proceeds as follows:
The algorithm for 'tboot_bmr()' takes the output from 'tweights_bmr()' and proceeds as follows:
The algorithm described above may be justified in several ways. First, it is heuristically plausible. One would expect at first thought that when two variables are correlated, a drug which influences one of the variables will most likely influence the other. Second, in some specific cases, the algorithm may be justified via Bayesian Asymptotics using the 'Berstein Von-Misus' theorem. This document will not attempt to fully work out this more theoretical approach.
The following considerations should be relevent when considering the use of 'tboot_bmr:'
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.