knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(pmsimstats)
library(data.table)
library(lme4)
library(lmerTest)
library(corpcor)
library(ggplot2)
library(MASS)
library(svMisc)
library(tictoc)
library(ggpubr)
library(gridExtra)
This vignette walks through the steps used to generate the results published in [___]. The process falls into four basic steps, and we will walk through each of them in order.
The first step is to define the various clinical trial designs you are interested in comparing, using the function buildtrialdesign. See ?buildtrialdesign for details of the input and output structures. Note that the baseline timepoint is not included. Here, we are going to build five clinical trial designs, each 20 weeks long, and then merge them into a list, trialdesigns:
tdNof11 <- buildtrialdesign(
  name_longform = "primary N-of-1 design",
  name_shortform = "N-of-1",
  timepoints = c(4, 8, 9, 10, 11, 12, 16, 20),
  timeptname = c('OL1', 'OL2', 'BD1', 'BD2', 'BD3', 'BD4', 'COd', 'COp'),
  expectancies = c(1, 1, .5, .5, .5, .5, .5, .5),
  ondrug = list(
    # Assumes on drug for the entire previous interval and at this measurement point
    pathA = c(1, 1, 1, 1, 0, 0, 1, 0),
    pathB = c(1, 1, 1, 1, 0, 0, 0, 1),
    pathC = c(1, 1, 1, 0, 0, 0, 1, 0),
    pathD = c(1, 1, 1, 0, 0, 0, 0, 1)
  )
)

tdOL <- buildtrialdesign(
  name_longform = "open label",
  name_shortform = "OL",
  timepoints = cumulative(rep(2.5, 8)),
  timeptname = paste("OL", 1:8, sep = ""),
  expectancies = rep(1, 8),
  ondrug = list(
    pathA = rep(1, 8)
  )
)

tdOLBDC <- buildtrialdesign(
  name_longform = "open label+blinded discontinuation",
  name_shortform = "OL+BDC",
  timepoints = c(4, 8, 12, 16, 17, 18, 19, 20),
  timeptname = c('OL1', 'OL2', 'OL3', 'OL4', 'BD1', 'BD2', 'BD3', 'BD4'),
  expectancies = c(1, 1, 1, 1, .5, .5, .5, .5),
  ondrug = list(
    pathA = c(1, 1, 1, 1, 1, 1, 0, 0),
    pathB = c(1, 1, 1, 1, 1, 0, 0, 0)
  )
)

tdCO <- buildtrialdesign(
  name_longform = "traditional crossover",
  name_shortform = "CO",
  timepoints = cumulative(rep(2.5, 8)),
  timeptname = c(paste("COa", 1:4, sep = ""), paste("COb", 1:4, sep = "")),
  expectancies = rep(.5, 8),
  ondrug = list(
    # Assumes on drug for the entire previous interval and at this measurement point
    pathA = c(1, 1, 1, 1, 0, 0, 0, 0),
    pathB = c(0, 0, 0, 0, 1, 1, 1, 1)
  )
)

tdPGRCT <- buildtrialdesign(
  name_longform = "Parallel group RCT",
  name_shortform = "PG-RCT",
  timepoints = cumulative(rep(2.5, 8)),
  timeptname = paste("V", 1:8, sep = ""),
  expectancies = rep(.5, 8),
  ondrug = list(
    pathA = rep(1, 8),
    pathB = rep(0, 8)
  )
)

trialdesigns <- list(PG = tdPGRCT, OL = tdOL, OLBDC = tdOLBDC, CO = tdCO, Nof1 = tdNof11)
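Before moving on, it can be worth a quick structural sanity check on one of the designs. This is purely optional, and the field names you see are simply whatever buildtrialdesign returns (see ?buildtrialdesign):

# Peek at the top-level structure of one design; str() works on any R object,
# so this is a safe way to confirm the design was built as intended
str(trialdesigns$Nof1, max.level = 1)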
There are a number of different types of relevant parameters. They are grouped here by (a) which aspect of the simulation they relate to (e.g. generating the expected trajectory of symptom change as a function of treatment exposure, versus how dropout is modeled) and (b) how it is logistically easiest to parameterize them.
The baseline parameters describe the mean and standard deviation of (1) the biomarker being used (assumed to be a continuous variable) and (2) the outcome measure (also assumed to be continuous). Here, the biomarker is systolic blood pressure two minutes after standing, while the outcome measure is the total score on the CAPS-5, a clinician-administered assessment of PTSD diagnosis and symptom severity. As long as they are both continuous variables, however, what they represent does not matter much for the purposes of the statistical simulations.
The baseline parameters are easy and reliable to extract from existing data, so for this exercise we did so and did not vary them. We will just load them in:
data(extracted_bp)
extracted_bp
You can see that the object defining the baseline parameters is a data.table with three columns: "cat", which tells you the category of baseline parameter being defined; "m", the mean; and "sd", the standard deviation. The pmsimstats tools expect two categories: BL for the baseline symptom severity score, and bm for the biomarker.
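If you wanted to supply baseline parameters by hand rather than extracting them from data, the expected shape is just a two-row data.table in the same format. The numbers below are hypothetical placeholders, not values from the publication:

# Hypothetical hand-specified baseline parameters, same shape as extracted_bp
handmade_bp <- data.table(
  cat = c("BL", "bm"),   # BL = baseline symptom score, bm = biomarker
  m   = c(40, 120),      # hypothetical means
  sd  = c(12, 15)        # hypothetical standard deviations
)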
The raw parameters (here, extracted_bp) are then wrapped in a structure that tracks them and where they came from, in case you're varying them:
TRblparam <- list(
  name = "TR",
  verbaldesc = "Extracted from data with blank slate assumptions",
  param = extracted_bp
)
We didn't use any alternative baseline parameters in the publication, but you can make a set just for interest's sake, and then, as with the trial designs, make a list that lets you cycle through them:
altblparam1 <- list(
  name = "LowerSDs",
  verbaldesc = "Lower SD for the BL",
  param = data.table(
    cat = c("BL", "bm"),
    m = extracted_bp$m,
    sd = c(10, 15.36)
  )
)

blparamsets <- list(TRblparam, altblparam1)
The response parameters are similar, just slightly more complicated. Again, we will load in the ones that were based on our raw data:
data(extracted_rp)
extracted_rp
You can see that we have another data.table that defines parameters for three factors. (Refer to the publication for details on what these mean; just track that what is here called tv is there called TR, what is here called pb is there called ER, and what is here called br is there called BR.) Each factor is defined by a max (maximum value), disp (displacement), rate, and sd (standard deviation). See ?modgompertz for additional details on how these values are used; an illustrative sketch of this kind of curve follows.
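For intuition about what max, disp, and rate do, the sketch below plots a textbook Gompertz-style curve, y(t) = max * exp(-disp * exp(-rate * t)). This is only an illustration of how the parameters shape a trajectory; it is not necessarily the exact form used internally by pmsimstats, so see ?modgompertz for the definitive version.

# Illustrative only: a textbook Gompertz-style curve, NOT necessarily the
# exact formula used by pmsimstats (see ?modgompertz for that)
gomp_sketch <- function(t, max, disp, rate) max * exp(-disp * exp(-rate * t))

t <- seq(0, 20, by = 0.5)
plot(t, gomp_sketch(t, max = 10, disp = 3, rate = 0.5), type = "l",
     ylim = c(0, 12), xlab = "weeks", ylab = "modelled change",
     main = "Effect of the rate parameter (illustrative)")
lines(t, gomp_sketch(t, max = 10, disp = 3, rate = 0.2), lty = 2)
legend("bottomright", legend = c("rate = .5", "rate = .2"), lty = 1:2)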
As before, we'll put a wrapper around extracted_rp:
TRrespparam <- list(
  name = "TR",
  verbaldesc = "Extracted from data with blank slate assumptions",
  param = extracted_rp
)
You can create alternative response parameters one at a time, to meet your needs, similar to the above:
# Create some alternative response parameters:
altrespparam1 <- list(
  name = "FastPbBrSlowTv",
  verbaldesc = "Faster Pb & Br, slower tv, all else same",
  param = data.table(
    cat = c("tv", "pb", "br"),
    max = extracted_rp$max,
    disp = extracted_rp$disp,
    rate = c(.5, .2, .2),
    sd = extracted_rp$sd
  )
)

altrespparam2 <- list(
  name = "LargeTVsmallPb",
  verbaldesc = "Large TV smaller Pb, all else same",
  param = data.table(
    cat = c("tv", "pb", "br"),
    max = c(9, 4, 11),
    disp = extracted_rp$disp,
    rate = extracted_rp$rate,
    sd = extracted_rp$sd
  )
)

origrespparamsets <- list(TRrespparam, altrespparam1, altrespparam2)
Alternatively, because how the results depend on these response parameters is a particularly interesting aspect of this type of analysis, you can generate them more systematically. I did this in a rather hacky way that was easy to think through at the time; you can do it however you'd like:
# 1) ridiculous way to make it a data table
TRlongform <- as.data.table(melt(TRrespparam$param))
TRlongform[, rp := paste(cat, variable, sep = "_")]
TRgrid <- TRlongform$value
TRgrid <- data.table(rbind(TRgrid, TRgrid, TRgrid, TRgrid, TRgrid))
setnames(TRgrid, names(TRgrid), TRlongform$rp)

# 2) fill it out:
maxesgrid <- data.table(
  tv_max = c(18, 3, 3, 8),
  pb_max = c(3, 18, 3, 8),
  br_max = c(3, 3, 18, 8)
)
rates <- c(.2, .35, .5)
ratesgrid <- expand.grid(
  tv_rate = rates,
  pb_rate = rates,
  br_rate = rates
)

respparamsets <- vector(mode = "list", length = 1 + dim(maxesgrid)[1] + dim(ratesgrid)[1])
respparamsets[[1]] <- TRrespparam

for (irp in 1:dim(maxesgrid)[1]) {
  op <- maxesgrid[irp, ]
  newrp <- list(
    name = paste("MXtv", op$tv_max, "pb", op$pb_max, "br", op$br_max, sep = ""),
    verbaldesc = paste("MXtv", op$tv_max, "pb", op$pb_max, "br", op$br_max, sep = ""),
    param = data.table(
      cat = c("tv", "pb", "br"),
      max = as.numeric(op),
      disp = extracted_rp$disp,
      rate = extracted_rp$rate,
      sd = as.numeric(op)
    )
  )
  respparamsets[[irp + 1]] <- newrp
}

for (irp in 1:dim(ratesgrid)[1]) {
  op <- ratesgrid[irp, ]
  newrp <- list(
    # Label each set by the rates being varied in ratesgrid
    name = paste("RTtv", op$tv_rate, "pb", op$pb_rate, "br", op$br_rate, sep = ""),
    verbaldesc = paste("RTtv", op$tv_rate, "pb", op$pb_rate, "br", op$br_rate, sep = ""),
    param = data.table(
      cat = c("tv", "pb", "br"),
      max = extracted_rp$max,
      disp = extracted_rp$disp,
      rate = as.numeric(op),
      sd = extracted_rp$sd
    )
  )
  respparamsets[[irp + 1 + dim(maxesgrid)[1]]] <- newrp
}

# One more version, to focus further on maxes:
TRlongform <- as.data.table(melt(TRrespparam$param))
TRlongform[, rp := paste(cat, variable, sep = "_")]
TRgrid <- TRlongform$value
TRgrid <- data.table(rbind(TRgrid, TRgrid, TRgrid, TRgrid, TRgrid))
setnames(TRgrid, names(TRgrid), TRlongform$rp)

maxes <- c(5, 8, 11)
maxesgrid <- expand.grid(
  tv_max = maxes,
  pb_max = maxes,
  br_max = maxes
)

respparamsetsm <- vector(mode = "list", length = dim(maxesgrid)[1])
for (irp in 1:dim(maxesgrid)[1]) {
  op <- maxesgrid[irp, ]
  newrp <- list(
    name = paste("MXtv", op$tv_max, "pb", op$pb_max, "br", op$br_max, sep = ""),
    verbaldesc = paste("MXtv", op$tv_max, "pb", op$pb_max, "br", op$br_max, sep = ""),
    param = data.table(
      cat = c("tv", "pb", "br"),
      max = as.numeric(op),
      disp = extracted_rp$disp,
      rate = extracted_rp$rate,
      sd = as.numeric(op)
    )
  )
  respparamsetsm[[irp]] <- newrp
}
This generated respparamsets, in which the rates were varied, and respparamsetsm, in which the maxes were varied.
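A quick way to confirm what ended up in each list is to pull out the name field of every element:

# Check the labels of the generated response-parameter sets
sapply(respparamsets,  function(x) x$name)
sapply(respparamsetsm, function(x) x$name)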
The censoring parameters are used to create versions of the simulated data in which various dropout patterns are simulated. See ?censordata for details of how this is implemented and what the parameters do.
The censoring parameters are much simpler than the response parameters. There are only two main parameters (beta0 and beta1), plus a third (eb1) that controls the shape of the biased dropout curve but does not have a strong effect and was not varied for the publication. The parameter combinations to test are simply combined into a single data.table, along with a brief name that reminds you what they represent (and is used in plots):
censorparams <- data.table(
  names = c("balanced", "more of flat", "more of biased", "high dropout"),
  beta0 = c(.05, .05, .05, .15),
  beta1 = c(.5,  .2,  .8,  .5),
  eb1   = c(2,   2,   2,   2)
)
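If you only want to simulate a single dropout scenario, as the abbreviated runs later in this vignette do, you can pass just one row of this table:

# A single dropout scenario, e.g. the "balanced" row used in the abbreviated runs
censorparams[1, ]
# or, equivalently, select it by name:
censorparams[names == "balanced"]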
The model parameters include things that are easy and reasonable to vary as individual parameters. Only some of them were varied in the manuscript: N, c.bm, and carryover_t1half. The others were explored, but did not make it into the paper. See ?generateData for information on each of the parameters.
Here we made a few different sets of the model parameters, for different purposes: one to test the impact of these core parameters on power (coremodelparams); one to run an abbreviated set when testing the impact of varying the response parameters; a third to redo this with even lower N and biomarker effect, so that we could see the impact of varying the rate parameters without half the cells capping out at 1; and a final one to get raw data output to plot the trajectories:
coremodelparams <- expand.grid(
  N = c(35, 70),
  c.bm = c(0, .3, .6),
  carryover_t1half = c(0, .1, .2),
  c.tv = .8, c.pb = .8, c.br = .8,
  c.cf1t = .2, c.cfct = .1
)

abbrevmodelparams <- expand.grid(
  N = c(70),
  c.bm = c(0, .3),
  carryover_t1half = c(0),
  c.tv = .8, c.pb = .8, c.br = .8,
  c.cf1t = .2, c.cfct = .1
)

xabbrevmodelparams <- expand.grid(
  N = c(35),
  c.bm = c(.6),
  carryover_t1half = c(0),
  c.tv = .8, c.pb = .8, c.br = .8,
  c.cf1t = .2, c.cfct = .1
)

trajmp <- expand.grid(
  N = c(35),
  c.bm = c(.6),
  carryover_t1half = c(0, .5),
  c.tv = .8, c.pb = .8, c.br = .8,
  c.cf1t = .2, c.cfct = .1
)
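As a rough gauge of how big a run you are signing up for, you can count the parameter-space cells before simulating anything. The sketch below simply assumes a full crossing of the core model parameters, the censoring parameters, and the four trial designs passed to generateSimulatedResults further down; it may not match the package's exact bookkeeping.

# Rough size of the core parameter space, assuming a full crossing of
# model params x censoring params x the 4 trial designs used below
n_cells <- nrow(coremodelparams) * nrow(censorparams) * 4
n_cells          # number of parameter-space cells
n_cells * 1000   # total simulated trials at the published Nreps of 1,000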
There are also parameters that control strictly analysis-side options; these should be kept off for the size of trials modeled here. See ?lme_analysis for more details.
# Parameters on the analysis side
analysisparams <- expand.grid(useDE = FALSE, t_random_slope = FALSE, full_model_out = FALSE)
And, finally, you're ready to generate the simulated data! Be warned that this is computationally intense for a light little laptop like the one I'm using, so in the vignette text there is a flag that keeps this code from running, and Nreps is also set low. In the publication, all parameter spaces were repeated 1,000 times.
There are only a few settings left that aren't explained above; see ?generateData and the wrapper function ?generateSimulatedResults for explanations of how they work. If you need to run smaller numbers of repetitions and then rejoin them, you can do so using ?reknitsimresults, as long as the runs share exactly the same parameter space.
# Set the number of repetitions. Was 1,000 in the publication
Nreps <- 2

# Flag to keep this code from actually running for now - flip to TRUE to try it
rerun_simulations <- FALSE

if (rerun_simulations) {

  # This block runs the parameter sets that explore the impact of N,
  # biomarker effect size, and carryover
  simresults <- generateSimulatedResults(
    trialdesigns = list(trialdesigns$OL, trialdesigns$OLBDC, trialdesigns$CO, trialdesigns$Nof1),
    respparamsets = origrespparamsets[1],
    blparamsets = blparamsets[1],
    censorparams = censorparams,
    modelparams = coremodelparams,
    simparam = list(Nreps = Nreps, progressiveSave = TRUE, basesavename = "coreresults",
                    nRep2save = 5, saveunit2start = 1, savedir = getwd()),
    analysisparams = analysisparams,
    rawdataout = FALSE)

  # This block tests the impact of varying the maximums for the response params
  simresults <- generateSimulatedResults(
    trialdesigns = list(trialdesigns$OL, trialdesigns$OLBDC, trialdesigns$CO, trialdesigns$Nof1),
    respparamsets = respparamsetsm,
    blparamsets = blparamsets[1],
    censorparams = censorparams[1, ],
    modelparams = xabbrevmodelparams,
    simparam = list(Nreps = Nreps, progressiveSave = TRUE, useDE = FALSE, basesavename = "respmaxes",
                    nRep2save = 10, saveunit2start = 1, savedir = getwd()),
    analysisparams = analysisparams,
    rawdataout = FALSE)

  # This block tests the impact of varying the rates for the response params
  simresults <- generateSimulatedResults(
    trialdesigns = list(trialdesigns$OL, trialdesigns$OLBDC, trialdesigns$CO, trialdesigns$Nof1),
    respparamsets = respparamsets,
    blparamsets = blparamsets[1],
    censorparams = censorparams[1, ],
    modelparams = xabbrevmodelparams,
    simparam = list(Nreps = Nreps, progressiveSave = TRUE, useDE = FALSE, basesavename = "resprates",
                    nRep2save = 10, saveunit2start = 1, savedir = getwd()),
    analysisparams = analysisparams,
    rawdataout = FALSE)

  # Finally, this block generates some runs with raw data to plot the trajectories
  simresults <- generateSimulatedResults(
    trialdesigns = list(trialdesigns$OL, trialdesigns$OLBDC, trialdesigns$CO, trialdesigns$Nof1),
    respparamsets = origrespparamsets[1],
    blparamsets = blparamsets[1],
    censorparams = NA,
    modelparams = trajmp,
    simparam = list(Nreps = Nreps, progressiveSave = FALSE, useDE = FALSE, basesavename = "trajectoryoutputs",
                    nRep2save = 1, saveunit2start = 1, savedir = getwd()),
    analysisparams = analysisparams,
    rawdataout = TRUE)
}
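Because these runs can take a long time, it may be worth keeping track of what was written to disk and saving the final in-memory result for later plotting. The sketch below is one simple pattern, not a package requirement: progressiveSave writes its own files into savedir with the basesavename supplied above (their exact naming is handled by generateSimulatedResults), and the saveRDS file name is arbitrary.

if (rerun_simulations) {
  # List the progressively saved files from the core run (matched on basesavename)
  print(list.files(getwd(), pattern = "coreresults"))
  # Keep the last in-memory result (the trajectory run) for later plotting;
  # the file name here is arbitrary
  saveRDS(simresults, "trajectoryoutputs_simresults.rds")
}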
For how to look at the resulting data sets, see the accompanying vignette, Produce_Publication_Results_2_Look_at_data.