The `generateData` function calls the low-level data generation components to create sets of simulated data. The following components are called to create aspects of the simulated trial data:
The `generateData` function iteratively builds and combines the data components for each replicate, and stores the data in the "ReplicateData" subdirectory of the working directory. These data can then be analyzed with a call to the `analyzeData` function.
The `generateData` function takes a number of arguments, which are passed down to the various lower-level functions.
|Argument Name|Description|
|:------------|-----------|
|`replicateN`|Specifies how many replicates (simulated trials) to generate.|
|`subjects`|TOTAL number of subjects for the whole design. By default, subjects are allocated to each treatment with equal probability, which does not guarantee equal group sizes. See `treatSubj` and `treatProp` below for further details of treatment allocation methods.|
|`treatDoses`|Specifies the doses to be used in simulations. MSToolkit was designed to evaluate the operating characteristics of clinical trials, but its functionality can be extended to simulate non-clinical trials by thinking of “doses” as other factors that vary between individuals within a simulation replicate. In the generated dataset, TRT defines the treatment arm to which a subject is allocated, and for parallel group designs each value of TRT has a corresponding, unique value of DOSE. In crossover trials, however, TRT identifies the treatment sequence to which a subject is allocated, and each value of TRT corresponds to one of the sequences given in `treatSeq`.|
|`respEqn`|Specifies the linear predictor for generating outcome values. This should be a valid R expression or function: the expression can be written directly in the call to `generateData`, or an R function defined outside `generateData` can be called. It must return a vector whose length equals the number of rows in the generated data, i.e. one value per subject or one value per observation (TIME) within each subject.|
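To make the role of `respEqn` concrete, the sketch below evaluates an Emax-style linear predictor, written as an R expression string, against a small data frame. The helper `evalRespEqn`, the toy data frame and its columns (`DOSE`, `E0`, `ED50`, `EMAX`) are hypothetical stand-ins for the generated data; this only illustrates the idea of evaluating an expression row-wise and checking its length, not MSToolkit's internal code.

```r
# Hypothetical helper (not part of MSToolkit): evaluate a response
# equation given as a string, using the columns of `data` as variables.
evalRespEqn <- function(respEqn, data) {
  out <- eval(parse(text = respEqn), envir = data)
  # The result must have one value per row of the generated data
  if (length(out) != nrow(data)) {
    stop("respEqn must return one value per row of the data")
  }
  out
}

# Toy data: one row per subject, with dose and fixed Emax parameters
simData <- data.frame(
  DOSE = c(0, 5, 10, 50, 100),
  E0   = 2,
  ED50 = 50,
  EMAX = 10
)

evalRespEqn("E0 + (DOSE * EMAX) / (DOSE + ED50)", simData)
```

Writing the equation in terms of column names means the same string works whether the data hold one row per subject or one row per observation.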
|Argument Name|Description|
|:------------|-----------|
|`treatSubj`, `treatProp`|`treatSubj` specifies the exact number of subjects to allocate to each treatment; the sum of its elements should equal `subjects` (if it does not, the sum of `treatSubj` is used in place of `subjects`). `treatProp` is a vector of allocation probabilities, one per treatment, so it must have the same length as `treatDoses` and sum to 1. Specify only one of these arguments. Because `treatProp` defines the probability of allocating each treatment, it does not guarantee that exactly the specified proportion of subjects receives a given treatment, whereas `treatSubj` allocates exactly the specified number to each treatment.|
|`treatType`, `treatSeq`, `treatPeriod`|If `treatType` is "crossover", then `treatSeq` should contain the treatment sequences to which subjects can be allocated. Each subject is then randomly allocated to one of the treatment sequences, unless `treatSubj` is specified as above. `treatPeriod` defines the timing of observations / response values. If any times are less than zero, DOSE is assumed to be 0 for those measurements (i.e. a placebo run-in). For times greater than or equal to zero, DOSE is as specified in `treatDoses`.|
|`genParNames`, `genParMean`, `genParVCov`|`genParNames` defines the names of the data generation model parameters, used both in calculations and in the output dataset. `genParMean` and `genParVCov` define the mean values of these parameters and the variance-covariance matrix describing how they vary across trial replicates. By default `genParVCov = 0`, i.e. parameters have fixed values across trial replicates. See the Simulation Overview page for more information.|
|`genParBtwNames`, `genParBtwMean`, `genParBtwVCov`, `genParBtwCrit`, `genParErrStruc`|These parameters define how between subject variability is included for the parameters used in `respEqn`. Variables defined in `genParNames` that also appear in `genParBtwNames` have subject-level values generated from a (multivariate) Normal distribution with mean `genParBtwMean` and variance-covariance matrix `genParBtwVCov`. By default `genParBtwMean = 0` for all parameters, i.e. the parameters used in `respEqn` have means given by `genParMean` (with between replicate variability given by `genParVCov`) and vary between subjects with covariance `genParBtwVCov`. This mirrors the usual hierarchical model construct with fixed and random effects. `genParBtwCrit` applies ranges to the generated values (similar to `conCovCrit` below). If `genParErrStruc` is "additive" or "proportional", the subject specific variation is combined with the fixed effect values: "additive" simply adds the two, while "proportional" adds the subject specific variation to the log of the fixed effect value and then exponentiates (equivalent to multiplying the fixed effect by the exponentiated variation). If `genParErrStruc` is "none", the two sets of values are returned separately in the generated dataset for the user to combine as appropriate.|
|`respDist`, `respVCov`, `respInvLink`, `respErrStruc`, `respCrit`, `respDigits`|These parameters define the distributional properties of the generated response variable. `respEqn` gives the linear predictor for the response, defining how treatments, doses, covariates, time etc. relate to the mean response for an individual. This linear predictor can be used within a Normal distribution to define continuous response variables or, with the appropriate inverse link function (specified in `respInvLink`), with binomial or Poisson distributions to create binary or count data. For continuous response outcomes we can specify the residual (within subject) variability (`respVCov`), how this variability is combined with the values from `respEqn` (`respErrStruc`), and whether the generated residual values need to be constrained within certain ranges (`respCrit`). Finally, `respDigits` gives the number of significant digits for the generated response. MSToolkit version 2.0.0 uses only a single residual error value, although future versions will extend this to allow multiple residual error parameters.|
|`interimSubj`|`interimSubj` defines how subjects are assigned to interim analysis data subsets. It should be a vector of cumulative proportions, e.g. `c(0.3, 0.6)` or `c(0.25, 0.5, 0.75)`. MSToolkit partitions the dataset and allocates subjects randomly to one of the interim analysis subsets.|
|`mcarProp`, `mcarRule`, `dropFun`|These parameters define how missing data are generated and the rules for dropping subjects. `dropFun` can be any valid R function, so it can use dataset covariates, parameters and responses as drivers for dropout.|
|`conCovNames`, `conCovMean`, `conCovVCov`, `conCovCrit`, `conCovMaxDraws`|These parameters define how continuous covariates are generated for subjects within replicates. Values are drawn from (multivariate) Normal distributions, and `conCovCrit` specifies ranges or criteria for each covariate value. If the number of draws from the distribution exceeds `conCovMaxDraws` before an acceptable value is found, a warning is given.|
|`disCovNames`, `disCovVals`, `disCovProb`, `disCovProbArray`|These parameters define how discrete covariates are generated for subjects within replicates. The possible values of each discrete covariate are specified in `disCovVals`, and values are then generated with the probabilities given by `disCovProb`, or by `disCovProbArray` if users wish to specify associations between discrete covariate values.|
|`extCovNames`, `extCovFile`, `extCovSubset`, `extCovRefCol`, `extCovSameRow`, `extCovDataId`|Covariate values can be sampled from an external file (e.g. an existing database stored as an ASCII file). These parameters define which variables to sample from the external file, the name of that file, and whether to subset the data in that file before sampling. Users can choose to include in the generated dataset a reference variable identifying which rows of the external file were sampled (in order to check data values). Covariate values can be sampled either independently (the default) or from within the same row of the external file, which preserves correlations between covariates without making normality assumptions. If a value is given for `extCovDataId`, it is used to identify covariate values from each unique ID within the external file.|
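The difference between `treatProp` and `treatSubj` can be seen with two small base R functions. `allocateByProp` and `allocateBySubj` are hypothetical helpers written for this illustration, not MSToolkit functions; they show probabilistic allocation versus exact allocation in random order.

```r
# Hypothetical helpers (not MSToolkit functions) contrasting the two
# allocation rules described for treatProp and treatSubj.

# treatProp-style: each subject is allocated independently with the given
# probabilities, so realised group sizes vary from replicate to replicate.
allocateByProp <- function(subjects, treatProp) {
  stopifnot(isTRUE(all.equal(sum(treatProp), 1)))
  sample(seq_along(treatProp), size = subjects, replace = TRUE, prob = treatProp)
}

# treatSubj-style: exactly treatSubj[i] subjects receive treatment i, so the
# total is sum(treatSubj); the order of the allocations is randomised.
allocateBySubj <- function(treatSubj) {
  trt <- rep(seq_along(treatSubj), times = treatSubj)
  trt[sample.int(length(trt))]
}

set.seed(1)
table(allocateByProp(100, c(0.25, 0.25, 0.5)))  # group sizes vary around 25/25/50
table(allocateBySubj(c(25, 25, 50)))            # exactly 25/25/50
```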
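The placebo run-in rule for `treatPeriod` can be sketched for a parallel group design as follows. The helper `expandTimes` and its arguments are hypothetical: it builds one row per subject and observation time and sets DOSE to 0 wherever TIME is negative, as described in the table.

```r
# Hypothetical helper (not MSToolkit code): one row per subject and
# observation time, applying the placebo run-in rule described for treatPeriod.
expandTimes <- function(subjectDose, treatPeriod) {
  nSubj <- length(subjectDose)
  out <- data.frame(
    ID   = rep(seq_len(nSubj), each = length(treatPeriod)),
    TIME = rep(treatPeriod, times = nSubj),
    DOSE = rep(subjectDose, each = length(treatPeriod))
  )
  # Observations before time zero are treated as placebo run-in
  out$DOSE[out$TIME < 0] <- 0
  out
}

expandTimes(subjectDose = c(10, 50), treatPeriod = c(-1, 0, 1, 2))
```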
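The "additive" and "proportional" options of `genParErrStruc` correspond to $\theta_i = \theta + \eta_i$ and $\theta_i = \exp(\log\theta + \eta_i) = \theta\, e^{\eta_i}$, where $\eta_i$ is the subject's random effect. The sketch below draws $\eta_i$ from a multivariate Normal distribution using a Cholesky factor and combines it with fixed values. `drawSubjectParams` is a hypothetical helper, not MSToolkit's implementation; it assumes a positive definite covariance matrix and does not cover the "none" option.

```r
# Hypothetical sketch (not MSToolkit code): combine fixed effects with
# between-subject random effects under the two error structures.
drawSubjectParams <- function(nSubj, fixed, btwMean, btwVCov,
                              errStruc = c("additive", "proportional")) {
  errStruc <- match.arg(errStruc)
  p <- length(fixed)
  # Rows of Z are iid N(0, I); Z %*% chol(S) has covariance S
  # (chol() requires btwVCov to be positive definite)
  Z <- matrix(rnorm(nSubj * p), nrow = nSubj, ncol = p)
  eta <- Z %*% chol(btwVCov) +
    matrix(btwMean, nrow = nSubj, ncol = p, byrow = TRUE)
  theta <- matrix(fixed, nrow = nSubj, ncol = p, byrow = TRUE)
  out <- if (errStruc == "additive") theta + eta else exp(log(theta) + eta)
  colnames(out) <- names(fixed)
  out
}

set.seed(42)
fixed <- c(E0 = 2, ED50 = 50, EMAX = 10)
head(drawSubjectParams(5, fixed, btwMean = rep(0, 3),
                       btwVCov = diag(c(0.5, 0.1, 0.1)), errStruc = "proportional"))
```

The proportional form keeps parameters such as ED50 positive, which is one reason it is commonly used for pharmacological parameters.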
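The way a linear predictor from `respEqn` becomes a response can be sketched in base R. `simulateResponse` is a hypothetical helper; the logit and log inverse links are assumptions made for this example (in MSToolkit the inverse link is whatever `respInvLink` specifies), and rounding uses `signif()` to match the description of `respDigits` as significant digits.

```r
# Hypothetical sketch (not MSToolkit code): turn a linear predictor into
# simulated responses for a Normal, binomial or Poisson outcome.
simulateResponse <- function(linPred, dist = c("normal", "binomial", "poisson"),
                             residVar = 1, digits = 3) {
  dist <- match.arg(dist)
  n <- length(linPred)
  switch(dist,
    # Continuous: additive residual error with variance residVar, then rounded
    normal   = signif(linPred + rnorm(n, sd = sqrt(residVar)), digits),
    # Binary: inverse logit maps the predictor to a probability (assumed link)
    binomial = rbinom(n, size = 1, prob = plogis(linPred)),
    # Counts: inverse log maps the predictor to a mean (assumed link)
    poisson  = rpois(n, lambda = exp(linPred))
  )
}

set.seed(3)
simulateResponse(c(-1, 0, 1), "binomial")
simulateResponse(c(2.5, 3, 3.5), "normal", residVar = 0.5)
```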
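One way to read `interimSubj` as cumulative proportions is shown below: subjects are given a random enrolment order, and subset k holds those enrolled between the k-th and (k+1)-th cut points. `assignInterims` is a hypothetical helper, and this particular partitioning scheme is an assumption for illustration rather than MSToolkit's exact algorithm.

```r
# Hypothetical sketch (not MSToolkit code): randomly assign subjects to
# interim subsets from a vector of cumulative proportions such as c(0.3, 0.6).
assignInterims <- function(nSubj, interimSubj) {
  breaks <- c(0, interimSubj, 1)
  stopifnot(all(diff(breaks) > 0))
  # Random enrolment order: subject i is the entryOrder[i]-th to enter
  entryOrder <- sample.int(nSubj)
  # Cumulative fraction of the trial enrolled when each subject enters
  frac <- entryOrder / nSubj
  # Subset k: subjects entering after breaks[k] and up to breaks[k + 1]
  cut(frac, breaks = breaks, labels = FALSE)
}

set.seed(5)
table(assignInterims(100, c(0.3, 0.6)))  # subsets of 30, 30 and 40 subjects
```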
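The role of `conCovCrit` and `conCovMaxDraws` can be illustrated with a univariate rejection sampler: values outside the allowed range are redrawn, and a warning is issued if some values are still unacceptable after the maximum number of draws. `drawConCov` is a hypothetical helper and, unlike MSToolkit, handles a single Normal covariate rather than a multivariate Normal vector.

```r
# Hypothetical sketch (not MSToolkit code): draw a Normal covariate,
# redrawing values outside [lower, upper] up to maxDraws times in total.
drawConCov <- function(n, mean, sd, lower, upper, maxDraws = 100) {
  x <- rnorm(n, mean, sd)
  bad <- x < lower | x > upper
  draws <- 1
  while (any(bad) && draws < maxDraws) {
    # Redraw only the values that failed the criterion
    x[bad] <- rnorm(sum(bad), mean, sd)
    bad <- x < lower | x > upper
    draws <- draws + 1
  }
  if (any(bad)) {
    warning("some covariate values still fail the criteria after ",
            maxDraws, " draws")
  }
  x
}

set.seed(10)
wt <- drawConCov(50, mean = 70, sd = 15, lower = 40, upper = 120)
range(wt)
```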
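Discrete covariates can be sampled either one at a time with a probability vector (as `disCovProb` describes) or jointly from a table of cell probabilities, which is how associations between covariates (the purpose of `disCovProbArray`) can be induced. The helper `drawDiscreteJoint`, the covariate names SEX and SMOKER, and the probabilities below are all hypothetical.

```r
# Single covariate: values and probabilities, in the spirit of disCovProb
set.seed(7)
sex <- sample(c("F", "M"), size = 200, replace = TRUE, prob = c(0.4, 0.6))

# Hypothetical helper (not MSToolkit code): sample two associated discrete
# covariates from a joint probability table (rows = first covariate).
drawDiscreteJoint <- function(n, jointProb) {
  # Pick one table cell per subject; cells are numbered column-major,
  # matching the order of as.vector(jointProb)
  cell <- sample.int(length(jointProb), size = n, replace = TRUE,
                     prob = as.vector(jointProb))
  out <- data.frame(
    rownames(jointProb)[row(jointProb)[cell]],
    colnames(jointProb)[col(jointProb)[cell]]
  )
  names(out) <- names(dimnames(jointProb))
  out
}

jointProb <- matrix(c(0.30, 0.10,
                      0.35, 0.25), nrow = 2, byrow = TRUE,
                    dimnames = list(SEX = c("F", "M"), SMOKER = c("N", "Y")))
covs <- drawDiscreteJoint(200, jointProb)
table(covs$SEX, covs$SMOKER)
```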
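The benefit of sampling external covariates from the same row can be seen with a toy data frame in which height and weight are perfectly related. `sampleExtCov` and the `REFROW` column are hypothetical and only mimic the behaviour described above; the real arguments read from a file named by `extCovFile`.

```r
# Hypothetical sketch (not MSToolkit code): sample covariates from an
# external data frame either jointly (same row) or independently per column.
sampleExtCov <- function(extData, names, n, sameRow = TRUE) {
  if (sameRow) {
    rows <- sample.int(nrow(extData), n, replace = TRUE)
    out <- extData[rows, names, drop = FALSE]
    # Reference column recording which external rows were sampled
    out$REFROW <- rows
  } else {
    # Each covariate is resampled on its own, breaking any correlation
    out <- as.data.frame(lapply(extData[names], function(col)
      col[sample.int(length(col), n, replace = TRUE)]))
  }
  rownames(out) <- NULL
  out
}

ext <- data.frame(WT = c(50, 60, 70, 80, 90), HT = c(150, 160, 170, 180, 190))
set.seed(2)
head(sampleExtCov(ext, c("WT", "HT"), 10))                   # HT - WT is always 100
head(sampleExtCov(ext, c("WT", "HT"), 10, sameRow = FALSE))  # pairing is lost
```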