draws | R Documentation |
draws fits the base imputation model to the observed outcome data according to the given multiple imputation methodology.
According to the user's method specification, it returns either draws from the posterior distribution of the
model parameters as required for Bayesian multiple imputation or frequentist parameter estimates from the
original data and bootstrapped or leave-one-out datasets as required for conditional mean imputation.
The purpose of the imputation model is to estimate model parameters
in the absence of intercurrent events (ICEs) handled using reference-based imputation methods.
For this reason, any observed outcome data after ICEs, for which reference-based imputation methods are
specified, are removed and considered as missing for the purpose of estimating the imputation model, and for
this purpose only. The imputation model is a mixed model for repeated measures (MMRM) that is valid
under a missing-at-random (MAR) assumption.
It can be fit using maximum likelihood (ML) or restricted ML (REML) estimation,
a Bayesian approach, or an approximate Bayesian approach according to the user's method specification.
The ML/REML approaches and the approximate Bayesian approach support several possible covariance structures,
while the Bayesian approach based on MCMC sampling supports only an unstructured covariance structure.
In any case the covariance matrix can be assumed to be the same or different across each group.
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
## S3 method for class 'approxbayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
## S3 method for class 'condmean'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
## S3 method for class 'bmlmi'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
## S3 method for class 'bayes'
draws(data, data_ice = NULL, vars, method, ncores = 1, quiet = FALSE)
data |
A |
data_ice |
A |
vars |
A |
method |
A |
ncores |
A single numeric specifying the number of cores to use in creating the draws object.
Note that this parameter is ignored for |
quiet |
Logical, if |
draws performs the first step of the multiple imputation (MI) procedure: fitting the base imputation model. The goal is to estimate the parameters of interest needed for the imputation phase (i.e. the regression coefficients and the covariance matrices from an MMRM).
The function distinguishes between the following methods:
Bayesian MI based on MCMC sampling: draws returns the draws from the posterior distribution of the parameters using a Bayesian approach based on MCMC sampling. This method can be specified by using method = method_bayes().
Approximate Bayesian MI based on bootstrapping: draws returns the draws from the posterior distribution of the parameters using an approximate Bayesian approach, where the sampling from the posterior distribution is simulated by fitting the MMRM on bootstrap samples of the original dataset. This method can be specified by using method = method_approxbayes().
Conditional mean imputation with bootstrap re-sampling: draws returns the MMRM parameter estimates from the original dataset and from n_samples bootstrap samples. This method can be specified by using method = method_condmean() with argument type = "bootstrap".
Conditional mean imputation with jackknife re-sampling: draws returns the MMRM parameter estimates from the original dataset and from each leave-one-subject-out sample. This method can be specified by using method = method_condmean() with argument type = "jackknife".
Bootstrapped Maximum Likelihood MI: draws returns the MMRM parameter estimates from a given number of bootstrap samples needed to perform random imputations of the bootstrapped samples. This method can be specified by using method = method_bmlmi().
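As a rough sketch of how the five options above map onto the method constructors, the following builds one method object per option. It assumes the rbmi package is installed (the code is skipped otherwise), and the n_samples values are arbitrary illustrations rather than recommendations.

```r
# Runs only if rbmi is available; otherwise nothing is created.
if (requireNamespace("rbmi", quietly = TRUE)) {
  methods <- list(
    # Bayesian MI based on MCMC sampling
    bayes         = rbmi::method_bayes(n_samples = 150),
    # Approximate Bayesian MI based on bootstrapping
    approxbayes   = rbmi::method_approxbayes(n_samples = 20),
    # Conditional mean imputation with bootstrap re-sampling
    condmean_boot = rbmi::method_condmean(type = "bootstrap", n_samples = 100),
    # Conditional mean imputation with jackknife re-sampling
    condmean_jack = rbmi::method_condmean(type = "jackknife"),
    # Bootstrapped Maximum Likelihood MI (package defaults)
    bmlmi         = rbmi::method_bmlmi()
  )
  # Each object carries a class that draws() dispatches on.
  print(lapply(methods, class))
}
```

Any one of these objects can then be passed as the method argument of draws().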
Bayesian MI based on MCMC sampling was proposed by Carpenter, Roger, and Kenward (2013), who first introduced reference-based imputation methods. Approximate Bayesian MI is discussed in Little and Rubin (2002). Conditional mean imputation methods are discussed in Wolbers et al (2022). Bootstrapped Maximum Likelihood MI is described in Von Hippel and Bartlett (2021).
The argument data contains the longitudinal data. It must have at least the following variables:
subjid: a factor vector containing the subject ids.
visit: a factor vector containing the visit the outcome was observed on.
group: a factor vector containing the group that the subject belongs to.
outcome: a numeric vector containing the outcome variable. It might contain missing values.
Additional baseline or time-varying covariates must be included in data.
data must have one row per visit per subject. This means that incomplete outcome data must be set as NA instead of having the related row missing. Missing values in the covariates are not allowed.
If data is incomplete then the expand_locf() helper function can be used to insert any missing rows using Last Observation Carried Forward (LOCF) imputation to impute the covariate values. Note that LOCF is generally not a principled imputation method and should only be used when appropriate for the specific covariate.
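Before calling draws() it can help to confirm that data really has one row per visit per subject. The following is a minimal base R sketch on a made-up dataset (all values are hypothetical). It only detects absent rows and does not insert them.

```r
# Toy longitudinal data: subject "2" has no row for visit "v2".
dat <- data.frame(
  subjid  = factor(c("1", "1", "1", "2", "2")),
  visit   = factor(c("v1", "v2", "v3", "v1", "v3"), levels = c("v1", "v2", "v3")),
  group   = factor(c("A", "A", "A", "B", "B")),
  outcome = c(1.2, NA, 2.3, 0.7, 1.9)
)

# Every subject-by-visit combination should occur exactly once.
counts <- table(dat$subjid, dat$visit)
is_complete <- all(counts == 1)

# List the subject-visit combinations whose rows are absent.
full_grid <- expand.grid(
  subjid = levels(dat$subjid),
  visit  = levels(dat$visit),
  stringsAsFactors = FALSE
)
present <- paste(dat$subjid, dat$visit)
missing_rows <- full_grid[!(paste(full_grid$subjid, full_grid$visit) %in% present), ]

print(is_complete)
print(missing_rows)
```

Note that the missing outcome for subject "1" at visit "v2" is acceptable because its row is present with outcome set to NA, whereas the absent row for subject "2" needs to be added before the data can be used.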
Please note that there is no special provisioning for the baseline outcome values. If you do not want baseline observations to be included in the model as part of the response variable then these should be removed in advance from the outcome variable in data. At the same time if you want to include the baseline outcome as a covariate in the model, then this should be included as a separate column of data (as any other covariate).
Character covariates will be explicitly cast to factors. If you use a custom analysis function that requires specific reference levels for the character covariates (for example in the computation of the least squares means) then you are advised to manually cast your character covariates to factor in advance of running draws().
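To illustrate why a manual cast can matter, the sketch below compares base R's default factor() conversion, which sorts the levels, with an explicit choice of levels. The covariate name sex and its values are hypothetical.

```r
# Hypothetical covariate stored as character.
dat <- data.frame(
  subjid = c("1", "2", "3", "4"),
  sex    = c("M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# factor() without explicit levels sorts them, so "F" becomes the reference level.
implicit <- factor(dat$sex)

# Casting explicitly lets you choose the reference level yourself.
dat$sex <- factor(dat$sex, levels = c("M", "F"))

print(levels(implicit))
print(levels(dat$sex))
```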
The argument data_ice contains information about the occurrence of ICEs. It is a data.frame with 3 columns:
Subject ID: a character vector containing the ids of the subjects that experienced the ICE. This column must be named as specified in vars$subjid.
Visit: a character vector containing the first visit after the occurrence of the ICE (i.e. the first visit affected by the ICE). The visits must be equal to one of the levels of data[[vars$visit]]. If multiple ICEs happen for the same subject, then only the first non-MAR visit should be used. This column must be named as specified in vars$visit.
Strategy: a character vector specifying the imputation strategy to address the ICE for this subject. This column must be named as specified in vars$strategy.
Possible imputation strategies are:
"MAR": Missing At Random.
"CIR": Copy Increments in Reference.
"CR": Copy Reference.
"JR": Jump to Reference.
"LMCF": Last Mean Carried Forward.
For explanations of these imputation strategies, see Carpenter, Roger, and Kenward (2013), Cro et al (2020), and Wolbers et al (2022).
Please note that user-defined imputation strategies can also be set.
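A data_ice table along these lines could be assembled as sketched below. The column names subjid, visit and strategy are assumptions standing in for whatever vars$subjid, vars$visit and vars$strategy are set to, and the subjects, visits and checks are made up for illustration.

```r
# One row per subject who experienced an ICE (hypothetical subjects and visits).
dat_ice <- data.frame(
  subjid   = c("1", "3"),
  visit    = c("v2", "v3"),      # first visit affected by the ICE
  strategy = c("JR", "MAR"),
  stringsAsFactors = FALSE
)

visit_levels <- c("v1", "v2", "v3")   # stand-in for levels(data[[vars$visit]])
builtin_strategies <- c("MAR", "CIR", "CR", "JR", "LMCF")

# Simple sanity checks; user-defined strategies would fail the last one by design.
checks <- c(
  one_row_per_subject = !anyDuplicated(dat_ice$subjid),
  visits_are_levels   = all(dat_ice$visit %in% visit_levels),
  known_strategies    = all(dat_ice$strategy %in% builtin_strategies)
)
print(checks)
```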
The data_ice argument is necessary at this stage since (as explained in Wolbers et al (2022)) the model is fitted after removing the observations which are incompatible with the imputation model, i.e. any observed data on or after data_ice[[vars$visit]] that are addressed with an imputation strategy different from MAR are excluded from the model fit. However such observations will not be discarded from the data in the imputation phase (performed with the function impute()). To summarize, at this stage only pre-ICE data and post-ICE data following ICEs for which MAR imputation is specified are used.
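The exclusion rule can be illustrated with a few lines of base R. This is a sketch of the rule as described here, not the package's internal implementation. The data and the column name excluded_from_fit are hypothetical.

```r
dat <- data.frame(
  subjid  = rep(c("1", "2", "3"), each = 3),
  visit   = factor(rep(c("v1", "v2", "v3"), times = 3), levels = c("v1", "v2", "v3")),
  outcome = c(1.0, 1.5, 2.0, 0.8, NA, 1.1, 1.2, 1.6, 1.9)
)
dat_ice <- data.frame(
  subjid   = c("1", "2"),
  visit    = c("v2", "v3"),
  strategy = c("JR", "MAR"),
  stringsAsFactors = FALSE
)

# Attach each subject's ICE visit and strategy (NA for subjects without a record).
idx <- match(dat$subjid, dat_ice$subjid)
ice_visit <- factor(dat_ice$visit[idx], levels = levels(dat$visit))
ice_strategy <- dat_ice$strategy[idx]

# A row is excluded if it lies on or after the ICE visit and the strategy is not MAR.
on_or_after_ice <- !is.na(ice_visit) & as.integer(dat$visit) >= as.integer(ice_visit)
dat$excluded_from_fit <- on_or_after_ice & ice_strategy != "MAR"

print(dat)
```

Here only visits "v2" and "v3" of subject "1" are flagged: subject "2" has an ICE handled under MAR and subject "3" has no ICE record, so all of their observed data would contribute to the model fit.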
If the data_ice argument is omitted, or if a subject doesn't have a record within data_ice, then it is assumed that all of the relevant subject's data is pre-ICE and as such all missing visits will be imputed under the MAR assumption and all observed data will be used to fit the base imputation model. Please note that the ICE visit cannot be updated via the update_strategy argument in impute(); this means that subjects who didn't have a record in data_ice will always have their missing data imputed under the MAR assumption even if their strategy is updated.
The vars argument is a named list that specifies the names of key variables within data and data_ice. This list is created by set_vars() and contains the following named elements:
subjid: name of the column in data and data_ice which contains the subject ids variable.
visit: name of the column in data and data_ice which contains the visit variable.
group: name of the column in data which contains the group variable.
outcome: name of the column in data which contains the outcome variable.
covariates: vector of characters which contains the covariates to be included in the model (including interactions which are specified as "covariateName1*covariateName2"). If no covariates are provided the default model specification of outcome ~ 1 + visit + group will be used. Please note that the group*visit interaction is not included in the model by default.
strata: covariates used as stratification variables in the bootstrap sampling. By default only the vars$group is set as stratification variable. Needed only for method_condmean(type = "bootstrap") and method_approxbayes().
strategy: name of the column in data_ice which contains the subject-specific imputation strategy.
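To see what adding covariates means for the model specification, the base R sketch below compares the default formula with one extended by a covariate vector. The covariate name age is hypothetical, and pasting the terms together is only an illustration of the resulting fixed effects, not necessarily how the package constructs its formula.

```r
# Hypothetical covariates entry, including the group-by-visit interaction.
covariates <- c("age", "group*visit")

# Default specification when no covariates are supplied.
default_formula <- outcome ~ 1 + visit + group

# Specification with the covariates appended.
extended_formula <- as.formula(
  paste("outcome ~ 1 + visit + group +", paste(covariates, collapse = " + "))
)

default_terms  <- attr(terms(default_formula), "term.labels")
extended_terms <- attr(terms(extended_formula), "term.labels")

print(default_terms)
print(extended_terms)
```

The default formula contains only the visit and group main effects, so the interaction term appears only once "group*visit" is listed explicitly among the covariates.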
In our experience, Bayesian MI (method = method_bayes()) with a relatively low number of samples (e.g. n_samples below 100) frequently triggers Stan warnings about R-hat such as "The largest R-hat is X.XX, indicating chains have not mixed". In many instances, this warning might be spurious, i.e. standard diagnostic analyses of the MCMC samples do not indicate any issues and results look reasonable. Increasing the number of samples to e.g. above 150 usually gets rid of the warning.
A draws object which is a named list containing the following:
data: R6 longdata object containing all relevant input data information.
method: A method object as generated by either method_bayes(), method_approxbayes(), method_condmean() or method_bmlmi().
samples: list containing the estimated parameters of interest. Each element of samples is a named list containing the following:
  ids: vector of characters containing the ids of the subjects included in the original dataset.
  beta: numeric vector of estimated regression coefficients.
  sigma: list of estimated covariance matrices (one for each level of vars$group).
  theta: numeric vector of transformed covariances.
  failed: Logical. TRUE if the model fit failed.
  ids_samp: vector of characters containing the ids of the subjects included in the given sample.
fit: if method_bayes() is chosen, returns the MCMC Stan fit object. Otherwise NULL.
n_failures: absolute number of failures of the model fit. Relevant only for method_condmean(type = "bootstrap"), method_approxbayes() and method_bmlmi().
formula: fixed effects formula object used for the model specification.
James R Carpenter, James H Roger, and Michael G Kenward. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics, 23(6):1352–1371, 2013.
Suzie Cro, Tim P Morris, Michael G Kenward, and James R Carpenter. Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in Medicine, 39(21):2815–2842, 2020.
Roderick J. A. Little and Donald B. Rubin. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons, Hoboken, New Jersey, 2002. [Section 10.2.3]
Marcel Wolbers, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, and Jonathan W. Bartlett. Standard and reference-based conditional mean imputation. https://arxiv.org/abs/2109.11162, 2022.
Paul T Von Hippel and Jonathan W Bartlett. Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws. 2021.
method_bayes(), method_approxbayes(), method_condmean(), method_bmlmi() for setting method.
set_vars() for setting vars.
expand_locf() for expanding data in case of missing rows.
For more details see the quickstart vignette: vignette("quickstart", package = "rbmi").