knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
lodi and an example dataset
For convenience we have included an example dataset called toy_data, which can be loaded by running data("toy_data"). Let's look at the first 10 entries of the example dataset.
library(lodi)
data("toy_data")
head(toy_data, n = 10)
id corresponds to the study ID and is unimportant for the purposes of this example. case_cntrl takes values 0 or 1, where 1 indicates that the subject has the disease of interest and 0 indicates that the subject is a healthy control. poll is the environmental exposure of interest, where NA indicates that the concentration is below the limit of detection (LOD). smoking and gender are covariates that we will include in the imputation model. lod corresponds to the limit of detection for each individual's batch. Finally, batch1 takes two values: 1 if the subject's biosample was assayed in batch 1 and 0 if the subject's biosample was assayed in batch 2.
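Before imputing, it can help to check how much of the exposure is censored in each batch. The sketch below uses a small made-up data frame, mock, whose column names mirror the ones described above; the values, including the two LODs, are invented for illustration. With the package loaded, the same tapply call should work on toy_data.

```r
# Hypothetical stand-in with the same column names as toy_data;
# every value here is invented for illustration only.
mock <- data.frame(
  id         = 1:8,
  case_cntrl = c(1, 0, 1, 0, 1, 0, 1, 0),
  poll       = c(2.3, NA, 1.1, NA, 0.9, 3.4, NA, 1.7),
  smoking    = c(0, 1, 0, 0, 1, 1, 0, 1),
  gender     = c(1, 1, 0, 0, 1, 0, 1, 0),
  batch1     = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Assumed batch-specific LODs: 0.8 in batch 1, 0.5 in batch 2.
mock$lod <- ifelse(mock$batch1 == 1, 0.8, 0.5)

# Proportion of subjects below the LOD (poll is NA) within each batch.
prop_below_lod <- tapply(is.na(mock$poll), mock$batch1, mean)
prop_below_lod
```

The result is indexed by the value of batch1, so the entry named "1" is the censored share in batch 1 and the entry named "0" is the share in batch 2.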
The function that performs censored likelihood multiple imputation is the clmi function. For more details see help(clmi).
clmi.out <- clmi(formula = log(poll) ~ case_cntrl + smoking + gender, df = toy_data, lod = lod, seed = 12345, n.imps = 5)
The main input to clmi is an R formula. The left-hand side of the formula must be the exposure, and the right-hand side must be the outcome followed by the covariates you want to include in the imputation model. The order of variables on the right-hand side matters: the outcome must come first. You can apply a transformation to the exposure by applying a univariate function to it, as done above. The lod argument refers to the name of the lod variable in your data.frame.
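To make the formula layout concrete, the pieces can be pulled apart with base R. This is only a sketch of how R itself sees the formula from the clmi call above; it is not a description of how clmi parses its input internally, and the variable names on the left of each assignment are illustrative.

```r
# The imputation formula used in the clmi call above.
imp_formula <- log(poll) ~ case_cntrl + smoking + gender

# Left-hand side: the exposure wrapped in a univariate transformation.
lhs <- imp_formula[[2]]
transformation <- as.character(lhs[[1]])
exposure <- all.vars(lhs)

# Right-hand side terms in the order they were written:
# the outcome first, then the covariates.
rhs_terms <- attr(terms(imp_formula), "term.labels")
outcome <- rhs_terms[1]
covariates <- rhs_terms[-1]

list(transformation = transformation, exposure = exposure,
     outcome = outcome, covariates = covariates)
```

Swapping case_cntrl and smoking in the formula would make smoking the first right-hand-side term, which is why the ordering matters.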
The imputed datasets can be extracted as a list using $imputed.dfs:
extract.imputed.dfs <- clmi.out$imputed.dfs
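Once the list is extracted, each imputed dataset can be inspected with the usual list tools. The sketch below does not call lodi; it uses a hypothetical stand-in list, mock_imputed_dfs, and assumes that each element is a data.frame containing the imputed exposure column, which this vignette names poll_transform_imputed for this example. The numbers are invented.

```r
# Hypothetical stand-in for clmi.out$imputed.dfs: a list of data frames,
# one per imputation, each holding the imputed exposure column.
mock_imputed_dfs <- list(
  data.frame(id = 1:3, poll_transform_imputed = c(0.83, -0.41, 0.10)),
  data.frame(id = 1:3, poll_transform_imputed = c(0.83, -0.65, 0.10))
)

# Number of imputed datasets in the list.
n_imputations <- length(mock_imputed_dfs)

# Mean of the imputed exposure within each dataset.
imputed_means <- vapply(
  mock_imputed_dfs,
  function(d) mean(d$poll_transform_imputed),
  numeric(1)
)
imputed_means
```

In the stand-in, only the second subject's value differs between datasets, mimicking a value that was below the LOD and therefore imputed; observed values stay the same across imputations.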
The pool.clmi function takes the output generated by the clmi function, fits outcome models on each of the imputed datasets, and pools inference across outcome models using Rubin's rules. For details see help(pool.clmi).
results <- pool.clmi(formula = case_cntrl ~ poll_transform_imputed + smoking + gender, clmi.out = clmi.out, type = logistic)
In pool.clmi, formula contains the outcome variable on the left-hand side, and the first variable on the right-hand side should be the imputed exposure variable. clmi outputs the exposure variable as ((your-exposure))_transform_imputed. In this example, our exposure is poll, so the name of the imputed variable is poll_transform_imputed.
The kind of outcome model is set with the type argument. If you have binary outcome data (as in the current example), use type = logistic so that the models fit on the imputed datasets are logistic regression models. If you have continuous outcome data, use type = linear so that the models fit on the imputed datasets are linear regression models.
To display the pooled results use $output:
results$output
If you want to look at the individual regressions fit on each imputed dataset, use $regression.summaries:
results$regression.summaries
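To see how per-imputation fits relate to a pooled result, here is a minimal base R sketch of Rubin's rules for a single coefficient. It is not the code used by pool.clmi; the estimates and variances are hypothetical numbers standing in for what one would read off the individual regression summaries.

```r
# Hypothetical per-imputation results for one coefficient:
# point estimates and their squared standard errors (variances).
estimates <- c(0.52, 0.47, 0.55, 0.50, 0.46)
variances <- c(0.040, 0.038, 0.042, 0.041, 0.039)
m <- length(estimates)

# Rubin's rules.
pooled_estimate <- mean(estimates)   # average of the point estimates
within_var  <- mean(variances)       # average within-imputation variance
between_var <- var(estimates)        # between-imputation variance
total_var   <- within_var + (1 + 1 / m) * between_var
pooled_se   <- sqrt(total_var)

c(estimate = pooled_estimate, se = pooled_se)
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term adds the uncertainty due to the imputed values.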
Boss J, Mukherjee B, Ferguson KK, et al. Estimating outcome-exposure associations when exposure biomarker detection limits vary across batches. Epidemiology. 2019;30(5):746-755. doi:10.1097/EDE.0000000000001052
If you would like to report a bug in the code, ask questions, or send requests or suggestions, e-mail Jonathan Boss at bossjona@umich.edu.