impute.boot: Bootstrapping Imputation for Many Chemicals
In miWQS: Multiple Imputation Using Weighted Quantile Sum Regression


If many chemicals have values below the detection limit (BDL), this function creates K imputed datasets using the bootstrap procedure described in Lubin et al. (2004). It repeatedly invokes impute.Lubin(), once for each chemical.

impute.boot(X, DL, Z = NULL, K = 5L, verbose = FALSE)

`X`	A numeric vector, matrix, or data frame of chemical concentration levels with n subjects and C chemicals to be imputed. Missing values are indicated by NAs. Ideally, a numeric matrix.
`DL`	The detection limit for each chemical, as a numeric vector of length C. The vector should be complete (no NAs); any chemical with a missing detection limit is not imputed. If DL is a data frame or matrix with 1 row or 1 column, it is coerced to a numeric vector.
`Z`	Any covariates used in imputing the chemical concentrations. Ideally, a numeric matrix; however, Z can also be a factor, vector, or data frame. Z is assumed to be complete; observations with missing covariate values are ignored in the imputation, and a warning is printed. If there are no covariates, enter NULL.
`K`	The number of imputed datasets to generate, as a natural number. Default: 5L.
`verbose`	Logical; if TRUE, prints more information, which is useful for debugging. Default: FALSE.
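The argument rules above can be illustrated with a short base-R sketch. prepare_inputs_sketch() is a hypothetical helper, not the code inside impute.boot(); it only shows one way to coerce a one-row DL data frame to a vector, flag chemicals whose detection limit is missing, check that K is a natural number, and drop subjects with incomplete covariates while printing a warning.

```r
# Hypothetical helper (not miWQS code) applying the argument rules above.
prepare_inputs_sketch <- function(X, DL, Z = NULL, K = 5L) {
  X <- as.matrix(X)
  # A 1-row or 1-column matrix/data frame of detection limits becomes a vector.
  DL <- as.numeric(as.matrix(DL))
  # K must be a single natural number.
  stopifnot(length(K) == 1, K >= 1, K == round(K))
  # Chemicals whose detection limit is NA are flagged as not imputed.
  impute_chem <- !is.na(DL)
  # Subjects with incomplete covariates are excluded, with a warning.
  keep <- rep(TRUE, nrow(X))
  if (!is.null(Z)) {
    keep <- stats::complete.cases(as.data.frame(Z))
    if (any(!keep)) {
      warning(sum(!keep), " observation(s) with missing covariates ignored.")
    }
  }
  list(X = X, DL = DL, K = as.integer(K), impute_chem = impute_chem, keep = keep)
}

# Illustrative inputs: 5 subjects, 2 chemicals, second detection limit missing.
X <- matrix(c(1, NA, 3, 4, 5, 6, 7, NA, 9, 10), ncol = 2)
res <- prepare_inputs_sketch(X, DL = data.frame(a = 2, b = NA_real_),
                             Z = c(1, NA, 3, 4, 5), K = 2)
res$impute_chem  # TRUE FALSE
res$keep         # TRUE FALSE TRUE TRUE TRUE
```

The call above prints a warning because the second subject has a missing covariate value.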

Lubin et al. (2004) evaluate several imputation approaches and show that a multiple imputation procedure using bootstrapping yields unbiased estimates and confidence intervals with nominal coverage unless the proportion of missing data is extreme. The authors implemented this procedure in a SAS macro, which is currently available. We converted the SAS macro into R code.

The impute.Lubin() function imputes a single chemical with missing values. The concentrations of that chemical (argument chemcol) are assumed to follow a lognormal distribution, with BDL values treated as interval-censored between 0 and the detection limit (argument dlcol). After bootstrapping, the BDL values are imputed by the inverse transform method: for each of the i = 1, ..., n_0 subjects with a BDL value, generate u_i \sim Unif(F(0.0001), F(dlcol)), where F is the fitted lognormal CDF, and assign x_i = F^{-1}(u_i).
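A minimal R sketch of the inverse transform step, assuming the lognormal parameters on the log scale (mu, sigma) are already known. draw_below_dl() is a hypothetical helper, and its default lower bound of 1e-4 mirrors the 0.0001 quoted above; it is not the impute.Lubin() source.

```r
# Hypothetical helper: inverse-transform draws for n0 values censored below a
# detection limit `dl`, assuming a lognormal with log-scale mean `mu` and
# log-scale standard deviation `sigma`.
draw_below_dl <- function(n0, dl, mu, sigma, lower = 1e-4) {
  p_lo <- plnorm(lower, meanlog = mu, sdlog = sigma)  # F(0.0001)
  p_hi <- plnorm(dl, meanlog = mu, sdlog = sigma)     # F(DL)
  u <- runif(n0, min = p_lo, max = p_hi)              # u_i ~ Unif(F(0.0001), F(DL))
  qlnorm(u, meanlog = mu, sdlog = sigma)              # x_i = F^{-1}(u_i)
}

set.seed(1)
x_bdl <- draw_below_dl(n0 = 10, dl = 0.5, mu = 0, sigma = 1)
range(x_bdl)  # should lie within [0.0001, 0.5]
```

Drawing u on the probability scale rather than on the concentration scale is what guarantees that F^{-1}(u_i) falls between 0.0001 and the detection limit.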

To impute a single chemical, impute.Lubin() performs these steps:

1. Read the input arguments.
2. Obtain a bootstrap sample of the subjects.
3. Generate a weights vector from the bootstrap sample.
4. Use the Surv() function from the survival package to create a survival object for the censored concentrations.
5. Use the survreg() function from the survival package to fit a lognormal survival model.
6. Evaluate the lognormal CDF, with the coefficients (beta) and scale estimated by the survival model as parameters, at 0.0001 and at the detection limit to obtain lower and upper bounds on the probability scale.
7. Randomly generate a value from a uniform distribution between these lower and upper bounds.
8. Apply the inverse lognormal CDF (quantile function) to this uniform value to obtain the imputed concentration.
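The eight steps can be sketched with the survival package as below. This is an illustration under stated assumptions, not the impute.Lubin() source: the helper name impute_one_chemical_sketch(), the single numeric covariate z, encoding left censoring at the detection limit as Surv(..., type = "interval2") with an NA left bound, and dropping zero-weight subjects before calling survreg() are all choices made for this sketch.

```r
library(survival)

# Hypothetical helper (not the miWQS source): one bootstrap imputation of a
# single chemical. `chem` holds concentrations with NA for BDL values,
# `dl` is that chemical's detection limit, `z` is one numeric covariate.
impute_one_chemical_sketch <- function(chem, dl, z) {
  n <- length(chem)
  bdl <- is.na(chem)
  # Steps 2-3: bootstrap sample, then weights = times each subject was drawn.
  idx <- sample.int(n, n, replace = TRUE)
  w <- tabulate(idx, nbins = n)
  # Step 4 (data): with "interval2", an NA left bound means left-censored at
  # `right`, i.e. the value lies somewhere in (0, dl].
  dat <- data.frame(left = chem, right = ifelse(bdl, dl, chem), z = z, w = w)
  # Steps 4-5: weighted lognormal model; subjects never drawn (w = 0) are dropped.
  fit <- survreg(Surv(left, right, type = "interval2") ~ z,
                 data = dat[w > 0, ], weights = w, dist = "lognormal")
  # Log-scale location for each BDL subject, and the common log-scale sd.
  mu <- predict(fit, newdata = dat[bdl, ], type = "lp")
  sigma <- fit$scale
  # Step 6: bounds on the probability scale, F(0.0001) and F(dl).
  p_lo <- plnorm(1e-4, meanlog = mu, sdlog = sigma)
  p_hi <- plnorm(dl, meanlog = mu, sdlog = sigma)
  # Step 7: one uniform draw per BDL subject between its bounds.
  u <- runif(sum(bdl), min = p_lo, max = p_hi)
  # Step 8: the inverse CDF gives the imputed concentration.
  out <- chem
  out[bdl] <- qlnorm(u, meanlog = mu, sdlog = sigma)
  out
}

# Illustrative simulated data (not from miWQS).
set.seed(42)
n <- 200
z <- rnorm(n)
x <- rlnorm(n, meanlog = 0.5 + 0.3 * z, sdlog = 1)
dl <- 1.2
x[x < dl] <- NA
x_imp <- impute_one_chemical_sketch(x, dl, z)
summary(x_imp[is.na(x)])
```

Passing the bootstrap counts as case weights gives the same point estimates as fitting the model to the resampled rows directly; the weights form simply avoids duplicating rows.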

impute.boot() repeatedly performs this procedure for all chemicals.

A list of:

X.imputed: An n x C x K array of imputed X values, where n is the number of subjects and C is the number of chemicals.
bootstrap_index: An n x K matrix of the bootstrap indices selected for the imputation.
indicator.miss: A check: the number of imputed values that lie above the detection limit, which should be 0.
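How impute.boot() repeats the single-chemical procedure and assembles a list shaped like the one above can be sketched as follows. impute_boot_sketch() is hypothetical: to stay short it fits an intercept-only model (no Z), draws one bootstrap sample per imputation and reuses it for every chemical, and does not handle a bootstrap sample in which a chemical is entirely BDL. These are assumptions of the sketch, not statements about how miWQS is implemented.

```r
library(survival)

# Hypothetical sketch (not the miWQS source) of the outer loops and the
# returned list. Simplifications: intercept-only model (no covariates Z) and
# one bootstrap sample per imputation, reused for all chemicals.
impute_boot_sketch <- function(X, DL, K = 5L) {
  X <- as.matrix(X)
  n <- nrow(X)
  n_chem <- ncol(X)
  X.imputed <- array(X, dim = c(n, n_chem, K))  # observed values copied into each slice
  bootstrap_index <- matrix(NA_integer_, nrow = n, ncol = K)
  for (k in seq_len(K)) {
    idx <- sample.int(n, n, replace = TRUE)
    bootstrap_index[, k] <- idx
    for (j in seq_len(n_chem)) {
      bdl <- is.na(X[, j])
      if (!any(bdl)) next  # nothing to impute for this chemical
      left <- X[idx, j]                          # NA marks BDL (left-censored)
      right <- ifelse(is.na(left), DL[j], left)
      fit <- survreg(Surv(left, right, type = "interval2") ~ 1, dist = "lognormal")
      mu <- unname(coef(fit)[1])
      sigma <- fit$scale
      u <- runif(sum(bdl), plnorm(1e-4, mu, sigma), plnorm(DL[j], mu, sigma))
      X.imputed[bdl, j, k] <- qlnorm(u, mu, sigma)
    }
  }
  # Check: count imputed cells that exceed their chemical's detection limit.
  miss <- array(is.na(X), dim = c(n, n_chem, K))
  dl_arr <- array(rep(DL, each = n), dim = c(n, n_chem, K))
  indicator.miss <- sum(X.imputed[miss] > dl_arr[miss])
  list(X.imputed = X.imputed, bootstrap_index = bootstrap_index,
       indicator.miss = indicator.miss)
}

# Illustrative simulated data (not simdata87).
set.seed(87)
X <- matrix(rlnorm(300), ncol = 3)
DL <- c(0.5, 0.8, 1.0)
for (j in seq_len(ncol(X))) X[X[, j] < DL[j], j] <- NA
out <- impute_boot_sketch(X, DL, K = 2)
dim(out$X.imputed)  # 100 3 2, i.e. n x C x K
out$indicator.miss
```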

Note #1: Code was adapted from Erin E. Donahue's original translation of the SAS macro developed from the paper.

Note #2: No seed is set. Call set.seed() before impute.boot() so that the same bootstrap samples are selected on each run.
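Because bootstrap samples come from R's random number generator, calling set.seed() first makes them repeatable. The base-R illustration below uses a hypothetical draw_bootstrap_index(), built on sample.int(), as a stand-in for the bootstrap draw inside impute.boot().

```r
# Hypothetical stand-in for the bootstrap draw: an n x K matrix of indices,
# shaped like bootstrap_index.
draw_bootstrap_index <- function(n, K) {
  replicate(K, sample.int(n, n, replace = TRUE))
}

set.seed(2004)
b1 <- draw_bootstrap_index(n = 20, K = 3)
set.seed(2004)
b2 <- draw_bootstrap_index(n = 20, K = 3)
identical(b1, b2)  # TRUE: same seed, same bootstrap samples
```

In practice, calling set.seed() immediately before impute.boot() should likewise reproduce both bootstrap_index and X.imputed.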

Note #3: If the length of DL is greater than the number of chemicals (columns of X), the smallest value is assumed to be the detection limit, and a warning is printed.

Lubin, J. H., Colt, J. S., Camann, D., Davis, S., Cerhan, J. R., Severson, R. K., … Hartge, P. (2004). Epidemiologic Evaluation of Measurement Data in the Presence of Detection Limits. Environmental Health Perspectives, 112(17), 1691–1696. https://doi.org/10.1289/ehp.7199

Other imputation: impute.Lubin(), impute.multivariate.bayesian(), impute.sub()

library(miWQS)
data("simdata87")
# Impute using one covariate.
l <- impute.boot(X = simdata87$X.bdl, DL = simdata87$DL, Z = simdata87$Z.sim[, 1],
  K = 2, verbose = TRUE)
# Summarize the imputed concentrations of each chemical in each imputed dataset.
apply(l$X.imputed, 2:3, summary)