‘BBPMM’ performs single and multiple imputation (MI) of mixed-scale variables using a chained equations approach and (Bayesian Bootstrap) Predictive Mean Matching.
Data 
A partially incomplete data frame or matrix. 
M 
Number of multiple imputations. If M=1, no Bayesian Bootstrap step is carried out. Default=10. 
nIter 
Number of iterations of the chained equations algorithm before the data set is stored as an 'imputed data set'. If set to "autolin", the number of iterations will be selected using a data monotonicity index (based on 
outfile 
A character string that specifies the path and file name for the imputed data sets. If outfile=NULL (default), no data set is stored. 
ignore 
A character or numerical vector that specifies either column positions or variable names that are to be excluded from the imputation model and process, e.g. an ID variable. If ignore=NULL (default), all variables in Data are used in the imputation model. 
vartype 
A character vector that flags the class of each variable in Data (without the variables defined by the ignore argument), with either 'M' for metric-scale or 'C' for categorical. The default (NULL) adopts the classes found in Data. Overruling these classes can sometimes make sense: e.g., an ordinal-scale variable may originally be classified as ‘factor’, but treating it as a metric-scale variable within the imputation process might still be a better choice (considering the robustness of predictive mean matching to model misspecification). 
stepmod 
Performs (backward) variable selection for each imputation model, based on the Schwarz (Bayes) Information Criterion. Default="stepAIC". 
maxit.multi 
Imported argument from the nnet package that specifies the maximum number of iterations for the multinomial logit model estimation. Default=3. 
maxit.glm 
Argument for specification of the maximum number of iterations for the binomial logit model estimation (i.e., glm). Default=25. 
maxPerc 
The maximum percentage the mode category of a variable is allowed to have in order to try ‘regular’ imputation. If a variable is approximately Dirac distributed, i.e. if it has (almost) no variance, imputation is carried out by simple hot deck imputation. Default = 0.98. 
verbose 
The algorithm prints information on imputation and iteration numbers. Default=TRUE. 
setSeed 
Optional argument to fix the pseudorandom number generator in order to allow for reproducible results. 
chainDiagnostics 
Argument specifying if Monte Carlo chains for further diagnostics should be returned as well. Default=TRUE. 
... 
Further arguments passed to or from other functions. 
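The maxPerc rule described above can be illustrated with a small base R sketch. The helper mode_share() is hypothetical and not part of the package; it assumes the share of the mode category is computed over the observed (non-missing) values, which the argument description does not state explicitly.

```r
# Hypothetical helper (not a package function): share of the most frequent
# category among the observed values of a variable.
mode_share <- function(x) {
  obs <- x[!is.na(x)]
  max(table(obs)) / length(obs)
}

maxPerc <- 0.98                       # default threshold from the argument list
x <- c(rep("a", 99), "b", NA, NA)     # 99 of the 100 observed values are "a"

mode_share(x)             # 0.99
mode_share(x) > maxPerc   # TRUE: the variable counts as (almost) constant
```

Under this reading, a variable like x would be imputed by simple hot deck rather than by a regular imputation model.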
BBPMM is based on a chained equations approach that uses a Bayesian Bootstrap
step and Predictive Mean Matching (PMM) variants for metric-scale, binary, and
multi-categorical variables to generate multiple imputations. In order to
emulate a monotone missing-data pattern as well as possible, variables are
sorted by rate of missingness (in ascending order). If no complete variables
exist, the least incomplete variable is imputed via hot deck. The
starting solution then builds the imputation model using the observed values of
a particular y variable, and the corresponding observed or already
imputed values of the x variables (i.e., all variables with fewer
missing values than y).
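The ordering step can be sketched in base R as follows. This is only an illustration on a made-up data frame (toy, visit_order, and predictors are names chosen here), not the package's internal code.

```r
# Toy data: 'a' is complete, 'b' has 3 missing values, 'c' has 1.
toy <- data.frame(
  a = 1:6,
  b = c(NA, 2, NA, 4, NA, 6),
  c = c(1, NA, 3, 4, 5, 6)
)

mis_rate    <- colMeans(is.na(toy))      # share of missing values per column
visit_order <- names(sort(mis_rate))     # ascending rate of missingness
visit_order                              # "a" "c" "b"

# For a given y, the predictors in the starting solution are all variables
# that come earlier in this order (fewer missing values than y).
y <- "b"
predictors <- visit_order[seq_len(match(y, visit_order) - 1)]
predictors                               # "a" "c"
```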
Due to the PMM element in the algorithm,
autocorrelation of subsequent iterations is virtually zero. Therefore, a
burn-in period is not required, and there is no need to set nIter to
‘high’ values (> 20) either.
If M=1, no Bayesian Bootstrap step is carried out for the chained equations. Note that in this case the algorithm is still unlikely to converge to a stable solution, because of the Predictive Mean Matching step.
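To make the two building blocks concrete, here is a minimal sketch of one Bayesian Bootstrap draw followed by predictive mean matching for a single metric-scale variable. It assumes a plain linear model and nearest-neighbour matching on predicted values; the variants actually used by BBPMM (in particular for binary and multi-categorical variables) may differ, and all object names are illustrative.

```r
set.seed(1)
n <- 40
x <- rnorm(n)
y <- 2 + x + rnorm(n, sd = 0.5)
y[sample(n, 10)] <- NA                    # 10 values set to missing
obs <- which(!is.na(y))
mis <- which(is.na(y))

# Bayesian Bootstrap (Rubin, 1981): the gaps between sorted uniform draws
# serve as selection probabilities for the observed cases.
u       <- sort(runif(length(obs) - 1))
bb_prob <- diff(c(0, u, 1))
bb_rows <- sample(obs, length(obs), replace = TRUE, prob = bb_prob)

# Fit the imputation model on the bootstrap sample, predict for all cases.
fit  <- lm(y ~ x, data = data.frame(x = x, y = y)[bb_rows, ])
pred <- predict(fit, newdata = data.frame(x = x))

# Predictive Mean Matching: each missing case receives the observed y of the
# donor whose predicted value is closest to its own predicted value.
donor <- sapply(mis, function(i) obs[which.min(abs(pred[obs] - pred[i]))])
y_imp <- y
y_imp[mis] <- y[donor]
```

Because every imputed value is copied from an observed donor, the imputations stay within the range of the observed data. With M=1 the bootstrap draw would be skipped and the model fitted on the observed cases directly.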
call 
The call of 
mis.num 
Vector containing the numbers of missing values per column. 
modelselection 
Chosen model selection method for the function call. 
seed 
Chosen seed value for the function call. 
impdata 
The imputed data set, if M=1, or a list containing M imputed data sets. 
misOverview 
The percentage of missing values per incomplete variable. 
indMatrix 
A matrix with the same dimensions as Data minus ignore containing flags for missing values. 
M 
Number of (multiple) imputations. 
nIter 
Number of iterations between two imputations. 
Chains 
List containing the Gibbs sampler sequences for every variable of every imputation for every iteration. 
FirstSeed 
First 
LastSeed 
Last 
ignoredvariables 
TRUE / FALSE indicator whether variables were ignored during imputation. 
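The bookkeeping values above (mis.num, misOverview, indMatrix) describe simple summaries of the missing-data pattern. The following base R sketch shows how comparable quantities could be computed by hand on a made-up data frame; the exact format returned by BBPMM may differ, and the object names here are illustrative.

```r
toy <- data.frame(
  id = 1:5,
  a  = c(1, NA, 3, NA, 5),
  b  = c("u", "v", NA, "v", "u"),
  c  = 5:1
)
ignore <- "id"                                            # e.g. an ID variable
used   <- toy[, setdiff(names(toy), ignore), drop = FALSE]

mis_num     <- colSums(is.na(used))                       # cf. mis.num
mis_percent <- 100 * mis_num[mis_num > 0] / nrow(used)    # cf. misOverview
ind_matrix  <- is.na(used)                                # cf. indMatrix

mis_num       # a = 2, b = 1, c = 0
mis_percent   # a = 40, b = 20
```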
Florian Meinfelder, Thorsten Schnapp [ctb]
Koller-Meinfelder, F. (2009) Analysis of Incomplete Survey Data – Multiple Imputation Via Bayesian Bootstrap Predictive Mean Matching, doctoral thesis.
Little, R.J.A. (1988) Missing-Data Adjustments in Large Surveys, Journal of Business and Economic Statistics, Vol. 6, No. 3, pp. 287–296.
Raghunathan, T.E. and Lepkowski, J.M. and Van Hoewyk, J. and Solenberger, P. (2001) A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, Vol. 27, pp. 85–95.
Rubin, D.B. (1981) The Bayesian Bootstrap. The Annals of Statistics, Vol. 9, pp. 130–134.
Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, Inc.
Van Buuren, S. and Brand, J.P.L. and Groothuis-Oudshoorn, C.G.M. and Rubin, D.B. (2006) Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, Vol. 76, No. 12, pp. 1049–1064.
Van Buuren, S. and Groothuis-Oudshoorn, K. (2011) mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, Vol. 45, No. 3, pp. 1–67. URL http://www.jstatsoft.org/v45/i03/.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. New York: Springer.
### sample data set with non-normal variables
set.seed(1000)
n <- 50
x1 <- round(runif(n,0.5,3.5))
x2 <- as.factor(c(rep(1,10),rep(2,25),rep(3,15)))
x3 <- round(rnorm(n,0,3))
y1 <- round(x1-0.25*(x2==2)+0.5*x3+rnorm(n,0,1))
y1 <- ifelse(y1<1,1,y1)
y1 <- as.factor(ifelse(y1>4,5,y1))
y2 <- x1+rnorm(n,0,0.5)
y3 <- round(x3+rnorm(n,0,2))
data1 <- as.data.frame(cbind(x1,x2,x3,y1,y2,y3))
### set 20, 15 and 10 randomly chosen values of y1, y2 and y3 to missing
misrow1 <- sample(n,20)
misrow2 <- sample(n,15)
misrow3 <- sample(n,10)
is.na(data1[misrow1, 4]) <- TRUE
is.na(data1[misrow2, 5]) <- TRUE
is.na(data1[misrow3, 6]) <- TRUE
### imputation
imputed.data <- BBPMM(data1, nIter=5, M=5)
