imi.glm.more: imi.glm.more
In vahidnassiri/imi: Implements Iterative Multiple Imputation

This function fits a generalized linear model, for a given response variable and set of predictors, to an incomplete dataset using multiple imputation. Starting from a set of already imputed datasets, it determines a sufficient number of imputed datasets using the iterative multiple imputation (imi) procedure.

imi.glm.more(data.miss,data.imp0,family=binomial(link='logit'),max.M=500,epsilon,
             method='pmm',resp,regressors,conv.plot=TRUE,dis.method='mahalanobis',
             mah.scale='combined',successive.valid=3,max.iter.glm=1000)

`data.miss`	A data frame with the variables in the model as its columns. Missing values should be indicated by NA.
`family`	The error distribution and the link function (see the documentation of function glm in the R stats package).
`data.imp0`	A list with already imputed sets of data as its components.
`max.M`	The maximum number of iterations, after which the algorithm terminates if it has not converged.
`epsilon`	The threshold for the difference between two successive iterations.
`method`	If 'mvn', the data are imputed using a multivariate normal predictive model in R package Amelia; any other value imputes the data using the fully conditional specification approach in R package mice. The available methods are listed in the documentation of function mice in R package mice. Specifying 'auto' selects the predictive model based on the measurement level of each variable.
`resp`	A string with the name of the response variable. It should match the name of one of the columns in data.miss.
`regressors`	A vector of strings with the names of the predictors. They should match the names of the variables in data.miss.
`conv.plot`	A logical value; if TRUE a convergence plot is generated, if FALSE no plot is produced.
`dis.method`	A string, one of 'euclidean', 'inf.norm', and 'mahalanobis', which specifies the distance measure between two iterations. We suggest 'mahalanobis'; the other options are provided for research purposes.
`mah.scale`	A string, one of 'within', 'between', and 'combined', which specifies the scale matrix in the Mahalanobis distance. We suggest 'combined'; the other options are provided for research purposes.
`successive.valid`	An integer, at least 1, which specifies the number of successive steps for which the stopping rule must hold before the procedure terminates.
`max.iter.glm`	The maximum number of iterations for the glm algorithm.
`print.progress`	A logical value; if TRUE the progress of the imputation is printed.
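
The interplay of `epsilon`, `dis.method`, and `successive.valid` can be illustrated with a minimal base R sketch. The helpers `step.distance` and `stop.rule` are hypothetical names introduced here, and the exact distance and scaling used inside the package may differ; the sketch only shows the general idea of comparing successive pooled estimates and stopping once the distance stays below the threshold.

```r
# Hypothetical helper: distance between two successive pooled estimates.
# The scaling used inside the package may differ; this is an illustration only.
step.distance <- function(est.old, est.new, scale.mat,
                          dis.method = "mahalanobis") {
  d <- est.new - est.old
  switch(dis.method,
         euclidean   = sqrt(sum(d^2)),
         inf.norm    = max(abs(d)),
         # stats::mahalanobis returns the squared distance, hence sqrt()
         mahalanobis = sqrt(mahalanobis(est.new, est.old, scale.mat)))
}

# Hypothetical helper: TRUE once the last `successive.valid` distances
# are all below `epsilon`.
stop.rule <- function(dis.steps, epsilon, successive.valid = 3) {
  length(dis.steps) >= successive.valid &&
    all(tail(dis.steps, successive.valid) < epsilon)
}

est.old <- c(0.20, -1.00, 0.50)
est.new <- c(0.21, -0.98, 0.50)
step.distance(est.old, est.new, diag(3), "euclidean")
step.distance(est.old, est.new, diag(3), "mahalanobis")
stop.rule(c(0.30, 0.04, 0.03, 0.02), epsilon = 0.05, successive.valid = 3)
stop.rule(c(0.30, 0.04, 0.06, 0.02), epsilon = 0.05, successive.valid = 3)
```

With an identity scale matrix the Mahalanobis and Euclidean distances coincide; a larger `successive.valid` guards against stopping on a single small step that happens by chance.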

`mi.param`	A list with the final MI-based estimated model parameters, their covariance matrix, as well as the within and between imputation covariance matrices.
`data.imp`	A list with imputed datasets as its components.
`dis.steps`	A vector with the computed distances between successive iterations.
`conv.status`	1 if convergence was achieved; 0 if convergence was not achieved within max.M iterations.
`M`	The selected number of imputed datasets.
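
The quantities described under `mi.param` correspond to the standard multiple imputation combining rules (Rubin's rules). The sketch below is not taken from the package: `pool.rubin` and the names of its output components are hypothetical, and the three simulated datasets only stand in for imputed datasets.

```r
# Hypothetical helper implementing Rubin's rules; not the package's own code.
pool.rubin <- function(est.list, cov.list) {
  M <- length(est.list)
  est.mat <- do.call(rbind, est.list)          # M x p matrix of estimates
  qbar    <- colMeans(est.mat)                 # pooled estimate
  within  <- Reduce(`+`, cov.list) / M         # within-imputation covariance
  between <- cov(est.mat)                      # between-imputation covariance
  total   <- within + (1 + 1 / M) * between    # total covariance
  list(estimate = qbar, cov = total, within = within, between = between)
}

# Toy input: three simulated datasets standing in for imputed datasets
set.seed(1)
fits <- lapply(1:3, function(i) {
  d <- data.frame(x = rnorm(50))
  d$y <- rbinom(50, 1, plogis(0.2 - d$x))
  glm(y ~ x, family = binomial(link = "logit"), data = d)
})
pooled <- pool.rubin(lapply(fits, coef), lapply(fits, vcov))
pooled$estimate
pooled$cov
```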

https://www.rdocumentation.org/packages/mice/versions/2.30/topics/mice

https://cran.r-project.org/web/packages/Amelia/

# specifying sample size and number of predictors
sample.size=100
num.var.log=2
# creating a correlated set of predictors
x.orig=matrix(rnorm(num.var.log*sample.size),sample.size,num.var.log)
x=cbind(scale(x.orig,
              center=TRUE,scale=FALSE))
# creating the compound-symmetry structured covariance matrix
sigma2=4
tau=1
cov.mat=diag(sigma2,num.var.log)+tau
# making the data correlated
chol.cov=chol(cov.mat)
for (i in 1:sample.size){
  x[i,]=t(chol.cov)%*%x[i,]
}
x=t(t(x)+apply(x.orig,2,mean))
# specifying model parameters
beta=c( 0.2, -1.0,  0.5)
z=beta[1]+(x%*%beta[-1])
# computing inverse logit transformation
pr = 1/(1+exp(-z))
# generating response variable
y = rbinom(sample.size,1,pr)
# creating the complete data
data = data.frame(y,x)
# specifying the response and the regressors
resp='y'
regressors=c('X1','X2')
# creating missing values in the dataset
require('mice')
x.miss=ampute(x,prop=0.1,mech='MAR')$amp
data.miss=data.frame(y,x.miss)
# determining the number of imputations, imputing the incomplete data, and fitting the model to it
out.glm=imi.glm(data.miss,family=binomial(link='logit'),M0='manual',max.M=500,epsilon=0.05,
                method='mvn',resp,regressors,conv.plot=TRUE,dis.method='mahalanobis',
                mah.scale='within',successive.valid='manual',max.iter.glm=1000)
-- Imputation 1 --

  1  2  3  4

-- Imputation 2 --

  1  2  3  4  5  6

[1] "The time it takes (in seconds) to imput the data two times and fit the model to them is: 0.04"
What is your choice of initial number of imputations?2
What is your choice for successive steps validation?3
-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

> names(out.glm)
[1] "mi.param"    "data.imp"    "dis.steps"   "conv.status" "M"
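
The `data.imp0` argument expects a list of already imputed datasets. A minimal sketch of how such a list could be built with mice and passed to imi.glm.more is given below. The simulated data, the missingness pattern, and the object names `imp` and `out.more` are assumptions made for illustration; the call to imi.glm.more follows the usage shown above and is only run if the imi package is installed.

```r
library(mice)

# simulate a small logistic regression dataset (illustrative values)
set.seed(1)
n <- 100
X1 <- rnorm(n)
X2 <- rnorm(n)
y <- rbinom(n, 1, plogis(0.2 - 1.0 * X1 + 0.5 * X2))
data.miss <- data.frame(y, X1, X2)
# set 10 randomly chosen values of X1 to missing (illustration only)
data.miss$X1[sample(n, 10)] <- NA

# impute 5 datasets with mice and collect them in a list
imp <- mice(data.miss, m = 5, method = "pmm", printFlag = FALSE)
data.imp0 <- lapply(seq_len(imp$m), function(i) mice::complete(imp, i))

# continue from the existing imputations (requires the imi package)
if (requireNamespace("imi", quietly = TRUE)) {
  out.more <- imi::imi.glm.more(data.miss, data.imp0,
                                family = binomial(link = "logit"),
                                max.M = 500, epsilon = 0.05, method = "pmm",
                                resp = "y", regressors = c("X1", "X2"),
                                conv.plot = FALSE, dis.method = "mahalanobis",
                                mah.scale = "combined", successive.valid = 3,
                                max.iter.glm = 1000)
}
```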