imi.glm.more: imi.glm.more


Description

This function fits a generalized linear model, for a given response variable and set of predictors, to an incomplete dataset using multiple imputation. It determines a sufficient number of imputed datasets using the iterative multiple imputation (imi) procedure, starting from a set of already imputed datasets.

Usage

imi.glm.more(data.miss,data.imp0,family=binomial(link='logit'),max.M=500,epsilon,
             method='pmm',resp,regressors,conv.plot=TRUE,dis.method='mahalanobis',
             mah.scale='combined',successive.valid=3,max.iter.glm=1000)

Arguments

data.miss

A data frame with the variables in the model as its columns. Note that missing values should be indicated by NA.

family

The error distribution and the link function (see the documentation of function glm in R package stats).

data.imp0

A list with already imputed sets of data as its components.

max.M

The maximum number of iterations after which the algorithm terminates in case of non-convergence.

epsilon

The threshold for the distance between two successive iterations.

method

Specifying the string 'mvn' imputes the data using a multivariate normal predictive model from the R package Amelia; any other value imputes the data using the fully conditional specification approach in the R package mice. The available methods are listed in the documentation of function mice in R package mice. Specifying 'auto' selects the predictive model based on the measurement level of each variable.

resp

A string with the name of the response variable. Note that this should match the name of one of the columns in data.miss.

regressors

A vector of string values with the names of the predictors. Note that they should match the names of the variables in data.miss.
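
Because resp and regressors must match column names in data.miss, it can help to check them before starting a long imputation run. The following is a minimal sketch in base R; check.model.vars is a hypothetical helper written for illustration and is not part of the imi package.

```r
# Hypothetical helper (not part of imi): stop early if any model variable
# named in `resp` or `regressors` is not a column of `data`.
check.model.vars <- function(data, resp, regressors) {
  missing.vars <- setdiff(c(resp, regressors), names(data))
  if (length(missing.vars) > 0) {
    stop("not found in data: ", paste(missing.vars, collapse = ", "))
  }
  invisible(TRUE)
}

# Example with a toy data frame whose columns are y, X1 and X2
toy <- data.frame(y = c(0, 1), X1 = c(0.3, NA), X2 = c(1.2, 0.7))
check.model.vars(toy, resp = "y", regressors = c("X1", "X2"))
```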

conv.plot

A logical value: if TRUE, a convergence plot is generated; if FALSE, no plot is produced.

dis.method

A string, one of 'euclidean', 'inf.norm', or 'mahalanobis', which specifies the distance measure between two iterations. Note that our suggestion is to use 'mahalanobis'; the other options are provided for research purposes.

mah.scale

A string, one of 'within', 'between', or 'combined', which specifies the scale matrix in the Mahalanobis distance. Note that our suggestion is to use 'combined'; the other options are provided for research purposes.
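
To make the three distance options concrete, here is a minimal sketch of how the distance between two successive parameter estimates could be computed. The helper param.distance and its arguments are hypothetical; the exact formulas used inside the package (for example, whether the Mahalanobis distance is squared) are an assumption here. The scale.mat argument stands in for whichever covariance matrix mah.scale selects.

```r
# Hypothetical helper (not part of imi): distance between the parameter
# estimates of two successive iterations. `scale.mat` is only used by the
# Mahalanobis option and plays the role of the matrix chosen via mah.scale.
param.distance <- function(theta.new, theta.old, method = "mahalanobis",
                           scale.mat = NULL) {
  d <- theta.new - theta.old
  switch(method,
    euclidean   = sqrt(sum(d^2)),
    inf.norm    = max(abs(d)),
    # stats::mahalanobis returns the squared distance, hence the sqrt
    mahalanobis = sqrt(stats::mahalanobis(theta.new, center = theta.old,
                                          cov = scale.mat)),
    stop("unknown method: ", method)
  )
}

theta.old <- c(0, 0)
theta.new <- c(3, 4)
param.distance(theta.new, theta.old, "euclidean")
param.distance(theta.new, theta.old, "inf.norm")
param.distance(theta.new, theta.old, "mahalanobis", scale.mat = diag(c(9, 16)))
```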

successive.valid

An integer of at least 1 which specifies the number of successive steps for which the stopping rule must hold before the procedure can terminate.
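
The interplay between epsilon and successive.valid can be sketched as follows: the procedure stops at the first step where the distance has stayed below epsilon for successive.valid steps in a row. The helper first.converged.step is hypothetical and only illustrates this reading of the stopping rule; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of imi): index of the first step at which the
# last `successive.valid` distances are all below `epsilon`; NA if never.
first.converged.step <- function(dis.steps, epsilon, successive.valid = 3) {
  below <- dis.steps < epsilon
  run <- 0
  for (i in seq_along(below)) {
    # extend the current run of small distances, or reset it
    run <- if (below[i]) run + 1 else 0
    if (run >= successive.valid) return(i)
  }
  NA_integer_
}

# Illustrative distances, not output from the package
dis.steps <- c(0.20, 0.04, 0.06, 0.03, 0.02, 0.01)
first.converged.step(dis.steps, epsilon = 0.05, successive.valid = 3)
```

With successive.valid = 1 the same sequence would already stop at the second step, which is why a larger value guards against stopping on a single small distance.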

max.iter.glm

The maximum number of iterations for the glm algorithm.
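
For orientation, the sketch below shows how an iteration cap is set when calling glm directly in base R, through glm.control(maxit = ...). That max.iter.glm is forwarded to glm in this way is an assumption, and the simulated data are purely illustrative.

```r
# Illustrative data: one predictor and a binary response
set.seed(1)
x <- rnorm(50)
y <- rbinom(50, 1, 1 / (1 + exp(-0.5 * x)))

# Fit a logistic regression with an explicit cap on the number of
# iteratively reweighted least squares (IRLS) iterations
fit <- glm(y ~ x, family = binomial(link = "logit"),
           control = glm.control(maxit = 1000))

fit$converged  # whether the algorithm converged within the cap
fit$iter       # number of iterations actually used
```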

print.progress

A logical variable, if TRUE it prints the progress of imputation.

Value

mi.param

A list with the final MI-based estimated model parameters, their covariance matrix, as well as the within and between imputation covariance matrices.
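
The relationship between the within, between, and total covariance matrices follows Rubin's rules for combining multiply imputed analyses. The sketch below pools a list of coefficient vectors and their covariance matrices; pool.rubin and the names of its list components are hypothetical and do not reflect the actual structure of mi.param.

```r
# Hypothetical helper (not part of imi): Rubin's rules for M completed-data fits.
# est.list: list of coefficient vectors; cov.list: list of covariance matrices.
pool.rubin <- function(est.list, cov.list) {
  M <- length(est.list)
  est.mat <- do.call(rbind, est.list)       # M x p matrix of estimates
  pooled  <- colMeans(est.mat)              # pooled point estimate
  within  <- Reduce(`+`, cov.list) / M      # average within-imputation covariance
  between <- stats::cov(est.mat)            # between-imputation covariance (divisor M - 1)
  total   <- within + (1 + 1 / M) * between # total covariance of the pooled estimate
  list(estimate = pooled, within = within, between = between, total = total)
}

# Toy input with M = 2 imputations and p = 2 parameters
est.list <- list(c(1, 2), c(3, 4))
cov.list <- list(diag(2), 3 * diag(2))
pooled <- pool.rubin(est.list, cov.list)
pooled$estimate
pooled$total
```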

data.imp

A list with imputed datasets as its components.

dis.steps

A vector with computed distance between iterations.

conv.status

1 if convergence was achieved; 0 if convergence was not achieved within max.M iterations.

M

The selected number of imputed datasets.

References

https://www.rdocumentation.org/packages/mice/versions/2.30/topics/mice

https://cran.r-project.org/web/packages/Amelia/

Examples

# specifying sample size and number of predictors
sample.size=100
num.var.log=2
# creating a correlated set of predictors
x.orig=matrix(rnorm(num.var.log*sample.size),sample.size,num.var.log)
x=cbind(scale(x.orig,
              center=TRUE,scale=FALSE))
# creating the compound-symmetry structured covariance matrix
sigma2=4
tau=1
cov.mat=diag(sigma2,num.var.log)+tau
# making the data correlated
chol.cov=chol(cov.mat)
for (i in 1:sample.size){
  x[i,]=t(chol.cov)%*%x[i,]
}
x=t(t(x)+apply(x.orig,2,mean))
# specifying model parameters
beta=c( 0.2, -1.0,  0.5)
z=beta[1]+(x%*%beta[-1])
# computing inverse logit transformation
pr = 1/(1+exp(-z))
# generating response variable
y = rbinom(sample.size,1,pr)
# creating the complete data
data = data.frame(y,x)
# specifying the response and the regressors
resp='y'
regressors=c('X1','X2')
# creating missing values in the dataset
require('mice')
x.miss=ampute(x,prop=0.1,mech='MAR')$amp
data.miss=data.frame(y,x.miss)
# determining the number of imputations, imputing the incomplete data, and fitting the model to it
out.glm=imi.glm(data.miss,family=binomial(link='logit'),M0='manual',max.M=500,epsilon=0.05,
                method='mvn',resp,regressors,conv.plot=TRUE,dis.method='mahalanobis',
                mah.scale='within',successive.valid='manual',max.iter.glm=1000)
-- Imputation 1 --

  1  2  3  4

-- Imputation 2 --

  1  2  3  4  5  6

[1] "The time it takes (in seconds) to imput the data two times and fit the model to them is: 0.04"
What is your choice of initial number of imputations?2
What is your choice for successive steps validation?3
-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3  4

-- Imputation 1 --

  1  2  3

> names(out.glm)
[1] "mi.param"    "data.imp"    "dis.steps"   "conv.status" "M"

vahidnassiri/imi documentation built on June 25, 2019, 5:50 a.m.