jomoImpute: Impute single-level and multilevel missing data using 'jomo'


View source: R/jomoImpute.R

Description

Performs single- and multilevel imputation for (mixed) continuous and categorical data using the jomo package. Supports imputation of missing data at level 1 and 2 as well as imputation using random (residual) covariance matrices. See 'Details' for further information.

Usage

jomoImpute(data, type, formula, random.L1 = c("none", "mean", "full"),
  n.burn = 5000, n.iter = 100, m = 10, group = NULL, prior = NULL,
  seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"),
  silent = FALSE)

Arguments

data

A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets.

type

An integer vector specifying the role of each variable in the imputation model or a list of two vectors specifying a two-level model (see 'Details').

formula

A formula specifying the role of each variable in the imputation model or a list of two formulas specifying a two-level model. The basic model is constructed by model.matrix, which allows including derived variables in the imputation model using I() (see 'Details' and 'Examples').

random.L1

A character string denoting if the covariance matrix of residuals should vary across groups and how the values of these matrices are stored (see 'Details'). Can be "none" (common covariance matrix), "mean" (random covariance matrix, storing only mean values), or "full" (random covariance matrix, storing all values). Default is "none".

n.burn

The number of burn-in iterations before any imputations are drawn. Default is 5,000.

n.iter

The number of iterations between imputations. Default is 100.

m

The number of imputed data sets to generate. Default is 10.

group

(optional) A character string denoting the name of an additional grouping variable to be used with the formula argument. If specified, the imputation is run separately within each of these groups.

prior

(optional) A list with components Binv, Dinv, and a for specifying prior distributions for the covariance matrix of random effects and the covariance matrix of residuals (see 'Details'). Default is to use least-informative priors.

seed

(optional) An integer value initializing R's random number generator for reproducible results. Default is to use the global seed.

save.pred

(optional) Logical flag indicating if variables derived using formula should be included in the imputed data sets. Default is FALSE.

keep.chains

(optional) A character string denoting which chains of the MCMC algorithm to store. Can be "full" (stores chains for all parameters) or "diagonal" (stores chains for fixed effects and diagonal entries of the covariance matrices). Default is "full" (see 'Details').

silent

(optional) Logical flag indicating if console output should be suppressed. Default is FALSE.

Details

This function serves as an interface to the jomo package and supports imputation of single-level and multilevel continuous and categorical data at both level 1 and 2 (see Carpenter & Kenward, 2013; Goldstein et al., 2009). In order for categorical variables to be detected correctly, they must be formatted as factor variables (see 'Examples'). The imputation model can be specified using either the type or the formula argument.
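As a minimal sketch of the factor requirement, the following uses a small hypothetical data frame (the object toy and its column names are made up for illustration and are not part of the package) to show how a categorical variable stored as character can be converted and checked before imputation.

```r
# Hypothetical toy data: 'workload' is categorical but stored as character
toy <- data.frame(
  cluster  = rep(1:3, each = 2),
  score    = c(2.1, NA, 3.4, 2.8, NA, 3.0),
  workload = c("low", "high", NA, "low", "high", "high"),
  stringsAsFactors = FALSE
)

# Convert to a factor so that the variable can be detected as categorical
toy$workload <- factor(toy$workload, levels = c("low", "high"))

# Check which columns are factors before passing the data on
is_cat <- vapply(toy, is.factor, logical(1))
is_cat
```

Missing values in the original column remain NA after the conversion, so they are still available as targets for imputation.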

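The type interface is designed to provide quick-and-easy imputations using jomo. The type argument must be an integer vector denoting the role of each variable in the imputation model:

1: target variables containing missing data
2: predictors with fixed effect on all targets (completely observed)
3: predictors with random effect on all targets (completely observed)
-1: grouping variable within which the imputation is run separately
-2: cluster indicator variable
0: variables not featured in the model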

At least one target variable and, for multilevel imputation, the cluster indicator must be specified. If the cluster indicator is omitted, single-level imputation will be performed. The intercept is automatically included as both a fixed and (for multilevel models) a random effect. If a variable of type -1 is found, then separate imputations are performed within each level of that variable.
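A type vector can also be built by name instead of by position, which tends to be less error-prone for wide data sets. The sketch below assumes the column names of studentratings as they appear in the example output; in practice, colnames(studentratings) would be used instead of the hard-coded vector.

```r
# Assumed column names of 'studentratings' (taken from the example output)
vars <- c("ID", "FedState", "Sex", "MathAchiev", "MathDis", "SES",
          "ReadAchiev", "ReadDis", "CognAbility", "SchClimate")

# Start with every variable excluded (0), then assign roles by name
type <- setNames(integer(length(vars)), vars)
type["ID"] <- -2L                  # cluster indicator
type[c("SES", "ReadDis")] <- 1L    # target variables
type["ReadAchiev"] <- 3L           # predictor with random effect
type
```

The resulting vector is the same specification as in Example 1.1.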

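The formula argument is intended as a more flexible and feature-rich interface to jomo. Specifying the formula argument is similar to specifying other formulae in R. Given below is a list of operators that jomoImpute currently understands:

~: separates the target (left-hand) and predictor (right-hand) side of the model
+: adds target or predictor variables to the model
*: adds an interaction term of two or more predictors
|: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID)
I(): defines functions to be interpreted by model.matrix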

If the cluster indicator is omitted, single-level imputation will be performed. For multilevel imputation, predictors are allowed to have fixed effects, random effects, or both on all target variables. The intercept is automatically included as both a fixed and (for multilevel models) a random effect. Both can be suppressed if needed (see panImpute). Note that, when specifying random effects other than the intercept, these will not be automatically added as fixed effects and must be included explicitly. Any predictors defined by I() will be used for imputation but not included in the data set unless save.pred = TRUE.
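The point about random effects not being added as fixed effects can be illustrated with a random-slope formula. This is a sketch only: the call is guarded so that it runs only if mitml is installed, and the iteration counts are far too low for practical use.

```r
# Random slope for ReadAchiev: the predictor is listed both as a fixed effect
# (outside the parentheses) and as a random effect (inside them)
fml <- ReadDis + SES ~ ReadAchiev + (1 + ReadAchiev | ID)

# Base R check of the variables the formula refers to
all.vars(fml)

# Guarded call: skipped if 'mitml' is not available
if (requireNamespace("mitml", quietly = TRUE)) {
  data(studentratings, package = "mitml")
  imp <- mitml::jomoImpute(studentratings, formula = fml,
                           n.burn = 100, n.iter = 10, m = 2, silent = TRUE)
}
```

Dropping ReadAchiev from the fixed part while keeping it inside the parentheses would specify a random slope without a corresponding fixed effect, which is rarely intended.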

If missing data occur at both level 1 and 2, the imputation model is specified as a list of two formulas or types, respectively. The first element of this list denotes the model specification for variables at level 1. The second element denotes the model specification for variables at level 2. Missing data are imputed jointly at both levels (see 'Examples', see also Carpenter and Kenward, 2013; Goldstein et al., 2009).

It is possible to model the covariance matrix of residuals at level 1 as random across clusters (Yucel, 2011; Carpenter & Kenward, 2013). The random.L1 argument determines this behavior and how the values of these matrices are stored. If set to "none", a common covariance matrix is assumed across groups (similar to panImpute). If set to "mean", the covariance matrices are random, but only the average covariance matrix is stored at each iteration. If set to "full", the covariance matrices are random, and all variances and covariances from all clusters are stored.
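A hedged sketch of requesting random level-1 covariance matrices is given below. It reuses the studentratings data from the examples, is guarded so that it runs only if mitml is installed, and uses very few iterations for speed.

```r
# One of the documented settings: "none", "mean", or "full"
random.L1 <- "mean"

if (requireNamespace("mitml", quietly = TRUE)) {
  data(studentratings, package = "mitml")
  fml <- ReadDis + SES ~ ReadAchiev + (1 | ID)

  # Covariance matrices vary across clusters; only their average is stored
  imp <- mitml::jomoImpute(studentratings, formula = fml, random.L1 = random.L1,
                           n.burn = 100, n.iter = 10, m = 2, silent = TRUE)

  # The chosen setting is returned as a component of the fitted object
  imp$random.L1
}
```

Choosing "full" instead stores the values for every cluster, which can require considerably more memory when there are many clusters.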

In order to run separate imputations for each level of an additional grouping variable, the group argument can be used. The name of the grouping variable must be given as a character string (i.e., in quotation marks).

The default prior distributions for the covariance matrices in jomoImpute are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale. The prior argument can be used to specify alternative prior distributions. These must be supplied as a list containing the following components:

Note that jomo does not allow for the degrees of freedom for the inverse-Wishart prior to be specified by the user. These are always set to the lowest value possible (largest dispersion) or determined iteratively if the residuals at level 1 are modeled as random (see above). For single-level imputation, only Binv is relevant.

In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the keep.chains argument (see also panImpute). This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.

Value

An object of class mitml, containing the following components:

data

The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original order ("sort"), grouping ("group") and factor levels of categorical variables.

replacement.mat

A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one column for each imputed data set.

index.mat

A matrix containing the row and column index for each missing value. The index matrix is used to link the missing values in the data set with their corresponding rows in the replacement matrix.

call

The matched function call.

model

A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, at level 1 and level 2, respectively.

random.L1

A character string denoting the handling of the (random) covariance matrix of residuals at level 1 (see 'Details').

prior

The prior parameters used in the imputation model.

iter

A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets.

par.burnin

A multi-dimensional array containing the parameters of the imputation model from the burn-in phase.

par.imputation

A multi-dimensional array containing the parameters of the imputation model from the imputation phase.
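The relationship between index.mat and replacement.mat described above can be mimicked in base R. The sketch below uses hypothetical toy data and made-up replacement values; it illustrates the idea of the two matrices and is not the package's internal code.

```r
# Toy incomplete data (hypothetical values)
dat <- data.frame(y = c(1.2, NA, 0.7, NA), x = c(NA, 2.0, 1.5, 0.3))

# Index matrix: one row per missing value, giving its row and column position
index.mat <- which(is.na(dat), arr.ind = TRUE)

# Replacement matrix: one row per missing value, one column per imputation
# (made-up numbers standing in for draws from an imputation model)
m <- 3
replacement.mat <- matrix(seq_len(nrow(index.mat) * m) / 10,
                          nrow = nrow(index.mat), ncol = m)

# Build the i-th completed data set by filling in column i
complete_i <- function(i) {
  out <- as.matrix(dat)
  out[index.mat] <- replacement.mat[, i]
  as.data.frame(out)
}

completed <- lapply(seq_len(m), complete_i)
completed[[1]]
```

With real mitml objects, mitmlComplete should be used to extract the completed data sets rather than working with the matrices directly.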

Note

For objects of class mitml, methods for the generic functions print, summary, and plot are available to inspect the fitted imputation model. mitmlComplete is used for extracting the imputed data sets.
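A short sketch of this workflow follows, assuming mitml is installed (the block is skipped otherwise). The seed value and the small number of iterations are arbitrary choices for illustration.

```r
if (requireNamespace("mitml", quietly = TRUE)) {
  data(studentratings, package = "mitml")
  imp <- mitml::jomoImpute(studentratings,
                           formula = ReadDis ~ ReadAchiev + (1 | ID),
                           n.burn = 100, n.iter = 10, m = 2,
                           seed = 1234, silent = TRUE)

  # Inspect the fitted imputation model
  print(imp)
  summary(imp)
  # plot(imp) would draw diagnostic plots; omitted here to avoid opening
  # a graphics device

  # Extract the first imputed data set
  first <- mitml::mitmlComplete(imp, 1)
  head(first)
}
```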

Author(s)

Simon Grund, Alexander Robitzsch, Oliver Luedtke

References

Carpenter, J. R., & Kenward, M. G. (2013). Multiple imputation and its application. Hoboken, NJ: Wiley.

Goldstein, H., Carpenter, J., Kenward, M. G., & Levin, K. A. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9, 173-197.

Yucel, R. M. (2011). Random covariances and mixed-effects models for imputing multivariate multilevel continuous data. Statistical Modelling, 11, 351-370.

See Also

panImpute, mitmlComplete, summary.mitml, plot.mitml

Examples

# NOTE: The number of iterations in these examples is much lower than it
# should be. This is done in order to comply with CRAN policies, and more
# iterations are recommended for applications in practice!

data(studentratings)
data(leadership)

# ***
# for further examples, see "panImpute"
#

?panImpute

# *** ................................
# the 'type' interface
# 

# * Example 1.1 (studentratings): 'ReadDis' and 'SES', predicted by 'ReadAchiev'
# (random slope)

type <- c(-2, 0, 0, 0, 0, 1, 3, 1, 0, 0)
names(type) <- colnames(studentratings)
type

imp <- jomoImpute(studentratings, type = type, n.burn = 100, n.iter = 10, m = 5)

# * Example 1.2 (leadership): all variables (mixed continuous and categorical
# data with missing values at level 1 and level 2)

type.L1 <- c(-2, 1, 0, 1, 1)   # imputation model at level 1
type.L2 <- c(-2, 0, 1, 0, 0)   # imputation model at level 2
names(type.L1) <- names(type.L2) <- colnames(leadership)

type <- list(type.L1, type.L2)
type

imp <- jomoImpute(leadership, type = type, n.burn = 100, n.iter = 10, m = 5)

# * Example 1.3 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted
# with empty model, groupwise for 'FedState' (single-level imputation)

type <- c(0, -1, 0, 0, 0, 1, 1, 1, 0, 0)
names(type) <- colnames(studentratings)
type

imp <- jomoImpute(studentratings, type = type, group = "FedState", n.burn = 100,
                  n.iter = 10, m = 5)

# *** ................................
# the 'formula' interface
# 

# * Example 2.1 (studentratings): 'ReadDis' and 'SES' predicted by 'ReadAchiev'
# (random slope)

fml <- ReadDis + SES ~ ReadAchiev + (1+ReadAchiev|ID)
imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.2 (studentratings): 'ReadDis' predicted by 'ReadAchiev' and the
# cluster mean of 'ReadAchiev'

fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID)
imp <- jomoImpute(studentratings, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.3 (studentratings): 'ReadDis' predicted by 'ReadAchiev', groupwise
# for 'FedState'

fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100,
                  n.iter = 10, m = 5)

# * Example 2.4 (leadership): all variables (mixed continuous and categorical
# data with missing values at level 1 and level 2)

fml <- list( JOBSAT + NEGLEAD + WLOAD ~ 1 + (1|GRPID) , COHES ~ 1 )
imp <- jomoImpute(leadership, formula = fml, n.burn = 100, n.iter = 10, m = 5)

# * Example 2.5 (studentratings): 'ReadDis', 'ReadAchiev', and 'SES' predicted
# with empty model, groupwise for 'FedState' (single-level imputation)

fml <- ReadDis + ReadAchiev + SES ~ 1
imp <- jomoImpute(studentratings, formula = fml, group = "FedState", n.burn = 100,
                  n.iter = 10, m = 5)

Example output

*** This is beta software. Please report any bugs!
*** See the NEWS file for recent changes.
panImpute                package:mitml                 R Documentation

Impute multilevel missing data using 'pan'

Description:

     This function provides an interface to the pan package for
     multiple imputation of multilevel data (Schafer & Yucel, 2002).
     Imputations can be generated using type or formula, which
     offer different options for model specification.

Usage:

     panImpute(data, type, formula, n.burn=5000, n.iter=100, m=10, group=NULL, 
       prior=NULL, seed=NULL, save.pred=FALSE, keep.chains=c("full","diagonal"),
       silent=FALSE)
     
Arguments:

    data: A data frame containing incomplete and auxiliary variables,
          the cluster indicator variable, and any other variables that
          should be present in the imputed datasets.

    type: An integer vector specifying the role of each variable in the
          imputation model (see details).

 formula: A formula specifying the role of each variable in the
          imputation model. The basic model is constructed by
          model.matrix, thus allowing derived variables to be included
          in the imputation model using I() (see details and examples).

  n.burn: The number of burn-in iterations before any imputations are
          drawn. Default is 5,000.

  n.iter: The number of iterations between imputations. Default is
          100.

       m: The number of imputed data sets to generate.

   group: (optional) A character string denoting the name of an
          additional grouping variable to be used with the formula
          argument. When specified, the imputation model is run
          separately within each of these groups.

   prior: (optional) A list with components a, Binv, c, and Dinv for
          specifying prior distributions for the covariance
          matrix of random effects and the covariance matrix of
          residuals (see details). Default is to use
          least-informative priors.

    seed: (optional) An integer value initializing pan's random
          number generator for reproducible results. Default is to
          use random seeds.

save.pred: (optional) Logical flag indicating if variables derived
          using ‘formula’ should be included in the imputed data sets.
          Default is ‘FALSE’.

keep.chains: (optional) A character string denoting which parameter
          chains to save. Default is to save all chains (see details).

  silent: (optional) Logical flag indicating if console output should
          be suppressed. Default is ‘FALSE’.

Details:

     This function serves as an interface to the ‘pan’ algorithm. The
     imputation model can be specified using either the ‘type’ or the
     ‘formula’ argument.

     The ‘type’ interface is designed to provide quick-and-easy
     imputations using ‘pan’. The ‘type’ argument must be an integer
     vector denoting the role of each variable in the imputation model:

        • ‘1’: target variables containing missing data

        • ‘2’: predictors with fixed effect on all targets (completely
          observed)

        • ‘3’: predictors with random effect on all targets (completely
          observed)

        • ‘-1’: grouping variable within which the imputation is run
          separately

        • ‘-2’: cluster indicator variable

        • ‘0’: variables not featured in the model

     At least one target variable and the cluster indicator must be
     specified. The intercept is automatically included both as a fixed
     and random effect. If a variable of type ‘-1’ is found, then
     separate imputations are performed within each level of that
     variable.

     The ‘formula’ argument is intended as a more flexible and
     feature-rich interface to ‘pan’. Specifying the ‘formula’ argument
     is similar to specifying other formulae in R. Given below is a
     list of operators that ‘panImpute’ currently understands:

        • ‘~’: separates the target (left-hand) and predictor
          (right-hand) side of the model

        • ‘+’: adds target or predictor variables to the model

        • ‘*’: adds an interaction term of two or more predictors

        • ‘|’: denotes cluster-specific random effects and specifies
          the cluster indicator (e.g., ‘1|ID’)

        • ‘I()’: defines functions to be interpreted by ‘model.matrix’

     Predictors are allowed to have fixed effects, random effects, or
     both on all target variables. The intercept is automatically
     included both as a fixed and a random effect, but it can be
     constrained if necessary (see examples). Note that, when
     specifying random effects other than the intercept, these will
     _not_ be automatically added as fixed effects and must be included
     explicitly. Any predictors defined by ‘I()’ will be used for
     imputation but not included in the data set unless
     ‘save.pred=TRUE’.

     In order to run separate imputations for each level of an
     additional grouping variable, the ‘group’ argument may be used.
     The name of the grouping variable must be given in quotes.

     As a default prior, ‘panImpute’ uses "least informative"
     inverse-Wishart priors for the covariance matrix of random effects
     and the covariance matrix of residuals, that is, with minimum
     degrees of freedom (largest dispersion) and identity matrices for
     scale. For better control, the ‘prior’ argument may be used for
     specifying alternative prior distributions. These must be supplied
     as a list containing the following components:

        • ‘a’: degrees of freedom for the covariance matrix of
          residuals

        • ‘Binv’: scale matrix for the covariance matrix of residuals

        • ‘c’: degrees of freedom for the covariance matrix of random
          effects

        • ‘Dinv’: scale matrix for the covariance matrix of random
          effects

     A sensible choice for a diffuse non-default prior is to set the
     degrees of freedom to the lowest value possible, and the scale
     matrices according to a prior guess of the corresponding
     covariance matrices (see Schafer & Yucel, 2002).

     In imputation models with many parameters, the number of parameter
     chains being saved can be reduced with the ‘keep.chains’ argument.
     If set to ‘full’ (the default), all chains are saved. If set to
     ‘diagonal’, only chains pertaining to fixed effects and the
     diagonal entries of the covariance matrices are saved. This
     setting influences the storage mode of parameters (e.g.,
     dimensions and indices of arrays) and should be used with caution.

Value:

     Returns an object of class ‘mitml’, containing the following
     components:

    data: The original (incomplete) data set, sorted according to the
          cluster variable and (if given) the grouping variable, with
          several attributes describing the original row order
          (‘"sort"’) and grouping (‘"group"’).

replacement.mat: A matrix containing the multiple replacements (i.e.,
          imputations) for each missing value. The replacement matrix
          contains one row for each missing value and one column
          for each imputed data set.

index.mat: A matrix containing the row and column index for each
          missing value. The index matrix is used to _link_ the missing
          values in the data set with their corresponding rows in the
          replacement matrix.

    call: The matched function call.

   model: A list containing the names of the cluster variable, the
          target variables, and the predictor variables with fixed and
          random effects, respectively.

random.L1: A character string denoting the handling of random residual
          covariance matrices (not used here; see ‘jomoImpute’).

   prior: The prior parameters used in the imputation model.

    iter: A list containing the number of burn-in iterations, the
          number of iterations between imputations, and the number of
          imputed data sets.

par.burnin: A multi-dimensional array containing the parameters of the
          imputation model from the burn-in phase.

par.imputation: A multi-dimensional array containing the parameters of
          the imputation model from the imputation phase.

Note:

     For objects of class ‘mitml’, methods for the generic functions
     ‘print’, ‘summary’, and ‘plot’ have been defined. ‘mitmlComplete’
     is used for extracting the imputed data sets.

Author(s):

     Simon Grund, Alexander Robitzsch, Oliver Luedtke

References:

     Schafer, J. L., and Yucel, R. M. (2002). Computational strategies
     for multivariate linear mixed-effects models with missing values.
     _Journal of Computational and Graphical Statistics, 11_, 437-457.

See Also:

     ‘jomoImpute’, ‘mitmlComplete’, ‘summary.mitml’, ‘plot.mitml’

Examples:

     # NOTE: The number of iterations in these examples is much lower than it
     # should be! This is done in order to comply with CRAN policies, and more
     # iterations are recommended for applications in practice!
     
     data(studentratings)
     
     # *** ................................
     # the 'type' interface
     # 
     
     # * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and 
     # 'CognAbility', with random slope for 'ReadAchiev'
     
     type <- c(-2,0,0,0,0,0,3,1,2,0)
     names(type) <- colnames(studentratings)
     type
     
     imp <- panImpute(studentratings, type=type, n.burn=1000, n.iter=100, m=5)
     
     # * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState',
     # and predicted by 'ReadAchiev'
     
     type <- c(-2,-1,0,0,0,0,2,1,0,0)
     names(type) <- colnames(studentratings)
     type
     
     imp <- panImpute(studentratings, type=type, n.burn=1000, n.iter=100, m=5)
     
     # *** ................................
     # the 'formula' interface
     # 
     
     # * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev'
     # (random intercept)
     
     fml <- ReadDis ~ ReadAchiev + (1|ID)
     imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
     
     # ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept)
     fml <- ReadDis ~ 0 + ReadAchiev + (1|ID)
     imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
     
     # * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev'
     # (random slope)
     
     fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID)
     imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
     
     # * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev',
     # groupwise for 'FedState'
     
     fml <- ReadDis ~ ReadAchiev + (1|ID)
     imp <- panImpute(studentratings, formula=fml, group="FedState", n.burn=1000,
     n.iter=100, m=5)
     
     # * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev'
     # including the cluster mean of 'ReadAchiev' as an additional predictor
     
     fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev,ID)) + (1|ID)
     imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
     
     # ... using 'save.pred' to save the calculated cluster means in the data set
     fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev,ID)) + (1|ID)
     imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5,
     save.pred=TRUE)
     
     head(mitmlComplete(imp,1))
     

         ID    FedState         Sex  MathAchiev     MathDis         SES 
         -2           0           0           0           0           1 
 ReadAchiev     ReadDis CognAbility  SchClimate 
          3           1           0           0 
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
[[1]]
  GRPID  JOBSAT   COHES NEGLEAD   WLOAD 
     -2       1       0       1       1 

[[2]]
  GRPID  JOBSAT   COHES NEGLEAD   WLOAD 
     -2       0       1       0       0 

Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
         ID    FedState         Sex  MathAchiev     MathDis         SES 
          0          -1           0           0           0           1 
 ReadAchiev     ReadDis CognAbility  SchClimate 
          1           1           0           0 
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
Warning message:
In jomoImpute(studentratings, type = type, group = "FedState", n.burn = 100,  :
  The 'group' argument is intended only for 'formula'. Setting 'type' of 'FedState' to '-1'.
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!
Running burn-in phase ...
Creating imputed data set ( 1 / 5 ) ...
Creating imputed data set ( 2 / 5 ) ...
Creating imputed data set ( 3 / 5 ) ...
Creating imputed data set ( 4 / 5 ) ...
Creating imputed data set ( 5 / 5 ) ...
Done!

mitml documentation built on Oct. 5, 2021, 5:07 p.m.