imp_cat_multi: The function for hierarchical imputation of categorical...

View source: R/hmi_imp_cat_multi.R

Description

The function is called by the wrapper and relies on MCMCglmm.
Whereas the single-level function (imp_cat_single) uses regression trees to impute data, this function fits a multilevel multinomial model. The basic idea is that each category of the target variable (except the reference category) gets its own formula, stating for example that the chance of ending up in category j increases with increasing X5. Each category therefore has its own regression coefficient beta_{5,j}. In a multilevel setting, this coefficient may differ between clusters: for cluster 27 it would be beta_{5,j,27} = beta_{5,j} + u_{5,27}. As a consequence, each category also has its own random effects covariance matrix. All these random effects variance parameters can be collected in one (quite large) covariance matrix that contains not only the random intercept variances, the random slope variances and their covariances within a category, but also covariances across categories, for example between the random slopes in category s and the random intercepts in category p. These cross-category covariances are difficult to interpret and have proven numerically unstable, so they are set to 0.
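The model described above can be written out as follows. This is only a sketch in the textbook multinomial logit form with the reference category as baseline; the exact parameterization used internally by MCMCglmm may differ in details. The indices i (observation) and c (cluster), the vectors x_i and z_i (rows of X_imp and Z_imp), and the explicit category index j on the random effects are notational assumptions, not taken from the package.

```latex
% Assumed notation: i = observation, c = cluster, j = non-reference category,
% x_i = fixed effects covariates (X_imp), z_i = random effects covariates (Z_imp).
\begin{align*}
\eta_{ijc} &= x_i^\top \beta_j + z_i^\top u_{j,c},
  \qquad j = 1, \dots, J-1, \\
P(y_{ic} = j) &= \frac{\exp(\eta_{ijc})}{1 + \sum_{l=1}^{J-1} \exp(\eta_{ilc})}, \\
% Example from the text: slope of X5 for category j in cluster 27.
\beta_{5,j,27} &= \beta_{5,j} + u_{5,j,27}, \\
% Cross-category covariances are fixed at 0, leaving one block per category.
\operatorname{Cov}\left(u_{1,c}, \dots, u_{J-1,c}\right)
  &= \operatorname{blockdiag}\left(\Sigma_1, \dots, \Sigma_{J-1}\right).
\end{align*}
```

The block-diagonal form in the last line is what setting the covariances between categories s and p to 0 amounts to: each Sigma_j holds the random intercept and random slope (co)variances of one category only.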

Usage

imp_cat_multi(
  y_imp,
  X_imp,
  Z_imp,
  clID,
  nitt = 22000,
  burnin = 2000,
  thin = 20,
  pvalue = 0.2,
  k = Inf
)

Arguments

y_imp

A vector with the variable to impute.

X_imp

A data.frame with the fixed effects variables.

Z_imp

A data.frame with the random effects variables.

clID

A vector with the cluster ID.

nitt

An integer defining the number of MCMC iterations (see MCMCglmm).

burnin

An integer defining the number of initial MCMC iterations that shall be regarded as burn-in and discarded (see MCMCglmm).

thin

An integer setting the thinning interval. If thin = 1, every iteration of the Gibbs sampling chain is kept; larger values keep only every thin-th iteration. Thinning matters most for highly autocorrelated chains that are examined over only a few iterations (say fewer than 1000).
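How nitt, burnin and thin interact can be illustrated with a little arithmetic in base R. The sketch below assumes that burnin counts discarded iterations (as the default of 2000 suggests) and that every thin-th iteration after the burn-in is stored; the variable names kept and n_kept are purely illustrative and not part of the package.

```r
# Defaults taken from the Usage section.
nitt   <- 22000  # total number of MCMC iterations
burnin <- 2000   # assumed: number of initial iterations discarded
thin   <- 20     # keep every 20th iteration after the burn-in

# Indices of the iterations that would be stored under these assumptions.
kept <- seq(from = burnin + 1, to = nitt, by = thin)

# Number of stored Gibbs samples per parameter.
n_kept <- length(kept)
n_kept  # 1000
```

Under these assumptions the defaults give (22000 - 2000) / 20 = 1000 stored samples per parameter.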

pvalue

A numeric value between 0 and 1 giving the p-value threshold for variables in the imputation model. Variables whose p-value exceeds this threshold are excluded from the imputation model.

k

An integer defining the allowed maximum of levels in a factor covariate.

Value

A list with three elements:
1. 'y_ret': the n x 1 data.frame with the original and imputed values.
2. 'Sol': the Gibbs samples for the fixed effects parameters.
3. 'VCV': the Gibbs samples for the variance parameters.


hmi documentation built on Oct. 23, 2020, 7:31 p.m.