convData: Prepare data for imputations with BMLCimpute (convData)

Description

This function takes a categorical dataset as input (categories can be denoted by numbers) and returns a list of objects that will be used by the 'multilevelLCMI' function to perform the imputations.

BMLCimpute is a package for the multiple imputation of single-level and nested categorical data by means of Bayesian Multilevel Latent Class models.

Usage

convData(dat, GID = NULL, UID = NULL, var2 = NULL)

Arguments

dat

Raw (categorical) data frame or data matrix with missing data. If the GID and UID arguments are passed to the function, the columns they refer to must be the first two columns of the dataset.

GID

Group (level-2 unit) indicator, given as the column number of the group ID in the dataset. It can be omitted for single-level datasets.

UID

Lower-level unit indicator, given as the column number of the unit ID in the dataset. Optional.

var2

Higher-level (group-specific) variables, given as a vector of the column numbers of the variables measured at the higher level. Optional.
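Because the ID columns are expected in the first two positions, a dataset that stores them elsewhere has to be rearranged before the call. The following is a minimal base-R sketch of that step. The data frame `raw` and its column names (`school`, `pupil`, `item1`, `item2`) are hypothetical and chosen only for illustration.

```r
# Hypothetical toy dataset: ID columns are NOT in the first two positions
raw <- data.frame(
  item1  = factor(c("yes", "no", NA, "yes")),
  school = c(1, 1, 2, 2),                      # group (level-2) ID
  item2  = factor(c("a", NA, "b", "a")),
  pupil  = 1:4                                 # unit (level-1) ID
)

# Move the ID columns to the front, keeping the other columns in order
id_cols <- c("school", "pupil")
dat <- raw[, c(id_cols, setdiff(names(raw), id_cols))]

# With BMLCimpute installed, the call would then be:
# cd <- convData(dat, GID = 1, UID = 2)
```

After the reordering, `GID = 1` and `UID = 2` point at the group and unit ID columns respectively.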

Details

Converts a raw categorical dataset with missing data into one ready to be imputed with the multilevelLCMI function. In particular, the function transforms factor variables into numeric ones, where each number denotes a different category. A coding list is returned along with the converted dataset.
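The kind of recoding described above can be illustrated in base R. This is only a sketch of the general idea (factor levels mapped to integer scores, plus a lookup table per variable), not the package's actual implementation. The object names `toy`, `toy_num`, `coding` and `n_cat` are hypothetical.

```r
# Hypothetical toy data with missing values
toy <- data.frame(
  colour = factor(c("red", "blue", NA, "red")),
  size   = factor(c("S", NA, "L", "M"), levels = c("S", "M", "L"))
)

# Integer scores: each level is mapped to 1, 2, ...; NA stays NA
toy_num <- as.data.frame(lapply(toy, as.integer))

# Coding list: one lookup table per variable (new score, original label)
coding <- lapply(toy, function(x) {
  data.frame(new = seq_along(levels(x)), original = levels(x))
})

# Number of categories per variable
n_cat <- vapply(toy, nlevels, integer(1))
```

The lookup tables in `coding` make it possible to translate imputed scores back to the original labels, which is the role the coding list plays in the returned object.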

'BMLCimpute' allows researchers and users of categorical datasets with missing data to perform Multiple Imputation via Bayesian latent class models. Data can be either single- or multi-level. Model estimation and imputations are implemented via a Gibbs sampler run through the Rcpp package interface. The function multilevelLCMI performs the imputations. Prior to the imputation step, data should be processed with the function convData; the resulting list is then passed as input to multilevelLCMI. Complete datasets are obtained via the compData function.

Value

A convData object, a list containing the following items:

convDat

The converted dataset

codLev1

List containing the new (and original) scores which will be used for the imputations (Level-1 variables).

nCatLev1

Vector containing the number of categories observed for each variable (Level-1 variables).

codLev2

List containing the new (and original) scores which will be used for the imputations (Level-2 variables).

nCatLev2

Vector containing the number of categories observed for each variable (Level-2 variables).

GroupIDs

Matrix containing original and new Group IDs.

GID

The column number of the Group ID (as entered by the user).

UID

The column number of the Unit ID (as entered by the user).

var2

The column numbers for level-2 variables (as entered in the input).

doVar2

Logical. Whether the BMLC model should impute variables at level 2 (the result of !is.null(var2)).

namesLev1

Vector of variable names (level-1 variables).

namesLev2

Vector of variable names (level-2 variables).

GroupName

Group ID variable name.

CaseName

Unit ID variable name.

caseID

Unit ID vector (re-permuted).

sort_

Vector containing the original permutation of the dataset rows.
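The caseID and sort_ items indicate that rows are reordered during conversion and that the original ordering is kept. The following base-R sketch shows the general technique of sorting rows by group and later undoing the sort. It is an illustration under assumptions: the page does not state whether sort_ stores the sorting permutation or its inverse, and the names `dat`, `perm`, `sorted` and `restored` are hypothetical.

```r
# Hypothetical toy data (not from BMLCimpute)
dat <- data.frame(
  group = c(2, 1, 2, 1),
  unit  = 1:4,
  y     = c("a", "b", "c", "d")
)

# Permutation that sorts rows by group (ties keep their original order)
perm   <- order(dat$group)
sorted <- dat[perm, ]

# Undo the sort: order(perm) is the inverse permutation
restored <- sorted[order(perm), ]
```

Storing the permutation alongside the sorted data is what allows completed datasets to be returned in the row order the user supplied.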

Author(s)

D. Vidotto <d.vidotto@uvt.nl>

BMLCimpute : Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data

Examples

## Not run: 

library(BMLCimpute)

# Load data 
data(simul_incomplete)

# Preprocess the Data
cd <- convData(simul_incomplete, GID = 1, UID = 2, var2 = 8:12)

# Model Selection
set.seed(1)
mmLC <- multilevelLCMI( convData = cd, L = 10, K = 10, it1 = 1000, it2 = 3000, it3 = 100,
    it.print = 250, v = 10, I = 0, pri2 = 1 / 10, pri1 = 1 / 15, priresp =  0.01,
    priresp2 = 0.01, random = TRUE, estimates = FALSE, count = TRUE, plot.loglik = FALSE,
    prec = 3, scale = 1.0)

# Select posterior maxima of the number of classes for the imputations 
# (Other alternatives are possible, such as posterior modes or posterior quantiles)
L = max(which(mmLC[[12]] != 0))
K = max(apply(mmLC[[13]], 1, function(x) max(which( x != 0))), na.rm = TRUE)

# Perform 5 imputations on the dataset
mmLC <- multilevelLCMI( convData = cd, L = L, K = K, it1 = 2000, it2 = 4000, it3 = 100, 
     it.print = 250, v = 10, I = 5, pri2 = 500, pri1 = 50, priresp = 0.01, priresp2 = 0.01,
     random = TRUE, estimates = FALSE, count = TRUE, plot.loglik = TRUE, prec = 4, scale = 1.0)

# Obtain the dataset completed with the first set of imputations (ind = 1)
complete_data = compData( convData = cd, implev1 = mmLC[[1]], implev2 = mmLC[[2]], ind = 1 )


## End(Not run) 
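The example comments mention posterior modes and posterior quantiles as alternatives to the posterior maximum when choosing the number of classes. The sketch below compares the three rules on a made-up count vector. It assumes that element j of the vector counts the iterations in which exactly j classes were occupied; this layout is an assumption and is not confirmed by this page. The names `counts`, `L_max`, `L_mode` and `L_q90` are hypothetical.

```r
# Hypothetical counts: element j = number of iterations with j occupied
# classes (assumed layout, for illustration only)
counts <- c(0, 0, 12, 55, 30, 3, 0, 0, 0, 0)

# Posterior maximum: largest number of classes ever observed
L_max <- max(which(counts != 0))

# Posterior mode: most frequently observed number of classes
L_mode <- which.max(counts)

# Posterior quantile: smallest number of classes whose cumulative
# posterior probability reaches the chosen level (here 0.9)
post  <- counts / sum(counts)
L_q90 <- which(cumsum(post) >= 0.9)[1]
```

The maximum is the most conservative choice (it never selects fewer classes than the sampler visited), while the mode and quantile rules trade some of that safety for a smaller model.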

davidevdt/BMLCimpute documentation built on June 5, 2019, 12:36 a.m.