seqimpute R Documentation

seqimpute: Imputation of missing data in longitudinal categorical data

Description

The seqimpute package implements the MICT and MICT-timing methods, two multiple imputation methods for longitudinal categorical data. The core idea of both algorithms is to fill gaps of missing data, the typical form of missingness in a longitudinal setting, recursively from their edges. Predictions are based on either a multinomial regression model or a random forest model. Fixed covariates and time-varying covariates can be included in the model.

The MICT-timing algorithm is an extension of the MICT algorithm designed to address a key limitation of the latter: its assumption that position in the trajectory is irrelevant.
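To make the idea of filling a gap "recursively from its edges" concrete, the following base R sketch lists one possible order in which the positions of a single gap could be visited. The helper name edge_fill_order is hypothetical, and the choice to start from the left edge and alternate sides is an assumption made for illustration; the package may order the positions differently.

```r
# Hypothetical helper (not part of seqimpute): order in which the positions
# of a gap of length `gap_length` are visited when moving inward from both
# edges. Assumption: the left edge is visited first, then sides alternate.
edge_fill_order <- function(gap_length) {
  left <- 1
  right <- gap_length
  ord <- integer(0)
  while (left <= right) {
    ord <- c(ord, left)                      # outermost remaining left position
    if (left < right) ord <- c(ord, right)   # outermost remaining right position
    left <- left + 1
    right <- right - 1
  }
  ord
}

edge_fill_order(5)  # 1 5 2 4 3
```

Under this ordering, the positions next to observed values are imputed first, so each later imputation can condition on neighbours that are either observed or already imputed.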

Usage

seqimpute(
  data,
  var = NULL,
  np = 1,
  nf = 1,
  m = 5,
  timing = FALSE,
  frame.radius = 0,
  covariates = NULL,
  time.covariates = NULL,
  regr = "multinom",
  npt = 1,
  nfi = 1,
  ParExec = FALSE,
  ncores = NULL,
  SetRNGSeed = FALSE,
  verbose = TRUE,
  available = TRUE,
  pastDistrib = FALSE,
  futureDistrib = FALSE,
  ...
)

Arguments

data

a data frame containing sequences of a categorical variable with missing data (coded as NA)

var

the list of columns containing the trajectories. Default is NULL, i.e. all the columns.

np

number of previous observations in the imputation model of the internal gaps.

nf

number of future observations in the imputation model of the internal gaps.

m

number of multiple imputations (default: 5).

timing

a logical value that specifies if the MICT algorithm (timing=FALSE) or the MICT-timing algorithm (timing=TRUE) should be used.

frame.radius

parameter of the MICT-timing algorithm specifying the radius of the time frame.

covariates

the list of columns containing the covariates to include in the imputation process

time.covariates

the list of columns containing the time-varying covariates to include in the imputation process

regr

a character string specifying the imputation model. If regr="multinom", multinomial models are used; if regr="rf", random forest models are used.

npt

number of previous observations in the imputation model of the terminal gaps.

nfi

number of future observations in the imputation model of the initial gaps.

ParExec

logical. If TRUE, the multiple imputations are run in parallel. This can reduce run time, depending on how many cores the processor has.

ncores

integer. Number of cores to be used for the parallel computation. If no value is set for this parameter, the number of cores will be set to the maximum number of CPU cores minus 1.

SetRNGSeed

an integer used to set the seed for parallel computation. Note that calling set.seed() alone before seqimpute() does not make the results reproducible when the imputations are run in parallel.

verbose

logical. If TRUE, seqimpute prints history and warnings to the console. Use verbose=FALSE for silent computation.

available

a logical value allowing the user to choose whether to consider the already imputed data in the predictive model (available = TRUE) or not (available = FALSE).

pastDistrib

a logical indicating if the past distribution should be used as predictor in the imputation model.

futureDistrib

a logical indicating if the future distribution should be used as predictor in the imputation model.

...

Named arguments that are passed down to the imputation functions.
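The note on SetRNGSeed reflects a general property of parallel R code: worker processes have their own random number streams, so a set.seed() call in the main session does not reach them. The sketch below illustrates this with the base parallel package only. The helper draw_in_parallel is hypothetical, and whether seqimpute seeds its workers with this exact mechanism is an assumption.

```r
library(parallel)

# Hypothetical helper: draw four random integers on a small cluster after
# seeding the workers' random number streams with `seed`.
draw_in_parallel <- function(seed, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))               # always release the workers
  clusterSetRNGStream(cl, iseed = seed)  # seed each worker's stream
  parSapply(cl, 1:4, function(i) sample.int(100, 1))
}

a <- draw_in_parallel(2)
b <- draw_in_parallel(2)
identical(a, b)  # the two runs match because the workers were seeded
```

In seqimpute the same intent is expressed by passing the seed through SetRNGSeed together with ParExec = TRUE.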

Details

The imputation process is divided into several steps, depending on the type of gap of missing data. The gaps are imputed in the following order:

Internal gap:

gaps that have at least np observations before them and at least nf observations after them

Initial gap:

gaps situated at the very beginning of a trajectory

Terminal gap:

gaps situated at the very end of a trajectory

Left-hand side specifically located gap (SLG):

gaps that have at least nf observations after the gap, but fewer than np observations before it

Right-hand side SLG:

gaps that have at least np observations before the gap, but fewer than nf observations after it

Both-hand side SLG:

gaps that have fewer than np observations before the gap, and fewer than nf observations after it
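The gap types listed above can be illustrated with a small base R sketch that labels each run of NA values in a single trajectory. The helper classify_gaps is hypothetical and not part of the package. It assumes that "observations before" and "observations after" are counted as the number of time points between the gap and the start or end of the trajectory; the package may count them differently.

```r
# Hypothetical helper (not part of seqimpute): label every gap (run of NAs)
# in one trajectory `x`, given the np and nf settings.
# Assumption: before/after counts are the numbers of time points separating
# the gap from the start/end of the trajectory.
classify_gaps <- function(x, np = 1, nf = 1) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  gap_start <- starts[runs$values]   # first position of each NA run
  gap_end <- ends[runs$values]       # last position of each NA run
  before <- gap_start - 1
  after <- length(x) - gap_end
  type <- ifelse(before == 0, "initial",
          ifelse(after == 0, "terminal",
          ifelse(before >= np & after >= nf, "internal",
          ifelse(before < np & after >= nf, "left-hand side SLG",
          ifelse(before >= np & after < nf, "right-hand side SLG",
                 "both-hand side SLG")))))
  data.frame(start = gap_start, end = gap_end, type = type)
}

# With np = nf = 2, the gap at position 2 has too little past information
# and the gap at position 6 has too little future information.
classify_gaps(c("a", NA, "b", "b", "b", NA, "a"), np = 2, nf = 2)
```

In this sketch a trajectory that is entirely missing is labelled as an initial gap, which is a simplification.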

The primary difference between the MICT and MICT-timing algorithms lies in how they select patterns from other sequences for fitting the imputation model. While the MICT algorithm considers all similar patterns regardless of their temporal placement, MICT-timing restricts pattern selection to those that are temporally close to the missing value, within the time frame set by frame.radius. This refinement lets the imputation account for temporal dynamics, which is intended to give more accurate imputed values when the behaviour of the trajectories changes over time.
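The difference in pattern selection can be sketched on toy data with base R. The example below uses np = nf = 1 and complete toy trajectories. The helper extract_patterns, the toy matrix, and the rule used to restrict time points (absolute distance to the missing position no larger than frame.radius) are assumptions made for illustration, not the package's internal code.

```r
# Toy data: 4 complete trajectories observed at 6 time points.
seqs <- rbind(
  c("a", "a", "b", "b", "b", "a"),
  c("a", "b", "b", "a", "a", "a"),
  c("b", "b", "b", "b", "a", "a"),
  c("a", "a", "a", "b", "b", "b")
)

# Hypothetical helper: collect (past, future, target) triples centred on the
# time points in `positions`, with np = nf = 1.
extract_patterns <- function(seqs, positions) {
  do.call(rbind, lapply(positions, function(t) {
    data.frame(time = t, past = seqs[, t - 1], future = seqs[, t + 1],
               target = seqs[, t])
  }))
}

t_missing <- 3                         # time point of the value to impute
all_positions <- 2:(ncol(seqs) - 1)    # every time point with a past and a future

# MICT-like selection: patterns from all time points.
mict_patterns <- extract_patterns(seqs, all_positions)

# MICT-timing-like selection: only time points within frame.radius.
frame.radius <- 1
near_positions <- all_positions[abs(all_positions - t_missing) <= frame.radius]
timing_patterns <- extract_patterns(seqs, near_positions)

nrow(mict_patterns)    # 16 patterns (4 time points x 4 trajectories)
nrow(timing_patterns)  # 12 patterns (time points 2, 3 and 4 only)
```

The restricted set is smaller, so a larger frame.radius trades temporal specificity for more data to fit the model on.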

Value

Returns an S3 object of class seqimp.

Author(s)

Kevin Emery <kevin.emery@unige.ch>, Andre Berchtold, Anthony Guinchard, and Kamyar Taher

References

HALPIN, Brendan (2012). Multiple imputation for life-course sequence data. Working Paper WP2012-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3639.

HALPIN, Brendan (2013). Imputing sequence data: Extensions to initial and terminal gaps, Stata's. Working Paper WP2013-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3620.

Examples


# Default multiple imputation of the trajectories of game addiction with the
# MICT algorithm

## Not run: 
set.seed(5)
imp1 <- seqimpute(data = gameadd, var = 1:4)

# Default multiple imputation with the MICT-timing algorithm
set.seed(3)
imp2 <- seqimpute(data = gameadd, var = 1:4, timing = TRUE)

# Inclusion in the MICT imputation process of the three background
# characteristics (Gender, Age and Track) and the time-varying covariate
# about gambling (add timing = TRUE to use MICT-timing instead)
set.seed(4)
imp3 <- seqimpute(data = gameadd, var = 1:4, covariates = 5:7,
  time.covariates = 8:11)

# Parallel computation; the seed is passed through SetRNGSeed because
# set.seed() alone does not apply to parallel runs
imp4 <- seqimpute(data = gameadd, var = 1:4, covariates = 5:7,
  time.covariates = 8:11, ParExec = TRUE, ncores = 5, SetRNGSeed = 2)

## End(Not run)


seqimpute documentation built on May 29, 2024, 4:35 a.m.