seqimpute: Imputation of missing data in sequence analysis

seqimpute R Documentation

Imputation of missing data in sequence analysis

Description

The seqimpute package implements the MICT and MICT-timing methods, two multiple imputation methods for longitudinal data. The core idea of the algorithms is to fill gaps of missing data, which are the typical form of missing data in a longitudinal setting, recursively from their edges. The prediction is based on either a multinomial regression model or a random forest model. Covariates and time-dependent covariates can be included in the model. The prediction of the missing values is based on the theory of Prof. Brendan Halpin. It uses a varying amount of the surrounding available information: among other settings, np specifies the number of past observations taken into account and nf the number of future observations taken into account.

Usage

seqimpute(
  OD,
  np = 1,
  nf = 1,
  m = 1,
  timing = FALSE,
  timeFrame = 0,
  covariates = matrix(NA, nrow = 1, ncol = 1),
  time.covariates = matrix(NA, nrow = 1, ncol = 1),
  regr = "multinom",
  nfi = 1,
  npt = 1,
  available = TRUE,
  pastDistrib = FALSE,
  futureDistrib = FALSE,
  noise = 0,
  ParExec = FALSE,
  ncores = NULL,
  SetRNGSeed = FALSE,
  verbose = TRUE,
  ...
)

Arguments

OD

either a data frame containing sequences of a multinomial variable with missing data (coded as NA) or a state sequence object built with the TraMineR package

np

number of previous observations in the imputation model of the internal gaps.

nf

number of future observations in the imputation model of the internal gaps.

m

number of multiple imputations (default: 1).

timing

a logical value that specifies if the standard MICT algorithm (timing=FALSE) or the MICT-timing algorithm (timing=TRUE) should be used.

timeFrame

parameter of the MICT-timing algorithm specifying the radius of the time frame.

covariates

a data frame containing the covariates intended for use in the imputation process, with each column representing a distinct covariate.

time.covariates

a data frame containing time-dependent covariates that help specify the predictive model more accurately.

regr

a character specifying the imputation method. If regr="multinom", multinomial models are used, while if regr="rf", random forest models are used.

nfi

number of future observations in the imputation model of the initial gaps.

npt

number of previous observations in the imputation model of the terminal gaps.

available

a logical value allowing the user to choose whether to consider the already imputed data in the predictive model (available = TRUE) or not (available = FALSE).

pastDistrib

a logical indicating if the past distribution should be used as predictor in the imputation model.

futureDistrib

a logical indicating if the future distribution should be used as predictor in the imputation model.

noise

numeric value adding noise to the predicted variable pred determined by the multinomial model, by introducing a variance noise for each component of the vector pred. Any value can be chosen, but a relatively small value in the interval [0.005, 0.03] is recommended.

ParExec

logical. If TRUE, the multiple imputations are run in parallel. This can reduce run time, depending on how many cores the processor has.

ncores

integer. Number of cores to be used for the parallel computation. If no value is set for this parameter, the number of cores will be set to the maximum number of CPU cores minus 1.

SetRNGSeed

an integer used to set the seed in the case of parallel computation (default: FALSE). Note that calling set.seed() alone before the seqimpute function won't work in the case of parallel computation.

verbose

logical. If TRUE, seqimpute will print history and warnings on the console. Use verbose=FALSE for silent computation.

...

Named arguments that are passed down to the imputation functions.

mice.return

a logical indicating whether an object of class mids, that can be directly used by the mice package, should be returned by the algorithm. By default, a data frame with the imputed datasets stacked vertically is returned.

include

logical. If a data frame is returned (mice.return = FALSE), indicates whether the original dataset should be included. This parameter does not apply if mice.return=TRUE.

Details

The imputation process is divided into several steps. According to the location of the gaps of NA in the original dataset, six types of gaps are defined:

- Internal Gaps (simple usual gaps)

- Initial Gaps (gaps situated at the very beginning of a sequence)

- Terminal Gaps (gaps situated at the very end of a sequence)

- Left-hand side SLG (Specially Located Gaps) (gaps whose beginning lies in the interval [0,np] but whose end does not lie in the interval [ncol(OD)-nf,ncol(OD)])

- Right-hand side SLG (gaps whose end lies in the interval [ncol(OD)-nf,ncol(OD)] but whose beginning does not lie in the interval [0,np])

- Both-hand side SLG (gaps whose beginning lies in the interval [0,np] and whose end lies in the interval [ncol(OD)-nf,ncol(OD)])

Order of imputation of the gap types:

1. Internal Gaps

2. Initial Gaps

3. Terminal Gaps

4. Left-hand side SLG

5. Right-hand side SLG

6. Both-hand side SLG
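
The position-based part of this classification can be illustrated with a minimal sketch in base R. The helper find_gaps() is hypothetical and is not part of seqimpute. It only separates initial, internal and terminal gaps in a single sequence and ignores np and nf, so it does not detect the three SLG types.

```r
# Hypothetical helper (not part of seqimpute): locate runs of NA in one
# sequence and label each run by its position only.
find_gaps <- function(x) {
  runs  <- rle(is.na(x))                 # consecutive runs of missing / observed
  end   <- cumsum(runs$lengths)          # last index of each run
  start <- end - runs$lengths + 1        # first index of each run
  gaps  <- data.frame(start = start[runs$values], end = end[runs$values])
  # A fully missing sequence is labelled "initial" in this simplified sketch
  gaps$type <- ifelse(gaps$start == 1, "initial",
               ifelse(gaps$end == length(x), "terminal", "internal"))
  gaps
}

x <- c(NA, "a", "b", NA, NA, "a", "c", NA)
find_gaps(x)
```

Here the sequence has one gap of each simple type: positions 1 to 1 (initial), 4 to 5 (internal) and 8 to 8 (terminal).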

Value

Returns either an S3 object of class mids if mice.return = TRUE or a data frame in which the imputed datasets are stacked vertically. In the second case, two columns are added: .imp, an integer giving the imputation number (0 corresponds to the original dataset if include=TRUE), and .id, a character giving the row names of the dataset to impute.
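
As a rough illustration of working with the stacked format, the sketch below splits a data frame into one data frame per imputation using base R. The object stacked is a made-up stand-in with the .imp and .id columns described above (the columns T1 and T2 and all values are invented); it is not actual seqimpute output.

```r
# Toy stand-in for a stacked result with m = 2 (values are invented)
stacked <- data.frame(
  .imp = c(1, 1, 2, 2),
  .id  = c("1", "2", "1", "2"),
  T1   = c("a", "b", "a", "b"),
  T2   = c("a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# Drop the bookkeeping columns and split by imputation number
seq_cols <- !(names(stacked) %in% c(".imp", ".id"))
by_imp   <- split(stacked[, seq_cols], stacked$.imp)

by_imp[["2"]]   # sequences of the second imputed dataset
```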

Author(s)

Andre Berchtold <andre.berchtold@unil.ch>, Kevin Emery, Anthony Guinchard, Kamyar Taher

References

HALPIN, Brendan (2012). Multiple imputation for life-course sequence data. Working Paper WP2012-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3639.

HALPIN, Brendan (2013). Imputing sequence data: Extensions to initial and terminal gaps, Stata's. Working Paper WP2013-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3620

Examples


# Default single imputation
# (OD must already hold the sequences to impute, with missing states coded as NA)
RESULT <- seqimpute(OD = OD, np = 1, nf = 1, nfi = 1, npt = 1, m = 1)

# Seqimpute used with parallelisation
## Not run: 
RESULT <- seqimpute(OD = OD, np = 1, nf = 1, nfi = 1, npt = 1, m = 2, ParExec = TRUE, SetRNGSeed = 17, ncores = 2)

## End(Not run)


seqimpute documentation built on March 19, 2024, 3:09 a.m.