seqimpute | R Documentation |
The seqimpute package implements the MICT and MICT-timing methods, two multiple imputation methods for
longitudinal data. The core idea of the algorithms is to fill gaps of missing data, which are the typical form of
missing data in a longitudinal setting, recursively from their edges. The prediction is based
on either a multinomial or a random forest regression model.
Covariates and time-dependent covariates can be included in the model.
The prediction of the missing values is based on the work of Prof. Brendan
Halpin. It can take a variable amount of the surrounding available information into
account when predicting each missing value. Among other settings, the user can specify np
(the number of past observations taken into account) and nf
(the number of future observations taken into account).
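As a rough illustration of what np and nf control, the base R sketch below lists the columns that would feed the model for a gap covering columns start to end. The helper gap_predictors() is hypothetical and not part of seqimpute. It assumes a simplified reading in which the np columns just before the gap and the nf columns just after it are the predictors; the package's recursive filling from the gap edges is not reproduced.

```r
# Hypothetical helper, not part of the seqimpute API.
# For a gap covering columns start..end of a sequence with n columns,
# return the columns a model with np past and nf future observations
# would draw its predictors from (simplified reading, see text).
gap_predictors <- function(start, end, n, np, nf) {
  past   <- seq(from = start - np, length.out = np)  # np columns just before the gap
  future <- seq(from = end + 1, length.out = nf)     # nf columns just after the gap
  # Columns outside 1..n do not exist; a gap too close to an edge
  # therefore gets fewer predictors than requested.
  list(past = past[past >= 1], future = future[future <= n])
}

# Gap in columns 4 to 5 of a 10-column sequence, with np = 2 and nf = 1
w <- gap_predictors(start = 4, end = 5, n = 10, np = 2, nf = 1)
w$past    # 2 3
w$future  # 6
```

Gaps for which the requested window is cut short by the start or end of the sequence correspond to the specially located gaps (SLG) described further down.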
seqimpute(
OD,
np = 1,
nf = 1,
m = 1,
timing = FALSE,
timeFrame = 0,
covariates = matrix(NA, nrow = 1, ncol = 1),
time.covariates = matrix(NA, nrow = 1, ncol = 1),
regr = "multinom",
nfi = 1,
npt = 1,
available = TRUE,
pastDistrib = FALSE,
futureDistrib = FALSE,
noise = 0,
ParExec = FALSE,
ncores = NULL,
SetRNGSeed = FALSE,
verbose = TRUE,
...
)
OD | either a data frame containing sequences of a multinomial variable with missing data (coded as NA) |
np | number of previous observations in the imputation model of the internal gaps. |
nf | number of future observations in the imputation model of the internal gaps. |
m | number of multiple imputations (default: 1). |
timing | a logical value that specifies whether the standard MICT algorithm (timing=FALSE) or the MICT-timing algorithm (timing=TRUE) should be used. |
timeFrame | parameter of the MICT-timing algorithm specifying the radius of the time frame. |
covariates | a data frame containing the covariates intended for use in the imputation process, with each column representing a distinct covariate. |
time.covariates | a data frame containing time-dependent covariates that help specify the predictive model more accurately. |
regr | a character string specifying the imputation method. If |
nfi | number of future observations in the imputation model of the initial gaps. |
npt | number of previous observations in the imputation model of the terminal gaps. |
available | a logical value allowing the user to choose whether to consider the already imputed data in the predictive model ( |
pastDistrib | a logical indicating whether the past distribution should be used as a predictor in the imputation model. |
futureDistrib | a logical indicating whether the future distribution should be used as a predictor in the imputation model. |
noise | |
ParExec | logical. If |
ncores | integer. Number of cores to be used for the parallel computation. If no value is set for this parameter (ncores = NULL, the default), the number of cores is set to the number of available CPU cores minus 1. |
SetRNGSeed | an integer used to set the seed in the case of parallel computation. Note that setting |
verbose | logical. If |
... | named arguments that are passed down to the imputation functions. |
mice.return | a logical indicating whether an object of class |
include | logical. If a data frame is returned ( |
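A minimal sketch of input shaped like OD, assuming one row per individual, one column per time point, and states stored as factor levels with missing states coded as NA. OD_toy and the state labels are invented for illustration, and the exact column types accepted by seqimpute are an assumption here.

```r
# Hypothetical toy input: one row per individual, one column per time point.
# States are stored as factor levels; missing states are coded as NA.
states <- c("school", "work", "unemployed")
OD_toy <- data.frame(
  T1 = factor(c("school", "school", NA), levels = states),
  T2 = factor(c("school", NA, "work"), levels = states),
  T3 = factor(c(NA, NA, "work"), levels = states),
  T4 = factor(c("work", "unemployed", NA), levels = states)
)

# Number of missing values per time point (T1: 1, T2: 1, T3: 2, T4: 1)
colSums(is.na(OD_toy))
```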
The imputation process is divided into several steps. According to the location of the gaps of NA in the original dataset, we define six types of gaps:
- Internal Gaps (simple usual gaps)
- Initial Gaps (gaps situated at the very beginning of a sequence)
- Terminal Gaps (gaps situated at the very end of a sequence)
- Left-hand side SLG (Specially Located Gaps): gaps whose beginning location is included in the interval [0,np] but whose ending location is not included in the interval [ncol(OD)-nf,ncol(OD)]
- Right-hand side SLG (Specially Located Gaps): gaps whose ending location is included in the interval [ncol(OD)-nf,ncol(OD)] but whose beginning location is not included in the interval [0,np]
- Both-hand side SLG (Specially Located Gaps): gaps whose beginning location is included in the interval [0,np] and whose ending location is included in the interval [ncol(OD)-nf,ncol(OD)]
Order of imputation of the gap types:
1. Internal Gaps
2. Initial Gaps
3. Terminal Gaps
4. Left-hand side SLG
5. Right-hand side SLG
6. Both-hand side SLG
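The gap classification can be sketched in base R for a single sequence. classify_gaps() is a hypothetical helper, not the seqimpute implementation. It encodes one reading of the intervals: a gap counts as left-hand (right-hand) side SLG when fewer than np (nf) positions precede (follow) it, and initial or terminal status takes precedence.

```r
# Hypothetical sketch, not the seqimpute implementation.
# Classify every run of NA in one sequence x according to the gap types,
# given the np and nf settings.
classify_gaps <- function(x, np, nf) {
  n      <- length(x)
  runs   <- rle(is.na(x))               # runs of missing / observed positions
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  starts <- starts[runs$values]         # keep only the runs of NA
  ends   <- ends[runs$values]
  type   <- character(length(starts))
  for (i in seq_along(starts)) {
    s <- starts[i]
    e <- ends[i]
    left  <- (s - 1) < np               # fewer than np positions before the gap
    right <- (n - e) < nf               # fewer than nf positions after the gap
    type[i] <- if (s == 1) {
      "initial"
    } else if (e == n) {
      "terminal"
    } else if (left && right) {
      "both-hand SLG"
    } else if (left) {
      "left-hand SLG"
    } else if (right) {
      "right-hand SLG"
    } else {
      "internal"
    }
  }
  data.frame(start = starts, end = ends, type = type, stringsAsFactors = FALSE)
}

x <- c("a", NA, "b", "c", NA, "a", "b", "c", NA, "a")
g <- classify_gaps(x, np = 2, nf = 2)
g
```

With np = 2 and nf = 2, the gap at position 2 has only one position before it and the gap at position 9 has only one position after it, so they are classified as left-hand and right-hand side SLG, while the gap at position 5 is internal.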
Returns either an S3 object of class mids
if mice.return = TRUE,
or a data frame in which the imputed datasets are stacked vertically. In the second case,
two columns are added: .imp,
an integer that refers to the imputation number
(0 corresponding to the original dataset if include = TRUE),
and .id,
a character string corresponding to
the row names of the dataset to impute.
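When a data frame is returned, one way to recover one dataset per imputation is to split it on .imp. The stacked data frame below is invented toy data standing in for a real seqimpute result; T1 and T2 are hypothetical time-point columns.

```r
# Toy stand-in for the stacked data frame (values invented for illustration)
stacked <- data.frame(
  .imp = rep(1:2, each = 3),
  .id  = rep(c("1", "2", "3"), times = 2),
  T1   = c("a", "b", "a", "a", "b", "b"),
  T2   = c("b", "b", "a", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# One data frame per imputation: restore the original row names from .id
# and drop the .imp/.id bookkeeping columns
by_imp <- lapply(split(stacked, stacked$.imp), function(d) {
  rownames(d) <- d$.id
  d[, setdiff(names(d), c(".imp", ".id")), drop = FALSE]
})

names(by_imp)  # "1" "2"
by_imp[["2"]]
```

If include = TRUE, the rows with .imp equal to 0 hold the original data and can be dropped, for example with subset(stacked, .imp > 0), before splitting.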
Andre Berchtold <andre.berchtold@unil.ch>, Kevin Emery, Anthony Guinchard, Kamyar Taher
HALPIN, Brendan (2012). Multiple imputation for life-course sequence data. Working Paper WP2012-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3639.
HALPIN, Brendan (2013). Imputing sequence data: Extensions to initial and terminal gaps, Stata's. Working Paper WP2013-01, Department of Sociology, University of Limerick. http://hdl.handle.net/10344/3620
library(seqimpute)
# OD: a data frame of sequences with missing states coded as NA
# Default single imputation
RESULT <- seqimpute(OD = OD, np = 1, nf = 1, nfi = 1, npt = 1, m = 1)
# Seqimpute used with parallelisation
## Not run:
RESULT <- seqimpute(OD = OD, np = 1, nf = 1, nfi = 1, npt = 1, m = 2, ParExec = TRUE, SetRNGSeed = 17, ncores = 2)
## End(Not run)