A fast and general method for building augmented data
A fast and general method for reshaping standard longitudinal data into a new structure called
'augmented'. This format is suitable under a multi-state framework using the
A keying variable which
An integer variable indicating the progressive (monotonic) event number
of a given ID.
Either an integer, a factor, or a character with 2 or 3 unique values which
provides the ID status at the end of the study.
A list of exactly three possible states which a subject can reach.
The starting time of an observation. It can be passed as date, integer, or numeric format.
The ending time of an observation. It can be passed as date, integer, or numeric format.
The censoring time of the study. This is the date until each ID is observed, if still active in the cohort.
The exact death time of a subject ID. If
A variable indicating the name of the new time variable of the process in the
augmented format. If
A variable which marks further transitions beside the default ones given by
To process the data, augment must ensure that events increase monotonically within each subject. It checks this whether or not n_events is supplied.
First, the data are quickly ordered through the setkey function, with data_key as the primary key and t_start as the secondary key.
Second, augment checks the monotonicity of n_events; if the check fails, it stops with an error and returns the data_key of the subjects for whom the condition is not met.
If n_events is missing, augment internally computes the progression number under the name n_events and runs the same procedure.
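The ordering and monotonicity check described above can be sketched in base R. This is only an illustration under stated assumptions, not the code augment runs: the text says augment orders the data with setkey, whereas this sketch uses order(); the toy data, the helper non_monotonic_ids, and the column n_events_rebuilt are hypothetical; and "monotonic" is assumed to mean strictly increasing.

```r
# Toy longitudinal data. Column names mirror the hosp example; values are made up.
toy <- data.frame(
  subj       = c(2, 1, 1, 2, 1),
  adm_number = c(1, 2, 1, 2, 3),
  dateIN     = as.Date(c("2010-03-01", "2010-02-10", "2010-01-05",
                         "2010-06-15", "2010-04-20"))
)

# Order by subject (primary key) and start time (secondary key).
# Base R order() stands in for the setkey call mentioned in the text.
toy <- toy[order(toy$subj, toy$dateIN), ]

# Hypothetical helper: return the IDs whose event number is not
# strictly increasing once the rows are ordered.
non_monotonic_ids <- function(data, key, n_events) {
  ok <- tapply(data[[n_events]], data[[key]], function(x) all(diff(x) > 0))
  names(ok)[!ok]
}

bad <- non_monotonic_ids(toy, "subj", "adm_number")
if (length(bad) > 0) {
  stop("n_events is not monotonic for subject(s): ", paste(bad, collapse = ", "))
}

# If no event number is supplied, a progression number can be rebuilt
# from the row order within each subject.
toy$n_events_rebuilt <- ave(seq_along(toy$subj), toy$subj, FUN = seq_along)
```

Here the check passes, and the rebuilt progression number matches adm_number because the rows were ordered first.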
Attention needs to be paid to the argument pattern. If only two statuses are defined, integer values must be 0 and 1,
corresponding to 'alive' and 'dead' respectively. If three values are defined, they must be 0, 1 and 2 if
pattern is an integer, or 'alive', 'dead inside a transition' and 'dead outside a transition' if
pattern is either a character or a factor.
The order matters: it is not possible to specify 0 as 'dead', for instance.
When passing a list of states, the order is important: the first element must be the state corresponding to the starting time (e.g. 'IN', inside the hospital), the second element must correspond to the ending time (e.g. 'OUT', outside the hospital), and the third element is the absorbing state (e.g. 'DEAD').
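The coding rules for pattern and the ordering of the state list can be illustrated with a short base R sketch. The variable states and the helper check_pattern are hypothetical names introduced here; they are not part of the package, and augment's own validation may differ.

```r
# State list in the required order: starting state, ending state, absorbing state.
states <- list("IN", "OUT", "DEAD")

# Labels in the order required for a three-value integer pattern (0, 1, 2).
labels_3 <- c("alive", "dead inside a transition", "dead outside a transition")

# Map an integer pattern to its labels; 0 is always 'alive'.
pattern_int <- c(0L, 2L, 1L, 0L)
pattern_chr <- as.character(factor(pattern_int, levels = 0:2, labels = labels_3))

# Hypothetical validator for the integer coding:
# two observed values must be 0 and 1, three must be 0, 1 and 2.
check_pattern <- function(pattern) {
  vals <- as.numeric(sort(unique(pattern)))
  if (length(vals) == 2) return(identical(vals, c(0, 1)))
  if (length(vals) == 3) return(identical(vals, c(0, 1, 2)))
  FALSE
}

valid_two   <- check_pattern(c(0L, 1L, 1L, 0L))
valid_three <- check_pattern(pattern_int)
invalid     <- check_pattern(c(1L, 2L))  # 0 is missing, so this coding is rejected
```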
more_status allows managing multiple transitions besides those already specified.
In particular, if the corresponding observation is a standard admission which adds no further
information, more_status must be set to 'df', which
stands for 'Default' (see 'Examples' or run ?hosp and look at the variable 'rehab_it'). In general,
it is good practice to fully specify each transition with a short, self-explanatory label
so that the current transition can be identified quickly.
An augmented format dataset of class
TRUE, where each row represents a specific transition for a given subject.
augment returns the dataset after computing some important variables:
augmented: the new timing variable for the process when looking at transitions. If
t_augmented is missing, augment names it augmented by default.
The function looks directly at t_start and t_end to build it, so it inherits their class.
In particular, if t_start is in date format, augment computes an additional variable
cast as integer and names it augmented_int. If t_start is in difftime format,
augment computes an additional variable cast as numeric and names it augmented_num;
status: a status flag which contains the states as specified in the list of possible states.
augment automatically checks whether the argument pattern has 2 or 3 unique values and
computes the correct structure for a given subject, as reported in the vignette.
The variable is cast as character;
status_num: the corresponding integer version of status;
n_status: a combination of status and n_events, cast as character. This is
useful when a multi-state model on the progression of the process needs to be implemented.
If more_status is passed, augment computes some additional variables. They mirror the
meaning of status, status_num, and n_status, but account for the more
complex structure defined.
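The derived variables listed above can be illustrated with a small base R sketch. It only shows the kind of casting the text describes. The integer coding used for status_num and the separator used in n_status are assumptions made for illustration, and the actual output of augment may differ.

```r
# Toy transition times in date format (values are made up).
augmented <- as.Date(c("2010-01-05", "2010-01-12", "2010-02-10"))

# Integer version of a date variable: days since 1970-01-01.
augmented_int <- as.integer(augmented)

# Numeric version of a difftime variable (days elapsed since the first time).
elapsed       <- difftime(augmented, augmented[1], units = "days")
augmented_num <- as.numeric(elapsed)

# Status flag as character, plus an integer version.
# The coding IN = 1, OUT = 2, DEAD = 3 is an assumption.
status     <- c("IN", "OUT", "IN")
status_num <- match(status, c("IN", "OUT", "DEAD"))

# Combination of event number and status, cast as character.
# The order and the space separator are assumptions.
n_events <- c(1L, 1L, 2L)
n_status <- paste(n_events, status)
```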
Francesco Grossetti firstname.lastname@example.org.
Jackson, C.H. (2011). Multi-State Models for Panel Data:
The msm Package for R. Journal of Statistical Software, 38(8), 1-29.
M. Dowle, A. Srinivasan, T. Short, S. Lianoglou with contributions from R. Saporta and E. Antonyan (2016):
data.table: Extension of data.frame. R package version 1.9.6
# 1.
# loading data
data( hosp )

# augmenting hosp
hosp_augmented = augment( data = hosp, data_key = subj, n_events = adm_number,
                          pattern = label_3, t_start = dateIN, t_end = dateOUT,
                          t_cens = dateCENS )

# 2.
# augmenting hosp by passing more information regarding transition with arg. more_status
hosp_augmented_more = augment( data = hosp, data_key = subj, n_events = adm_number,
                               pattern = label_3, t_start = dateIN, t_end = dateOUT,
                               t_cens = dateCENS, more_status = rehab_it )

## Not run: 
augmented = augment( data = hosp, data_key = subj, n_events = dateIN,
                     pattern = label_3, t_start = dateIN, t_end = dateOUT,
                     t_cens = dateCENS )
## End(Not run)