polish: Remove observations with different states occurring at the...
In msmtools: Building Augmented Data to Run Multi-State Models with 'msm' Package


View source: R/polish.R

A fast algorithm that removes transitions to different states occurring at exactly the same time in augmented data, as computed by `augment` (see 'Details').

```r
polish(
  data,
  data_key,
  pattern,
  time,
  check_NA = FALSE,
  convert = FALSE,
  verbose = TRUE
)
```

`data`	A `data.table` or `data.frame` object in longitudinal format where each row represents an observation in which the exact starting and ending times of the process are known and recorded. If `data` is a `data.frame`, `polish` internally casts it to a `data.table`.
`data_key`	A keying variable which `polish` uses to define a key for `data`. It identifies the subject ID (see `setkey`).
`pattern`	An integer, factor or character variable with 2 or 3 unique values giving each subject's status at the end of the study. `pattern` has a predefined structure. With 2 values, the coding must be: 0 = "alive", 1 = "dead". With 3 values, the coding must be: 0 = "alive", 1 = "dead during a transition", 2 = "dead after a transition has ended".
`time`	The time variable in which duplicated times are searched for. If not supplied, it defaults to `augmented_int`.
`check_NA`	If `TRUE`, the variables passed to `data_key`, `pattern` and `time` are checked for missing values; if any are found, the function stops with an error. Default is `FALSE`.
`convert`	If `TRUE`, the returned object is converted to a `data.frame`. The conversion is done by reference, so its cost in running time and memory is negligible (see `setDF`). Default is `FALSE`.
`verbose`	If `FALSE`, all output produced by `print`, `cat` and `message` is suppressed. Default is `TRUE`.
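Two of the argument rules above, the coding required for `pattern` and the missing-value check enabled by `check_NA`, can be sketched in base R. The helpers `check_pattern_codes()` and `check_missing()` are hypothetical illustrations, not functions exported by msmtools; the sketch assumes integer-coded `pattern` values and uses a toy `data.frame` whose column names are borrowed from the `hosp` example.

```r
# Hypothetical helpers (not part of msmtools) illustrating two argument rules.

# `pattern` rule (assumes integer codes): with 2 unique values they must be
# 0/1; with 3 unique values they must be 0/1/2.
check_pattern_codes <- function(x) {
  codes <- as.numeric(sort(unique(x[!is.na(x)])))
  if (length(codes) == 2L) return(identical(codes, c(0, 1)))
  if (length(codes) == 3L) return(identical(codes, c(0, 1, 2)))
  FALSE
}

# `check_NA` rule: stop with an error if any of the given columns
# contains missing values.
check_missing <- function(data, cols) {
  has_na <- vapply(cols, function(col) anyNA(data[[col]]), logical(1))
  if (any(has_na)) {
    stop("missing values found in: ", paste(cols[has_na], collapse = ", "))
  }
  invisible(TRUE)
}

check_pattern_codes(c(0, 1, 1, 0))     # TRUE  (2-value coding)
check_pattern_codes(c(0, 2, 1, 2, 0))  # TRUE  (3-value coding)
check_pattern_codes(c(1, 2, 3))        # FALSE (codes are not 0/1/2)

# toy data with one missing status value
df <- data.frame(subj = c(1, 1, 2), label_3 = c(0, 0, NA),
                 augmented_int = c(0, 5, 3))
msg <- tryCatch(check_missing(df, c("subj", "label_3", "augmented_int")),
                error = conditionMessage)
msg  # "missing values found in: label_3"
```

Returning a logical from the `pattern` check, rather than stopping, keeps the sketch easy to test; a real validator would more likely raise an error, as the `check_NA` sketch does.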

`polish` finds every case in which two consecutive events of the same subject land in different states but occur at exactly the same time. When this happens, all rows of that subject, as identified by `data_key`, are removed from the data. The total number of removed subjects is printed (unless `verbose = FALSE`) so that the user knows how much data was discarded.
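The removal rule can be sketched in base R as follows. `drop_conflicting_subjects()` is a hypothetical helper, not the msmtools implementation (which works on `data.table` objects and is driven by the `time` argument); the toy columns `subj`, `time` and `state` and the explicit sort by subject and time are assumptions made for illustration.

```r
# Hypothetical base R sketch of the rule (not the msmtools source).
# Column names `subj`, `time` and `state` are toy assumptions.
drop_conflicting_subjects <- function(data, key = "subj", time = "time",
                                      state = "state") {
  # order rows by subject, then by time
  data <- data[order(data[[key]], data[[time]]), , drop = FALSE]
  n <- nrow(data)
  if (n < 2L) return(data)
  prev <- seq_len(n - 1L)  # row i
  nxt  <- prev + 1L        # row i + 1
  # consecutive rows of the same subject, same time, different states
  clash <- data[[key]][prev] == data[[key]][nxt] &
           data[[time]][prev] == data[[time]][nxt] &
           data[[state]][prev] != data[[state]][nxt]
  bad <- unique(data[[key]][prev][clash])
  if (length(bad) > 0L) message("subjects removed: ", length(bad))
  # drop every row of each offending subject
  data[!(data[[key]] %in% bad), , drop = FALSE]
}

toy <- data.frame(
  subj  = c(1, 1, 1, 2, 2, 2),
  time  = c(0, 4, 4, 0, 3, 7),
  state = c("out", "in", "dead", "out", "in", "out")
)
clean <- drop_conflicting_subjects(toy)
unique(clean$subj)  # 2
```

Subject 1 has two rows at time 4 that land in different states, so all of its rows are dropped; subject 2 has no such clash and is kept unchanged.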

Francesco Grossetti francesco.grossetti@unibocconi.it.

See also: `augment`.

```r
# loading the package and its example data
library( msmtools )
data( hosp )

# augmenting longitudinal data
hosp_aug = augment( data = hosp, data_key = subj, n_events = adm_number,
                    pattern = label_3, t_start = dateIN, t_end = dateOUT,
                    t_cens = dateCENS )

# removing subjects with transitions to different states at the same time
hosp_aug_clean = polish( data = hosp_aug, data_key = subj, pattern = label_3 )
```