knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
set.seed(17)  # reproducible examples / runtimes

CRAN_Status_Badge Travis-CI Build Status Coverage Status DOI JOSS Status

multistateutils

multistateutils provides a number of useful functions for analyzing parametric multi-state models. In this way it does not aim to supplement the modelling strategies found in mstate, msm, or flexsurv, but rather provide tools for subsequent analysis. It is designed to be used with semi-Markov multi-state models of healthcare data, but can be used for any system that can be discretized into the states being entered at observed time-points, with parametric families providing appropriate fits for these transition rates.

It currently provides several features:

Examples of these features are provided below in Examples.

Installation

You can install the latest release version from CRAN with:

install.packages("multistateutils")

Or the development version can also be installed directly from GitHub.

install.packages("devtools")  # install devtools if it isn't already
devtools::install_github("stulacy/multistateutils", build_vignettes=TRUE)

Examples

This section provides a brief overview of the features in multistateutils, please see the Examples vignette for a thorough description.

Setup

In these examples we'll use the ebmt3 data set provided in the mstate package. It describes recovery after bone marrow transplant. We'll model it with an illness-death model, with transplant as the starting state, platelet recovery pr as the illness state, and relapse-free survival rfs as the death state.

library(mstate)
data(ebmt3)
tmat <- trans.illdeath()                             # Form transition matrix
long <- msprep(time=c(NA, 'prtime', 'rfstime'),      # Convert data to long
               status=c(NA, 'prstat', 'rfsstat'), 
               data=ebmt3, 
               trans=tmat, 
               keep=c('age', 'dissub'))
head(long)

multistateutils is designed for applied modelling with an overall aim of prediction, particularly in health-related areas. For this reason it is designed to work with parametric transition models built with flexsurv. The example below fits a Weibull model to each transition using the age and dissub covariates.

library(flexsurv)
models <- lapply(1:3, function(i) {
    flexsurvreg(Surv(time, status) ~ age + dissub, data=long, dist='weibull')
})

Estimating transition probabilities

Transition probabilities are defined as the probability of being in a state $j$ at a time $t$, given being in state $h$ at time $s$, as shown below where $X(t)$ gives the state an individual is in at $t$. This is all conditional on the individual parameterised by their covariates and history, which for this semi-Markov model only influences transition probabilities through state arrival times.

$$P_{h,j}(s, t) = \Pr(X(t) = j\ |\ X(s) = h)$$

We'll estimate the transition probabilities of an individual with the covariates age=20-40 and dissub=AML at 1 year after transplant.

newdata <- data.frame(age="20-40", dissub="AML")
library(multistateutils)
predict_transitions(models, newdata, tmat, times=365)

This functionality is already provided in flexsurv::pmatrix.simfs, but it is limited to assessing only one individual at a time, with $s=0$

pmatrix.simfs(models, tmat, newdata=newdata, t=365)

predict_transitions on the other hand can estimate transition probabilities for:

The ability to generate probabilities by varying $s$ and $t$ within a single simulation makes it very efficient for generating dynamic predictions.

predict_transitions(models, newdata, tmat, times=c(1, 2)*365, 
                    start_times = c(0.25, 0.75) * 365)

Length of stay

Similarly, the length of stay functionality provided by flexsurv::totlos.simfs has also been extended to allow for estimates at multiple time-points, states, and individuals to be calculated simultaneously. As shown below, the function parameters are very similar and the estimates are very close to those produced by totlos.simf.

length_of_stay(models, 
               newdata=newdata,
               tmat, times=365.25*3,
               start=1)
totlos.simfs(models, tmat, t=365.25*3, start=1, newdata=newdata)

Again, the advantage of this implementation is that estimates can be produced from multiple conditions without needing to rerun the simulation.

length_of_stay(models, 
               newdata=data.frame(age=c(">40", ">40"),
                                  dissub=c('CML', 'AML')),
               tmat, times=c(1, 3, 5)*365.25)

State flow diagram

Another feature in multistateutils is a visualization of a predicted pathway through the state transition model, calculated using dynamic prediction and provided in the function plot_predicted_pathway. It estimates state occupancy probabilities at discrete time-points and displays the flow between them in the manner of a Sankey diagram as shown below.

The output of this function is an interactive HTML widget that can be manipulated to layout the diagram to better suit your needs. Unfortunately widgets can't be embedded in GitHub READMEs so the image below is just a screenshot, but have a look at the version in the vignette for an working example.

$$P_{h,j}(s, t) = \Pr(X(t) = j\ |\ X(s) = h)$$

time_points <- seq(0, 10, by=2) * 365.25
plot_predicted_pathway(models, tmat, newdata, time_points, 'healthy')

Discrete event simulation

The underlying simulation engine can be easily adapted to run cohort-wide simulation, where the output statistics of interest are global rather individual level measures. This is useful for estimating healthcare related values, such as the average time spent receiving treatment for a given disease population.

These discrete event simulations are run with the cohort_simulation function as below. They simply require a list of models as specified above, a cohort population, and the transition matrix. It outputs a long table where each row is an observed state entry for a given individual.

sim <- cohort_simulation(models, ebmt3, tmat)
head(sim)

By default the simulation runs until every individual reaches a sink state, but a longitudinal specification can be provided, such that the individuals enter the simulation at set incident times and the simulation terminates after a predefined period. This functionality is useful when estimating statistics over a finite time-period, such as the expected number of deaths in a 5-year range, but it requires a model of the incidence process (i.e. how often a new patient is diagnosed). See the Examples vignette for further details.

msprep2

A large part of the work involved with multi-state modelling is basic munging of the data into a format suitable for modelling. The msprep function from the mstate package helps a lot here by converting a wide data frame where each row corresponds to a user with state entry times given in column, into a long format where each row represents a possible state transition.

It does, however, have some slight limitations:

For these reasons I've created a modified version of msprep, imaginatively called msprep2, that performs the same role but accepts the input data as a tidy table of state entry times rather than a wide table at the individual level.

In this format, each row in the input data frame corresponds to a known state entry time, and so non-visited states are simply omitted. The input data frame only needs 3 columns: an individual id, a state id, and the time of entry. Covariates and censoring information are provided by separate data frames that link back on the id, providing a cleaner interface and one that works well with data that is stored in relational databases.

The example below shows the data preparation for 2 individuals:

  1. Enters state 2 at $t=23$ then no more transitions until last follow-up at $t=744$
  2. Enters state 2 at $t=35$ then enters the absorptive state 3 at $t=360$
entry <- data.frame(id=c(1, 2, 2),
                    state=c(2, 2, 3),
                    time=c(23, 35, 360))
cens <- data.frame(id=1, censor_time=744)
covars <- data.frame(id=1:2, age=c('>40', '20-40'), dissub=c('CML', 'AML'))
msprep2(entry, tmat, censors = cens, covars = covars)

By specifying state entry as a long table there is now no limit to how many times a state can be entered by an individual. Let's demonstrate this by extending the illness-death model to allow a patient to recover (i.e. transition from illness back to healthy).

states <- c('healthy', 'illness', 'death')
tmat2 <- matrix(c(NA, 3, NA, 1, NA, NA, 2, 4, NA), nrow=3, ncol=3, 
                dimnames=list(states, states))
tmat2

I'll create a dummy dataset with one individual moving from healthy->illness->death at times 6 and 11, while patient 2 goes from healthy->illness, then is cured and goes back to healthy, before moving back to illness and finally dying.

multistate_entry <- data.frame(id=c(rep(1, 2),
                                    rep(2, 4)),
                               state=c('illness', 'death',
                                       'illness', 'healthy', 'illness', 'death'),
                               time=c(6, 11,
                                      7, 12, 17, 22))
multistate_entry

As seen below, msprep2 has no problem converting this into a list of possible transitions. Note that we don't need to pass in anything to censors because we have complete follow-up on both patients.

msprep2(multistate_entry, tmat2)

Upcoming features

There is currently a web-app (not publicly accessible but the source code is on Github) that provides a graphical interface for the entire multi-state modelling process and simulation process for a cohort simulation. I'd like to tidy this up and get it functioning with this new version of multistateutils and also provide an interface for individual level simulations, such as estimating transition probabilities.



stulacy/RDES documentation built on Feb. 14, 2021, 2:31 a.m.