Multistate Markov and hidden Markov models in continuous time
Description
Fit a continuoustime Markov or hidden Markov multistate model by maximum likelihood. Observations of the process can be made at arbitrary times, or the exact times of transition between states can be known. Covariates can be fitted to the Markov chain transition intensities or to the hidden Markov observation process.
Usage
1 2 3 4 5 6 7 8 9 10 11 12  msm ( formula, subject=NULL, data = list(), qmatrix, gen.inits = FALSE,
ematrix=NULL, hmodel=NULL, obstype=NULL, obstrue=NULL,
covariates = NULL, covinits = NULL, constraint = NULL,
misccovariates = NULL, misccovinits = NULL, miscconstraint = NULL,
hcovariates = NULL, hcovinits = NULL, hconstraint = NULL, hranges=NULL,
qconstraint=NULL, econstraint=NULL, initprobs = NULL,
est.initprobs=FALSE, initcovariates = NULL, initcovinits = NULL,
deathexact = NULL, death=NULL, exacttimes = FALSE, censor=NULL,
censor.states=NULL, pci=NULL, phase.states=NULL, phase.inits=NULL,
cl = 0.95, fixedpars = NULL, center=TRUE,
opt.method="optim", hessian=NULL, use.deriv=TRUE,
use.expm=TRUE, analyticp=TRUE, na.action=na.omit, ... )

Arguments
formula 
A formula giving the vectors containing the observed states and the corresponding observation times. For example,
Observed states should be numeric variables
in the set The times can indicate different types of observation scheme, so be
careful to choose the correct For hidden Markov models, 
subject 
Vector of subject identification numbers for the data
specified by 
data 
Optional data frame in which to interpret the variables
supplied in 
qmatrix 
Matrix which indicates the allowed transitions in
the continuoustime Markov chain, and optionally also the initial
values of those transitions. If an instantaneous transition is not allowed from state
r to state s, then If supplying initial values yourself, then the nonzero entries
should be those values. If using For example,
represents a 'health  disease  death' model, with initial transition intensities 0.1 from health to disease, 0.01 from health to death, 0.1 from disease to health, and 0.2 from disease to death. If the states represent ordered levels of severity of a disease,
then this matrix should usually only allow transitions between
adjacent states. For example, if someone was observed in state 1
("mild") at their first observation, followed by state 3 ("severe")
at their second observation, they are assumed to have passed through
state 2 ("moderate") in between, and the 1,3 entry of The initial intensities given here are with any covariates set to
their means in the data (or set to zero, if 
gen.inits 
If 
ematrix 
If misclassification between states is to be modelled, this should
be a matrix of initial values for the misclassification probabilities.
The rows represent underlying states, and the columns represent
observed states.
If an observation of state s is not possible when the subject
occupies underlying state r, then
represents a model in which misclassifications are only permitted between adjacent states. If any probabilities are constrained to be equal using
For an alternative way of specifying misclassification models, see

hmodel 
Specification of the hidden Markov model (HMM). This should be a
list of return values from HMM constructor functions. Each element
of the list corresponds to the outcome model conditionally on the
corresponding underlying state. Univariate constructors are
described in the For example, consider a threestate hidden Markov model. Suppose the observations in underlying state 1 are generated from a Normal distribution with mean 100 and standard deviation 16, while observations in underlying state 2 are Normal with mean 54 and standard deviation 18. Observations in state 3, representing death, are exactly observed, and coded as 999 in the data. This model is specified as
The mean and standard deviation parameters are estimated starting
from these initial values. If multiple parameters are constrained
to be equal using See the A misclassification model, that is, a hidden Markov model where
the outcomes are misclassified observations of the underlying
states, can either be specified using a list of For example,
is equivalent to

obstype 
A vector specifying the observation scheme for each row
of the data. This can be included in the data frame
If This is a generalisation of the

obstrue 
In misclassification models specified with
In HMMs specified with In HMMs where there are also censored states, 
covariates 
A formula or a list of formulae representing the covariates on the transition intensities via a loglinear model. If a single formula is supplied, like
then these covariates are assumed to apply to all intensities. If a named list is supplied, then this defines a potentially different model for each named intensity. For example,
specifies an age effect on the state 1  state 2 transition,
additive age and treatment effects on the state 2  state 3
transition, but no covariates on any other transitions that are
allowed by the If covariates are time dependent, they are assumed to be constant in between the times they are observed, and the transition probability between a pair of times (t1, t2) is assumed to depend on the covariate value at t1. 
covinits 
Initial values for loglinear effects of covariates on the transition intensities. This should be a named list with each element corresponding to a covariate. A single element contains the initial values for that covariate on each transition intensity, reading across the rows in order. For a pair of effects constrained to be equal, the initial value for the first of the two effects is used. For example, for a model with the above For factor covariates, name each level by concatenating the name of
the covariate with the level name, quoting if necessary. For example,
for a covariate
If not specified or wrongly specified, initial values are assumed to be zero. 
constraint 
A list of one numeric vector for each named covariate. The
vector indicates which covariate effects on intensities are
constrained to be equal. Take, for example, a model with five
transition intensities and two covariates. Specifying
constrains the effect of age to be equal for the first three intensities, and equal for the fourth and fifth. The effect of treatment is assumed to be different for each intensity. Any vector of increasing numbers can be used as indicators. The intensity parameters are assumed to be ordered by reading across the rows of the transition matrix, starting at the first row, ignoring the diagonals. Negative elements of the vector can be used to indicate that particular covariate effects are constrained to be equal to minus some other effects. For example:
constrains the second and third age effects to be equal, the first effect to be minus the second, and the fifth age effect to be minus the fourth. For example, it may be realisitic that the effect of a covariate on the "reverse" transition rate from state 2 to state 1 is minus the effect on the "forward" transition rate, state 1 to state 2. Note that it is not possible to specify exactly which of the covariate effects are constrained to be positive and which negative. The maximum likelihood estimation chooses the combination of signs which has the higher likelihood. For categorical covariates, defined using
where Make sure the
sets the first (baseline) level of unordered factors to zero, then the baseline level is ignored in this specification. To assume no covariate effect on a certain transition, use the

misccovariates 
A formula representing the covariates on the
misclassification probabilities, analogously to This must be a single formula  lists are not supported, unlike

misccovinits 
Initial values for the covariates on the
misclassification probabilities, defined in the same way as 
miscconstraint 
A list of one vector for each named covariate on
misclassification probabilities. The vector indicates which
covariate effects on misclassification probabilities are
constrained to be equal, analogously to 
hcovariates 
List of formulae the same length as 
hcovinits 
Initial values for the hidden Markov model covariate effects. A list of the
same length as 
hconstraint 
A named list. Each element is a vector of constraints on the named hidden Markov model parameter. The vector has length equal to the number of times that class of parameter appears in the whole model. For example consider the threestate hidden Markov model described
above, with normallydistributed outcomes for states 1 and 2.
To constrain the outcome variance to be equal for states 1 and 2,
and to also constrain the effect of
Note this excludes initial state occupancy probabilities and covariate effects on those probabilities, which cannot be constrained. 
hranges 
Range constraints for hidden Markov model parameters.
Supplied as a named list, with each element corresponding to the
named hidden Markov model parameter. This element is itself a
list with two elements, vectors named "lower" and "upper". These
vectors each have length equal to the number of times that class
of parameter appears in the whole model, and give the
corresponding mininum amd maximum allowable values for that
parameter. Maximum likelihood estimation is performed with these
parameters constrained in these ranges (through a log or
logittype transformation).
Lower bounds of For example, in the threestate model above, to constrain the mean for state 1 to be between 0 and 6, and the mean of state 2 to be between 7 and 12, supply
These default to the natural ranges, e.g. the positive real line
for variance parameters, and [0,1] for probabilities. Therefore
Initial values should be strictly within any ranges, and not on the range boundary, otherwise optimisation will fail with a "nonfinite value" error. 
qconstraint 
A vector of indicators specifying which baseline transition intensities are equal. For example,
constrains the third and fourth intensities to be equal, in a model
with four allowed instantaneous transitions. When there are
covariates on the intensities and 
econstraint 
A similar vector of indicators specifying which
baseline misclassification probabilities are constrained to be
equal. Only used if the model is specified using 
initprobs 
Only used in hidden Markov models. Underlying state occupancy probabilities at each subject's first observation. Can either be a vector of nstates elements with common probabilities to all subjects, or a nsubjects by nstates matrix of subjectspecific probabilities. This refers to observations after missing data and subjects with only one observation have been excluded. If these are estimated (see 
est.initprobs 
Only used in hidden Markov models.
If Note that the free parameters during this estimation exclude the state 1 occupancy probability, which is fixed at one minus the sum of the other probabilities. 
initcovariates 
Formula representing covariates on the initial
state occupancy probabilities, via multinomial logistic regression. The
linear effects of these covariates, observed at the individual's first
observation time, operate on the log ratio of the state r
occupancy probability to the state 1 occupancy probability, for each
r = 2 to the number of states. Thus the state 1 occupancy
probability should be nonzero. If 
initcovinits 
Initial values for the covariate effects

deathexact 
Vector of indices of absorbing states whose time of entry
is known exactly, but the individual is assumed to be in an unknown
transient state ("alive") at the previous instant. This is the
usual situation for times of death in chronic disease monitoring
data. For example, if you specify See the The Note that you do not always supply a 
death 
Old name for the 
censor 
A state, or vector of states, which indicates
censoring. Censoring means that the observed state is known only to be
one of a particular set of states. For example, Note that in contrast to the usual terminology of survival
analysis, here it is the state which is considered to be censored,
rather than the event time. If at the end of a study, an individual
has not died, but their true state is known, then Note in particular that general timeinhomogeneous Markov models with
piecewise constant transition
intensities can be constructed using the 
censor.states 
Specifies the underlying states which censored observations can
represent. If
means that observations coded 99 represent either state 2 or state 3, while observations coded 999 are really either state 3 or state 4. 
pci 
Model for piecewiseconstant intensities. Vector of cut points defining the times, since the start of the process, at which intensities change for all subjects. For example
specifies that the intensity changes at time points 5 and 10.
This will automatically construct a model with a categorical
(factor) covariate called
Thus if Internally, this works by inserting censored observations in the data at times when the intensity changes but the state is not observed. If the supplied times are outside the range of the time variable in
the data, After fitting a timeinhomogeneous model,
This facility does not support interactions between time and
other covariates. Such models need to be specified "by hand", using
a state variable with censored observations inserted.
Note that the Note that you do not need to use 
phase.states 
Indices of states which have a twophase sojourn distribution. This defines a semiMarkov model, in which the hazard of an onward transition depends on the time spent in the state. This uses the technique described by Titman and Sharples (2009). A hidden Markov model is automatically constructed on an expanded state space, where the phases correspond to the hidden states. The "tau" proportionality constraint described in this paper is currently not supported. Covariates, constraints, Hidden Markov models can additionally be given phased states. The
user supplies an outcome distribution for each original state using
Output functions are presented as it were a hidden Markov model on the expanded state space, for example, transition probabilities between states, covariate effects on transition rates, or prevalence counts, are not aggregated over the hidden phases. Numerical estimation will be unstable when there is weak evidence for a twophase sojourn distribution, that is, if the model is close to Markov. See This is an experimental feature, and some functions are not implemented. Please report any experiences of using this feature to the author! 
phase.inits 
Initial values for phasetype models. A list with one component for each "twophased" state. Each component is itself a list of two elements. The first of these elements is a scalar defining the transition intensity from phase 1 to phase 2. The second element is a matrix, with one row for each potential destination state from the twophased state, and two columns. The first column is the transition rate from phase 1 to the destination state, and the second column is the transition rate from phase 2 to the destination state. If there is only one destination state, then this may be supplied as a vector. In phase type models, the initial values for transition rates out of
nonphased states are taken from the 
exacttimes 
By default, the transitions of the Markov process
are assumed to take place at unknown occasions in between the
observation times. If Note that the complete history of the multistate process is known with this type of data. The models which msm fits have the strong assumption of constant (or piecewiseconstant) transition rates. Knowing the exact transition times allows more realistic models to be fitted with other packages. For example parametric models with sojourn distributions more flexible than the exponential can be fitted with the flexsurv package, or semiparametric models can be implemented with survival in conjunction with mstate. 
cl 
Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95. 
fixedpars 
Vector of indices of parameters whose values will be fixed at their initial values during the optimisation. These are given in the order: transition intensities (reading across rows of the transition matrix), covariates on intensities (ordered by intensities within covariates), hidden Markov model parameters, including misclassification probabilities (ordered by parameters within states), hidden Markov model covariate parameters (ordered by covariates within parameters within states), initial state occupancy probabilities (excluding the first probability, which is fixed at one minus the sum of the others). If there are equality constraints on certain parameters, then
To fix all parameters, specify This can be useful for profiling likelihoods, and building complex models stage by stage. 
center 
If 
opt.method 
If "optim", "nlm" or "bobyqa", then the corresponding R
function will be used for maximum likelihood estimation.
If "fisher", then a specialised Fisher scoring method is used
(Kalbfleisch and Lawless, 1985) which can be faster
than the generic methods, though less robust. This is only
available for Markov models with panel data ( 
hessian 
If If 
use.deriv 
If 
use.expm 
If 
analyticp 
By default, the likelihood for certain simpler 3, 4
and 5 state models is calculated using an analytic expression for
the transition probability (P) matrix. For all other models, matrix
exponentiation is used to obtain P. To revert to the original method
of using the matrix exponential for all models, specify 
na.action 
What to do with missing data: either 
... 
Optional arguments to the generalpurpose R
optimisation routine, It is often worthwhile to normalize the optimisation using
If 'false' convergence is reported and the standard
errors cannot be calculated due to a nonpositivedefinite Hessian,
then consider tightening the tolerance criteria for convergence. If
the optimisation takes a long time, intermediate steps can be
printed using the For the Fisher scoring method, a 
Details
For full details about the methodology behind the msm package, refer to the PDF manual ‘msmmanual.pdf’ in the ‘doc’ subdirectory of the package. This includes a tutorial in the typical use of msm. The paper by Jackson (2011) in Journal of Statistical Software presents the material in this manual in a more concise form.
msm was designed for fitting continuoustime Markov
models, processes where transitions can occur at any time.
These models are defined by intensities, which govern both the
time spent in the current state and the probabilities of the next
state. In discretetime models, transitions are known in
advance to only occur at multiples of some time unit, and the model is
purely governed by the probability distributions of the state at the
next time point, conditionally on the state at the current time. These
can also be fitted in msm, assuming that there is a
continuoustime process underlying the data. Then the fitted
transition probability matrix over one time period, as
returned by pmatrix.msm(...,t=1)
is equivalent to the
matrix that governs the discretetime model. However, these can be
fitted more efficiently using multinomial logistic regression, for
example, using multinom
from the R package nnet (Venables
and Ripley, 2002).
For simple continuoustime multistate Markov models, the likelihood is calculated in terms of the transition intensity matrix Q. When the data consist of observations of the Markov process at arbitrary times, the exact transition times are not known. Then the likelihood is calculated using the transition probability matrix P(t) = exp(tQ), where exp is the matrix exponential. If state i is observed at time t and state j is observed at time u, then the contribution to the likelihood from this pair of observations is the i,j element of P(u  t). See, for example, Kalbfleisch and Lawless (1985), Kay (1986), or Gentleman et al. (1994).
For hidden Markov models, the likelihood for an individual with k observations is calculated directly by summing over the unknown state at each time, producing a product of k matrices. The calculation is a generalisation of the method described by Satten and Longini (1996), and also by Jackson and Sharples (2002), and Jackson et al. (2003).
There must be enough information in the data on each state to estimate each transition rate, otherwise the likelihood will be flat and the maximum will not be found. It may be appropriate to reduce the number of states in the model, the number of allowed transitions, or the number of covariate effects, to ensure convergence. Hidden Markov models, and situations where the value of the process is only known at a series of snapshots, are particularly susceptible to nonidentifiability, especially when combined with a complex transition matrix. Choosing an appropriate set of initial values for the optimisation can also be important. For flat likelihoods, 'informative' initial values will often be required. See the PDF manual for other tips.
Value
To obtain summary information from models fitted by the
msm
function, it is recommended to use extractor
functions such as qmatrix.msm
,
pmatrix.msm
, sojourn.msm
, msm.form.qoutput
. These provide
estimates and confidence intervals for quantities such as transition
probabilities for given covariate values.
For advanced use, it may be necessary to directly use information
stored in the object returned by msm
. This is
documented in the help page msm.object
.
Printing a msm
object by typing the object's name at the
command line implicitly invokes print.msm
. This
formats and prints the important information in the model fit,
and also returns that information in an R object.
This includes estimates and confidence intervals for the transition intensities and (log) hazard
ratios for the corresponding covariates. When there is a hidden Markov
model, the chief information in the
hmodel
component is also formatted and printed. This includes
estimates and confidence intervals for each
parameter.
Author(s)
C. H. Jackson chris.jackson@mrcbsu.cam.ac.uk
References
Jackson, C.H. (2011). MultiState Models for Panel Data: The msm Package for R., Journal of Statistical Software, 38(8), 129. URL http://www.jstatsoft.org/v38/i08/.
Kalbfleisch, J., Lawless, J.F., The analysis of panel data under a Markov assumption Journal of the Americal Statistical Association (1985) 80(392): 863–871.
Kay, R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics (1986) 42: 855–865.
Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multistate Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805–821.
Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the 'true' course of a marker of the progression of human immunodeficiency virus disease (with discussion) Applied Statistics 45(3): 275309 (1996)
Jackson, C.H. and Sharples, L.D. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients Statistics in Medicine, 21(1): 113–128 (2002).
Jackson, C.H., Sharples, L.D., Thompson, S.G. and Duffy, S.W. and Couto, E. Multistate Markov models for disease progression with classification error. The Statistician, 52(2): 193–209 (2003)
Titman, A.C. and Sharples, L.D. SemiMarkov models with phasetype sojourn distributions. Biometrics 66, 742752 (2009).
Venables, W.N. and Ripley, B.D. (2002) Modern Applied Statistics with S, second edition. Springer.
See Also
simmulti.msm
, plot.msm
,
summary.msm
, qmatrix.msm
,
pmatrix.msm
, sojourn.msm
.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  ### Heart transplant data
### For further details and background to this example, see
### Jackson (2011) or the PDF manual in the doc directory.
print(cav[1:10,])
twoway4.q < rbind(c(0.5, 0.25, 0, 0.25), c(0.166, 0.498, 0.166, 0.166),
c(0, 0.25, 0.5, 0.25), c(0, 0, 0, 0))
statetable.msm(state, PTNUM, data=cav)
crudeinits.msm(state ~ years, PTNUM, data=cav, qmatrix=twoway4.q)
cav.msm < msm( state ~ years, subject=PTNUM, data = cav,
qmatrix = twoway4.q, deathexact = 4,
control = list ( trace = 2, REPORT = 1 ) )
cav.msm
qmatrix.msm(cav.msm)
pmatrix.msm(cav.msm, t=10)
sojourn.msm(cav.msm)
