Pearson-type goodness-of-fit test for multi-state models fitted to panel-observed data.
x 
A fitted multi-state model, as returned by msm.
transitions 
This should be an integer vector indicating which interval transitions should be grouped together in the contingency table. Its length should be the number of allowed interval transitions, excluding transitions from absorbing states to absorbing states. The allowed interval transitions are the set of pairs of states (a,b) for which it is possible to observe a at one time and b at any later time. For example, in a "well-disease-death" model with allowed instantaneous 1-2 and 2-3 transitions, there are 5 allowed interval transitions. In numerical order, these are 1-1, 1-2, 1-3, 2-2 and 2-3, excluding absorbing-absorbing transitions. Then, to group transitions 1-1 and 1-2 together, and transitions 2-2 and 2-3 together, specify transitions = c(1,1,2,3,3).
Only transitions from the same state may be grouped. By default, each interval transition forms a separate group. 
timegroups 
Number of groups based on quantiles of the time since the start of the process. 
intervalgroups 
Number of groups based on quantiles of the time interval between observations, within time groups.
covgroups 
Number of groups based on quantiles of Σ_r q_{irr}, where q_{irr} are the diagonal entries of the transition intensity matrix for the ith transition. These are a function of the covariate effects and the covariate values at the ith transition: q_{irr} is minus the sum of the off-diagonal entries q_{rs}^{(0)} exp(β_{rs}^T z_i) on the rth row, that is, q_{irr} = -Σ_{s≠r} q_{rs}^{(0)} exp(β_{rs}^T z_i). Thus … For time-inhomogeneous models specified using the …
groups 
A vector of arbitrary groups in which to categorise each transition. This can be an integer vector or a factor. This can be used to diagnose specific areas of poor fit. For example, the contingency table might be grouped by arbitrary combinations of covariates to detect types of individual for whom the model fits poorly. The length of 
boot 
Estimate an "exact" p-value using a parametric bootstrap. All objects used in the original call to msm … Note that …
B 
Number of bootstrap replicates. 
next.obstime 
This is a vector of length For individuals who died (entered an absorbing state) before the
next scheduled observation, and the time of death is known exactly,
If the individual did not die, and a scheduled observation did
follow that time point,
If 
N 
Number of imputations for the estimation of the distribution of the next scheduled observation time, when there are exact death times. 
indep.cens 
If 
maxtimes 
A vector of length 
pval 
Calculate a p-value using the improved approximation of
Titman (2009). This is
optional since it is not needed during bootstrapping, and it is
computationally non-trivial. Currently only available for non-hidden Markov models for
panel data without exact death times. Also not available for
models with censoring, including time-homogeneous models fitted with
the
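For the transitions argument, the allowed interval transitions (and hence the required length of the vector) follow from which entries of the intensity matrix are positive. The base-R sketch below assumes a qmatrix laid out as in the example further down (positive off-diagonal entries mark allowed instantaneous transitions). It enumerates the interval transitions for the well-disease-death model and checks that a grouping only combines transitions from the same state. interval_transitions() is a hypothetical helper, not part of msm.

```r
# Hypothetical helper (not part of msm): list allowed interval transitions
# (a, b), i.e. state b can be observed at some later time after state a.
interval_transitions <- function(qmatrix) {
  n <- nrow(qmatrix)
  offdiag <- qmatrix > 0 & !diag(n)          # allowed instantaneous moves
  reach <- offdiag | diag(n) > 0             # remaining in a is observable
  # Transitive closure: repeated squaring adds multi-step paths
  for (k in seq_len(n)) reach <- reach | (reach %*% reach > 0)
  absorbing <- rowSums(offdiag) == 0         # no way out of the state
  idx <- which(reach, arr.ind = TRUE)
  colnames(idx) <- c("from", "to")
  # Drop absorbing-to-absorbing pairs
  idx <- idx[!(absorbing[idx[, "from"]] & absorbing[idx[, "to"]]), , drop = FALSE]
  idx[order(idx[, "from"], idx[, "to"]), , drop = FALSE]
}

# Well-disease-death model with instantaneous 1-2 and 2-3 transitions
wdd.q <- rbind(c(0, 0.1, 0), c(0, 0, 0.1), c(0, 0, 0))
tr <- interval_transitions(wdd.q)
paste(tr[, "from"], tr[, "to"], sep = "-")  # "1-1" "1-2" "1-3" "2-2" "2-3"

# Group 1-1 with 1-2, and 2-2 with 2-3; each group must share one origin state
transitions <- c(1, 1, 2, 3, 3)
stopifnot(length(transitions) == nrow(tr),
          all(tapply(tr[, "from"], transitions,
                     function(f) length(unique(f)) == 1)))
```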
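The timegroups and intervalgroups arguments describe a nested grouping: intervals are first split by quantiles of the time since the start of the process, and then, within each time group, by quantiles of the interval length. A minimal base-R sketch, assuming each interval is indexed by its start time and that cut points are equally-spaced quantiles with tied cut points collapsed; quantile_groups() and nested_groups() are hypothetical helpers, and msm's own handling of ties and boundaries may differ.

```r
# Hypothetical helper: integer group labels from equally-spaced quantiles
quantile_groups <- function(x, ngroups) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = ngroups + 1)))
  if (length(breaks) < 2) return(rep(1L, length(x)))  # all values tied
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Time groups first, then interval-length groups within each time group
nested_groups <- function(start_time, interval, timegroups, intervalgroups) {
  tg <- quantile_groups(start_time, timegroups)
  ig <- integer(length(interval))
  for (g in unique(tg)) {
    in_g <- tg == g
    ig[in_g] <- quantile_groups(interval[in_g], intervalgroups)
  }
  data.frame(timegroup = tg, intervalgroup = ig)
}

# Eight observed intervals: their start times and lengths
nested_groups(start_time = 1:8, interval = c(4, 3, 2, 1, 5, 6, 7, 8),
              timegroups = 2, intervalgroups = 2)
```

Because interval groups are formed within each time group, the same interval length can fall in different interval groups depending on when it was observed.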
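As a worked illustration of the quantity behind covgroups, the sketch below builds the intensity matrix at covariate values z_i from baseline intensities q_{rs}^{(0)} and covariate effects β_{rs}, fills the diagonal so each row sums to zero, and sums the diagonal. It assumes one row of beta per allowed transition, ordered as R's which() lists the positive off-diagonal entries of q0 (column-major). trace_q() is a hypothetical helper, not msm's internal code.

```r
# Hypothetical sketch: sum_r q_{irr} for each observed transition i.
# q0:   baseline intensities (positive off-diagonal entries = allowed moves)
# beta: one row of covariate effects per allowed move, in which() order
# Z:    one row of covariate values z_i per observed transition
trace_q <- function(q0, beta, Z) {
  allowed <- which(q0 > 0 & !diag(nrow(q0)), arr.ind = TRUE)
  apply(Z, 1, function(z) {
    q <- matrix(0, nrow(q0), ncol(q0))
    q[allowed] <- q0[allowed] * exp(drop(beta %*% z))  # q_rs^(0) exp(beta_rs^T z)
    diag(q) <- -rowSums(q)                             # q_rr = -sum_{s != r} q_rs
    sum(diag(q))
  })
}

q0   <- rbind(c(0, 0.1, 0), c(0, 0, 0.1), c(0, 0, 0))
beta <- matrix(c(log(2), 0), ncol = 1)  # covariate doubles the 1-2 intensity only
Z    <- matrix(c(0, 1, 0, 1), ncol = 1)
tq   <- trace_q(q0, beta, Z)            # -0.2 when z = 0, -0.3 when z = 1

# Two covariate groups from equally-spaced quantiles of the trace
cut(tq, unique(quantile(tq, c(0, 0.5, 1))), include.lowest = TRUE, labels = FALSE)
```

A more negative trace means faster overall movement out of the occupied states, so the groups separate individuals whose covariates imply quick progression from those who progress slowly.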
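For boot and B, a parametric-bootstrap p-value is, in general terms, the proportion of statistics simulated under the fitted model that are at least as large as the observed statistic. The generic sketch below assumes a user-supplied function that simulates one dataset from the fitted model, refits it and returns the test statistic. simulate_stat() is a hypothetical placeholder, and the exact formula msm applies to its B replicates is not stated on this page.

```r
# Hypothetical sketch of a parametric-bootstrap p-value.
# simulate_stat: function() returning one test statistic computed on data
# simulated from the fitted model (placeholder, not an msm function).
bootstrap_pvalue <- function(observed, simulate_stat, B = 500) {
  boot_stats <- replicate(B, simulate_stat())
  list(p = mean(boot_stats >= observed),  # share of replicates as extreme
       stats = boot_stats)                # replicates kept for inspection
}

# Toy check with a chi-squared(4) stand-in for the simulated statistic
set.seed(1)
res <- bootstrap_pvalue(observed = 9.49,
                        simulate_stat = function() rchisq(1, df = 4), B = 2000)
res$p  # should be close to 0.05, since qchisq(0.95, 4) is about 9.49
```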
This method (Aguirre-Hernandez and Farewell, 2002) is intended for
data which represent observations of the process at arbitrary times
("snapshots", or "panel-observed" data). For data which represent the
exact transition times of the process, prevalence.msm
can be used to assess fit, though without a formal test.
When times of death are known exactly, states are misclassified, or an individual's final observation is a censored state, the modification by Titman and Sharples (2008) is used. The only form of censoring supported is a state at the end of an individual's series which represents an unknown transient state (i.e. the individual is only known to be alive at this time). Other types of censoring are omitted from the data before performing the test.
See the references for further details of the methods. The method used for censored states is a modification of the method in the appendix to Titman and Sharples (2008), described at http://www.mrcbsu.cam.ac.uk/wpcontent/uploads/robustcensoring.pdf (Titman, 2007).
Groupings of the time since initiation, the time interval and the impact of covariates are based on equally-spaced quantiles. The number of groups should be chosen so that there are not many cells with small expected numbers of transitions, since the deviance statistic will be unstable for sparse contingency tables. Ideally, the expected number of transitions in each cell of the table should be no less than about 5. Conversely, the power of the test is reduced if there are too few groups. Therefore, some sensitivity analysis of the test results to the grouping is advisable.
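One way to carry out this sensitivity check is to measure how many cells of the expected-count table fall below the threshold of about 5 for several candidate groupings. A minimal sketch; sparse_fraction() is a hypothetical helper, and the commented loop assumes, as described under the return value, that the second list element is the table of expected transitions E.

```r
# Hypothetical helper: fraction of cells with expected count below threshold
sparse_fraction <- function(E, threshold = 5) {
  mean(E < threshold, na.rm = TRUE)
}

# Comparing groupings (requires msm and the fitted model psor.msm):
# for (g in 2:4) {
#   res <- pearson.msm(psor.msm, timegroups = g, intervalgroups = g, covgroups = g)
#   cat(g, "groups:", sparse_fraction(res[[2]]), "of cells have E < 5\n")
# }

E <- matrix(c(1.2, 10.5, 4.9, 20.0), nrow = 2)
sparse_fraction(E)  # 0.5
```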
Saved model objects fitted with previous versions of R (versions less
than 1.2) will need to be refitted under the current R for use with
pearson.msm.
A list whose first two elements are contingency tables of observed
transitions O and expected transitions E, respectively,
for each combination of groups. The third element is a table of the
deviances (O - E)^2 / E multiplied by the sign of O - E.
If the expected number of transitions is zero then the deviance is zero.
Entries in the third matrix will be bigger in magnitude for groups for
which the model fits poorly.

The fourth element of the list is a data frame with one row containing
the Pearson-type goodness-of-fit test statistic … For these models, for comparison with older versions of the package, …


(not printed by default) contains the definition of the grouping of the intervals between observations. These groups are defined by quantiles within the groups corresponding to the time since the start of the process. 

If there are exact death times, this contains simulations of the contingency tables and test statistics for each imputation of the next scheduled sampling time. These are averaged over to produce the presented tables and test statistic. This element is not printed by default. With exact death times, the null variance of the test statistic (formed
by taking the mean of simulated test statistics) is less than twice the
mean (Titman, 2008), and the null distribution is not chi-squared.
In this case, 

If the bootstrap has been used, the element will contain the bootstrap replicates of the test statistics (not printed by default). 

If the Titman (2009) p-value has been calculated, this contains the weights defining the null distribution of the test statistic as a weighted sum of chi-squared(1) random variables (not printed by default).
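The table of signed deviances in the third element can be reproduced from O and E in a few lines. A base-R sketch following the definitions given for the return value; the summed statistic shown is the classical Pearson form Σ (O - E)^2 / E and is an assumption here, since it omits the adjustments described for exact death times and censoring.

```r
# Signed deviances (O - E)^2 / E * sign(O - E), set to zero where E == 0
signed_deviance <- function(O, E) {
  d <- ifelse(E > 0, (O - E)^2 / E, 0)  # result keeps the dimensions of E
  sign(O - E) * d
}

# Classical Pearson statistic: sum of the unsigned deviances (assumed form)
pearson_stat <- function(O, E) sum(abs(signed_deviance(O, E)))

O <- matrix(c(10, 6, 0, 3), nrow = 2)
E <- matrix(c(8, 8, 0, 3), nrow = 2)
signed_deviance(O, E)  # 0.5, -0.5, 0, 0 (column-major)
pearson_stat(O, E)     # 1
```

Positive entries flag cells with more transitions than the model predicts, negative entries fewer, which is how the example below identifies excess 1-2, 1-3 and 1-4 transitions in short intervals.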
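Given weights w_1, …, w_k, the null distribution Σ_j w_j X_j, with X_j independent chi-squared(1), can be approximated by simulation. This Monte Carlo approach is swapped in purely for illustration and is not the computation described in Titman (2009); weighted_chisq_pvalue() is a hypothetical helper. The sum has mean Σ w_j and variance 2 Σ w_j^2, so with all weights equal to 1 it reduces to chi-squared(k), whose variance is exactly twice its mean.

```r
# Monte Carlo p-value for a statistic whose null distribution is
# sum_j w_j * X_j with X_j independent chi-squared(1).
weighted_chisq_pvalue <- function(stat, weights, nsim = 1e5) {
  k <- length(weights)
  X <- matrix(rchisq(k * nsim, df = 1), nrow = k)  # one column per draw
  sims <- colSums(weights * X)                      # row j scaled by weights[j]
  mean(sims >= stat)
}

set.seed(42)
# Unit weights give chi-squared(3), so its upper 5% point should give p near 0.05
weighted_chisq_pvalue(qchisq(0.95, df = 3), weights = rep(1, 3))
```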
Andrew Titman a.titman@lancaster.ac.uk, Chris Jackson chris.jackson@mrcbsu.cam.ac.uk
Aguirre-Hernandez, R. and Farewell, V. (2002) A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models. Statistics in Medicine 21:1899-1911.
Titman, A. and Sharples, L. (2008) A general goodness-of-fit test for Markov and hidden Markov models. Statistics in Medicine 27(12):2177-2195.
Titman, A. (2009) Computation of the asymptotic null distribution of goodness-of-fit tests for multi-state models. Lifetime Data Analysis 15(4):519-533.
Titman, A. (2008) Model diagnostics in multi-state models of biological systems. PhD thesis, University of Cambridge.
msm, prevalence.msm, scoreresid.msm
library(msm)  # provides msm(), pearson.msm() and the psor data
# Progressive four-state model: 1 -> 2 -> 3 -> 4
psor.q <- rbind(c(0,0.1,0,0),c(0,0,0.1,0),c(0,0,0,0.1),c(0,0,0,0))
# hieffusn effects equal on all three transitions; ollwsdrt equal on the first two
psor.msm <- msm(state ~ months, subject=ptnum, data=psor,
                qmatrix = psor.q, covariates = ~ollwsdrt+hieffusn,
                constraint = list(hieffusn=c(1,1,1),ollwsdrt=c(1,1,2)))
pearson.msm(psor.msm, timegroups=2, intervalgroups=2, covgroups=2)
# More 1-2, 1-3 and 1-4 observations than expected in shorter time
# intervals: the model fits poorly.
# A random effects model might accommodate such fast progressors.
