semiMarkov: Parametric estimation in multi-state semi-Markov models

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

This function computes the parametric maximum likelihood estimation in multi-state semi-Markov models in continuous-time. The effect of time varying or fixed covariates can be studied using a proportional intensities model for the hazard of the sojourn time.

Usage

1
2
3
4
semiMarkov(data, cov = NULL, states, mtrans, cov_tra = NULL, 
          cens = NULL, dist_init=NULL, proba_init = NULL, coef_init = NULL, 
          alpha_ci = 0.95, Wald_par = NULL, eqfun = NULL, 
          ineqLB = NULL,ineqUB = NULL, control = list() )

Arguments

data

data frame of the form data.frame(id,state.h,state.j,time), where

  • id: the individual identification number

  • state.h: state left by the process

  • state.j: state entered by the process

  • time: waiting time in state state.h

This data.frame containts one row per transition (possibly several rows per patient).

cov

Optional data frame containing covariates values.

states

A numeric vector giving names of the states (names are values used in state.h).

mtrans

A quadratic matrix of characters describing the possible transitions and the distributions of waiting time. The rows represent the left states, and the columns represent the entered states. If an instantaneous transition is not allowed from state h to state j, then mtrans should have (h,j) entry FALSE, otherwise it should be "E" (or "Exp" or "Exponential") for Exponential distribution, "W" (or "Weibull") for Weibull distribution or "EW" (or "EWeibull" or "Exponentiated Weibull") for Exponentiated Weibull distribution. If TRUE is used instead of the name of the distribution, then a Weibull distribution is considered. By definition of a semi-Markov model, the transitions into the same state are not possible. The diagonal elements of mtrans must be set to FALSE otherwise the function will stop.

cov_tra

Optional list of vectors: a vector is associated with covariate included in the model. For a given covariate, the vector contains the transitions "hj" for which the covariate have an effect (only the transitions specified in mtrans are allowed). The effect of covariates can then be considered only on specific transitions. By default, the effects of covariates on all the possible transitions are studied.

cens

A character giving the code for censored observations in the column state.j of the data. Default is NULL which means that the censoring is defined as a transition fron state h to state h.

dist_init

Optional numeric vector giving the initial values of the distribution parameters. Default is 1 for each distribution parameter. The length of the vector depend on the chosen distribution, number of transitions and states.

proba_init

Optional numeric vector giving the initial values of the transition probabilities. The sum of the probabilities in the same row must be equal to 1. According to semi-Markov model, the probability to stay in the same state must be equal to 0. The default values for the transition probabilities are estimated from the data. If data = NULL, the argument proba_init is obligatory.

coef_init

Optional numeric vector giving the initial values of the regression coefficients associated with the covariates. Default is 0 for each regression coefficient which means that the covariate has no effect.

alpha_ci

Confidence level to be considered for the confidence intervals. The default value is 0.95.

Wald_par

Optional numeric vector giving the values to be tested (null hypothesis) by the Wald test for each parameter. The Wald statistics are evaluated only for the parameters of distributions and regression coefficients. The length of this vector must then be equal to the number of those parameters. The order of the values must be as in the parameters table given by objects semiMarkov or param.init (excluding the parameters associated to the transition probabilities). The default values for the elements of Wald_par vector are 1 for the distribution parameters and 0 for the regression coefficients.

eqfun

Optional list given equality constraints between parameters. These constraints are passed using the equality constraint function that can be defined in the solnp optimization function. See below for details.

ineqLB

Optional list given values of lower bound for parameters. These values are used in the inequality constraint that can be defined in the solnp optimization function. See below for details.

ineqUB

Optional list given values of upper bound for parameters. These values are used in the inequality constraint that can be defined in the solnp optimization function. See below for details.

control

The control list of optimization parameters for solnp optimization function.

Details

This function fits parametric multi-state semi-Markov model described in Listwon and Saint-Pierre (2013) to longitudinal data. Additional details about the methodology behind the SemiMarkov package can be found in Limnios and Oprisan (2001), Foucher et al. (2006) and Perez-Ocon and Ruiz-Castro (1999).

Consider an homogeneous semi-Markov process with a finite state space. In a parametric framework, distributions of the waiting time belong to parametric families. The distribution of the waiting time can be chosen between the exponential, the Weibull and the exponentiated Weibull distributions. The exponential distribution with scale parameter σ>0 has a density defined as follows

f(x) =(1/σ) exp(-x/σ).

The Weibull distribution with scale parameter σ>0 and shape parameter ν>0 has a density given by (same as one defined in dweibull)

g(x) = (ν/σ)(x/σ)^{ν-1} exp(-(x/σ)^ν).

The exponentiated Weibull distribution (or generalized Weibull) with scale parameter σ>0, shape parameter ν>0 and family parameter equal to θ>0 has a density given by (same as one defined in function dgweibull from the R package rmutil)

h(x) = θ(ν/σ)(x/σ)^{ν-1} exp(-(x/σ)^ν) (1-exp(-(x/σ)^ν))^{θ-1}.

These three distributions are nested. The exponentiated Weibull density with θ=1 gives a Weibull distribution and the Weibull density with ν=1 gives the exponential density.

Note that the effects of both constant and time-varying covariates on the hazards of sojourn time can be studied using a proportional intensities model. The effects of covariates can then be interpreted in terms of relative risk.

The model parameters are the distribution parameters, the transition probabilities of the Markov chain and the regression coefficients associated with covariates. The number of parameters depends on the chosen model: the distributions of the sojourn times, the number of states and the transitions between states specified with the matrix mtrans, the number of covariates (cov) and their effects or not on transitions (cov_tra).

The default initial values for the distribution parameters are fixed to 1. As the three possible distributions are nested for parameters equal to 1 (See details of the semiMarkov function), the initial distribution corresponds to an exponential with parameter equal to 1 (whatever the chosen distribution). The default initial values for the regression coefficients are fixed to 0 meaning that the covariates have no effect on the hazard rates of the sojourn times. These initial values may be changed using the arguments dist_init and coef_init.

By default, the initial probabilities are calculated by simple proportions. The probability associated to the transition from h to j is estimed by the number of observed transitions from state h to state j divided by the total number of transitions from state h observed in the data. The results are displayed in matrix matrix.P. The number of parameters for transition probabilities is smaller than the number of possible transitions as the probabilities in the same row sum up to one. Considering this point and that the probability to stay in the same state is zero, the user can changed the initial values using the argument proba_init.

The Yinyu Ye optimization solver to nonlinear problem is applied to maximize the log-likelihood using the function solnp created by A. Ghalanos and S. Theussl. In order to modify the optimization parameters refer to the package Rsolnp documentation.

Some optimization difficulties can arrise when there is not enough information in the data to estimate each transition rate. The users can change the optimization parameters and the initial values. It may be appropriate to reduce the number of states in the model, the number of allowed transitions, or the number of covariate effects, to ensure convergence.

Some additionals constraints can be introduced using eqfun, ineqLB and ineqUB. These constraints on distribution parameters, transition probabilities and regression coefficients can be defined using lists of vectors. The argument eqfun gives the possibility to add constraints of type par1 = a*par2 (a is a constant). This equality constraint must be expressed with a vector of 3 elements where the first element is the identifier of the parameters type ("dist" for distribution parameters, "proba" for the transition probabilities and "coef" for the regression coefficients), the second and the third elements are the index of par1 and par2, respectively. The index values of distribution parameters, transition probabilities and regression coefficients can be found in the table provided by an object semiMarkov. The last element of the vector corresponds to the constant a. The arguments ineqLB and ineqUB allow to add constraints of type par ≥q a and par ≤q a, respectively. These arguments are lists of vectors of length 3 where the first element is the type of the parameter ("dist", "proba" or "coef"), the second element is the index of parameter par and the last one is the constant a. If a chosen constraint corresponds to a transition probability, it should be considered that the last probabilities in a row of the transition matrix are not estimated but obtained directly since the sum of transition probabilities in the same row is equal to 1. Thus, no additional constraints related to these parameters are permitted. Moreover, note that the argument eqfun does not allowed to define relationships between parameters of different types (for instance, a transition probability can not be equal to a regression coefficient). The optional constraints on parameters should be used prudently since they may induce problems in the convergence of the optimization method. In particular, the Wald statistic and the standard deviation may not be computed for some parameters due to negative values in the hessian matrix. Note that the default constraints induce by the model definition are treated in priority.

Value

call

The original call to semiMarkov.

minus2loglik

Minus twice the maximized log-likelihood.

solution

Etimations of the distribution parameters, probabilities of the embedded Markov chain and regression coefficients (if any considered) for each transition specified in the matrix mtrans. This is a data frame with three columns: the label of the parameter, the transition associated with and the estimated value.

opt.message

The message giving the information on the optimization result returned by the constrOptim.nl function.

opt.iter

Number of outer iterations of the optimization method.

nstates

The length of vector states interpreted as the number of possible states for the process.

table.state

A table, with starting states as rows and arrival states as columns, which provides the number of observed transitions between two states. This argument can be used to quickly summarize multi-state data.

Ncens

Number of individuals subjected to censoring.

Transition_matrix

A matrix containing the informations on the model definition : the possible transitions and the distribution of waiting times for each transition (Exponential, Weibull or Exponentiated Weibull).

param.init

Recall the initial values of the parameters. The third column of this object can be used in hazard function.

table.dist

Statistics for the estimations of distribution parameters of waiting time distributions. For the exponential distribution one data frame for the parameter sigma is returned, for the Weibull distribution two data frames for sigma and nu are returned, and for the Exponentiated Weibull distribution three data frames for sigma, nu and theta are returned. The columns of each data frame are the possible transitions, the estimations, the standard deviations, the lower and upper bounds of confidence intervals, the Wald test null hypothesis, the Wald test statistics and the p-values of the Wald test when testing hypothesis sigma=1, nu=1 or theta=1.

table.proba

A data frame giving the estimations of the transition probabilities of the Markov chain and their standard deviations. By definition, the probability associated to the last possible transition of each row of the matrix mtrans is equal to 1-pr, where pr is the sum of all other probabilities from the row.

table.coef

If some covariates are included in the model it returns a data frame with the statistics for the estimated values of the regression coefficients. The columns of the data frame are the transitions associated with the coefficients, the estimations, the standard deviations, the lower and upper bounds of confidence intervals, the Wald test null hypothesis, the Wald test statistics and the p-values of the Wald test when testing hypothesis coef=0.

table.param

Data frame with the statistics for all the model parameters, that is table.dist, table.proba and table.coef in a single data frame.

Note

Printing a semiMarkov object by typing the object's name at the command line implicitly invokes print.semiMarkov.

Author(s)

Agnieszka Listwon-Krol, Philippe Saint-Pierre

References

Krol, A., Saint-Pierre P. (2015). SemiMarkov : An R Package for Parametric Estimation in Multi-State Semi-Markov Models. 66(6), 1-16.

Ghalanos, A. and Theussl S. (2012). Rsolnp: General Non-linear Optimization Using Augmented Lagrange Multiplier Method. R package version 1.14.

Limnios, N., Oprisan, G. (2001). Semi-Markov processes and reliability. Statistics for Industry and Technology. Birkhauser Boston.

Foucher, Y., Mathieu, E., Saint-Pierre, P., Durand, J.F., Daures, J.P. (2006). A semi-Markov model based on Generalized Weibull distribution with an illustration for HIV disease. Biometrical Journal, 47(6), 825-833.

Perez-Ocon, R., Ruiz-Castro, J. E. (1999). Semi-markov models and applications, chapter 14, pages 229-238. Kluwer Academic Publishers.

See Also

param.init, hazard, summary.semiMarkov, print.semiMarkov

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
## Asthma control data
data(asthma)

## Definition of the model:  states, names, possible transtions and waiting time 
## distributions
states_1 <- c("1","2","3")
mtrans_1 <- matrix(FALSE, nrow = 3, ncol = 3)
mtrans_1[1, 2:3] <- c("E","E")
mtrans_1[2, c(1,3)] <- c("E","E")
mtrans_1[3, c(1,2)] <- c("W","E")

## semi-Markov model without covariates
fit1 <- semiMarkov(data = asthma, states = states_1, mtrans = mtrans_1)

## semi-Markov model with one covariate 
## "BMI" affects all transitions
BMI <- as.data.frame(asthma$BMI)
fit2 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1)
## semi-Markov model with one covariate 
## "BMI" affects the transitions "1->3" and "3->1"
fit3 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1,
                   cov_tra = list(c("13","31")))

## semi-Markov model with two covariates 
## "BMI" affects the transitions "1->3" and "3->1"
## "Sex" affects the transition "3->1"
SEX <- as.data.frame(asthma$Sex)
fit4 <- semiMarkov(data = asthma, cov = as.data.frame(cbind(BMI,SEX)),
                   states = states_1, mtrans = mtrans_1,
                   cov_tra = list(c("13","31"),c("31")))
                   
## semi-Markov model using specific initial values      
## same model as "fit1" but using different initial values
## "fit5" and "fit6" are equivalent

init <- param.init(data = asthma, states = states_1, mtrans = mtrans_1,
        dist_init=c(rep(1.5,6),c(1.8)), proba_init=c(0.2,0.8,0.3,0.7,0.35,0.65))
fit5 <- semiMarkov(data = asthma, states = states_1, mtrans = mtrans_1,
                   dist_init=init$dist.init[,3], proba_init=init$proba.init[,3])
fit6 <- semiMarkov(data = asthma, states = states_1, mtrans = mtrans_1,
                   dist_init=c(rep(1.5,6),c(1.8)), 
                   proba_init=c(0.2,0.8,0.3,0.7,0.35,0.65))                   


## The Wald test null hypothesis is modified
## Wald statistics when testing nullity of distribution parameters
## and regression coefficients equal to -1
fit7 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1,
                   Wald_par = c(rep(0,7),rep(-1,6)))
            
## semi-Markov model with additional constraints 
## distribution parameters sigma for transition "1->3" = sigma for transition "2->1" 
fit8 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1,
                   eqfun = list(c("dist",2,3,1)))

## semi-Markov model with additional constraints 
## regression coefficients beta for transition "1->2" = beta for transition "2->1" 
##                                                    = beta for transition "2->3"  
fit9 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1,
                   eqfun = list(c("coef",1,3,1),c("coef",1,4,1)))

## semi-Markov model with additional constraints 
## regression coeficient beta for transition "1->2" belongs to [-0.2,0.2]
## and regression coeficient beta for transition "2->3" belongs to [-0.05,0.05]
fit10 <- semiMarkov(data = asthma, cov = BMI, states = states_1, mtrans = mtrans_1,
                    ineqLB = list(c("coef",1,-0.2),c("coef",4,-0.05)),
                    ineqUB = list(c("coef",1,0.2),c("coef",4,0.05)))

SemiMarkov documentation built on July 2, 2019, 5:03 p.m.