fitsmm | R Documentation |
Maximum Likelihood Estimation of a semi-Markov chain starting from one or several sequences. This estimation can be parametric or non-parametric, non-censored, censored at the beginning and/or at the end of the sequence, with one or several trajectories. Several parametric distributions are considered (Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial).
fitsmm(
sequences,
states,
type.sojourn = c("fij", "fi", "fj", "f"),
distr = "nonparametric",
init.estim = "mle",
cens.beg = FALSE,
cens.end = FALSE
)
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
type.sojourn |
Type of sojourn time (for more explanations, see Details). |
distr |
By default If the parametric estimation case is desired,
The distributions to be used in |
init.estim |
Optional.
|
cens.beg |
Optional. A logical value indicating whether or not sequences are censored at the beginning. |
cens.end |
Optional. A logical value indicating whether or not sequences are censored at the end. |
This function estimates a semi-Markov model in parametric or non-parametric case, taking into account the type of sojourn time and the censoring described in references. The non-parametric estimation concerns sojourn time distributions defined by the user. For the parametric estimation, several discrete distributions are considered (see below).
The difference between the Markov model and the semi-Markov model concerns the modeling of the sojourn time. With a Markov chain, the sojourn time distribution is modeled by a Geometric distribution (in discrete time). With a semi-Markov chain, the sojourn time can be any arbitrary distribution. In this package, the available distribution for a semi-Markov model are :
Uniform: f(x) = \frac{1}{n}
for 1 \le x \le n
. n
is the parameter;
Geometric: f(x) = \theta (1-\theta)^{x - 1}
for x = 1, 2,\ldots,n
, 0 < \theta < 1
, \theta
is the probability of success.
\theta
is the parameter;
Poisson: f(x) = \frac{\lambda^x exp(-\lambda)}{x!}
for x = 0, 1, 2,\ldots,n
, with \lambda > 0
.
\lambda
is the parameter;
Discrete Weibull of type 1: f(x)=q^{(x-1)^{\beta}}-q^{x^{\beta}}
, x = 1, 2,\ldots,n
,
with 0 < q < 1
, the first parameter and \beta > 0
the second parameter.
(q, \beta)
are the parameters;
Negative binomial: f(x)=\frac{\Gamma(x+\alpha)}{\Gamma(\alpha) x!} p^{\alpha} (1 - p)^x
,
for x = 0, 1, 2,\ldots,n
, \Gamma
is the Gamma function,
\alpha
is the parameter of overdispersion and p
is the
probability of success, 0 < p < 1
. (\alpha, p)
are the parameters;
Non-parametric.
We define :
the semi-Markov kernel q_{ij}(k) = P( J_{m+1} = j, T_{m+1} - T_{m} = k | J_{m} = i )
;
the transition matrix (p_{trans}(i,j))_{i,j} \in states
of the
embedded Markov chain J = (J_m)_m
, p_{trans}(i,j) = P( J_{m+1} = j | J_m = i )
;
the initial distribution \mu_i = P(J_1 = i) = P(Z_1 = i)
, i \in 1, 2, \dots, s
;
the conditional sojourn time distributions (f_{ij}(k))_{i,j} \in states,\ k \in N ,\ f_{ij}(k) = P(T_{m+1} - T_m = k | J_m = i, J_{m+1} = j )
,
f
is specified by the argument param
in the parametric case
and by distr
in the non-parametric case.
The maximum likelihood estimation of the transition matrix of the embedded
Markov chain is \widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}
.
Five methods are proposed for the estimation of the initial distribution :
The Maximum Likelihood Estimator
for the initial distribution. The formula is:
\widehat{\mu_i} = \frac{Nstart_i}{L}
, where Nstart_i
is
the number of occurences of the word i
(of length k
) at
the beginning of each sequence and L
is the number of sequences.
This estimator is reliable when the number of sequences L
is high.
The limit (stationary) distribution of the semi-Markov chain is used as a surrogate of the initial distribution.
The initial distribution is replaced by taking the frequencies of each state in the sequences.
The initial probability of each state is
equal to 1 / s
, with s
, the number of states.
Note that q_{ij}(k) = p_{trans}(i,j) \ f_{ij}(k)
in the general case
(depending on the present state and on the next state). For particular cases,
we replace f_{ij}(k)
by f_{i.}(k)
(depending on the present
state i
), f_{.j}(k)
(depending on the next state j
) and
f_{..}(k)
(depending neither on the present state nor on the next
state).
In this package we can choose different types of sojourn time. Four options are available for the sojourn times:
depending on the present state and on the next state (fij
);
depending only on the present state (fi
);
depending only on the next state (fj
);
depending neither on the current, nor on the next state (f
).
If type.sojourn = "fij"
, distr
is a matrix of dimension (s, s)
(e.g., if the 1st element of the 2nd column is "pois"
, that is to say we
go from the first state to the second state following a Poisson distribution).
If type.sojourn = "fi"
or "fj"
, distr
must be a vector (e.g., if the
first element of vector is "geom"
, that is to say we go from (or to) the
first state to (or from) any state according to a Geometric distribution).
If type.sojourn = "f"
, distr
must be one of "unif"
, "geom"
, "pois"
,
"dweibull"
, "nbinom"
(e.g., if distr
is equal to "nbinom"
, that is
to say that the sojourn time when going from one state to another state
follows a Negative Binomial distribution).
For the non-parametric case, distr
is equal to "nonparametric"
whatever
type of sojourn time given.
If the sequence is censored at the beginning and/or at the end, cens.beg
must be equal to TRUE
and/or cens.end
must be equal to TRUE
.
All the sequences must be censored in the same way.
Returns an object of S3 class smmfit
(inheriting from the S3
class smm
and smmnonparametric class if distr = "nonparametric"
or smmparametric otherwise). The S3 class smmfit
contains:
All the attributes of the S3 class smmnonparametric or smmparametric depending on the type of estimation;
An attribute M
which is an integer giving the total length of
the set of sequences sequences
(sum of all the lengths of the list
sequences
);
An attribute logLik
which is a numeric value giving the value
of the log-likelihood of the specified semi-Markov model based on the
sequences
;
An attribute sequences
which is equal to the parameter
sequences
of the function fitsmm
(i.e. a list of sequences used to
estimate the Markov model).
V. S. Barbu, N. Limnios. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
smmnonparametric, smmparametric, simulate.smm, simulate.smmfit, plot.smm, plot.smmfit
states <- c("a", "c", "g", "t")
s <- length(states)
# Creation of the initial distribution
vect.init <- c(1 / 4, 1 / 4, 1 / 4, 1 / 4)
# Creation of the transition matrix
pij <- matrix(c(0, 0.2, 0.5, 0.3,
0.2, 0, 0.3, 0.5,
0.3, 0.5, 0, 0.2,
0.4, 0.2, 0.4, 0),
ncol = s, byrow = TRUE)
# Creation of the distribution matrix
distr.matrix <- matrix(c(NA, "pois", "geom", "nbinom",
"geom", NA, "pois", "dweibull",
"pois", "pois", NA, "geom",
"pois", "geom", "geom", NA),
nrow = s, ncol = s, byrow = TRUE)
# Creation of an array containing the parameters
param1.matrix <- matrix(c(NA, 2, 0.4, 4,
0.7, NA, 5, 0.6,
2, 3, NA, 0.6,
4, 0.3, 0.4, NA),
nrow = s, ncol = s, byrow = TRUE)
param2.matrix <- matrix(c(NA, NA, NA, 0.6,
NA, NA, NA, 0.8,
NA, NA, NA, NA,
NA, NA, NA, NA),
nrow = s, ncol = s, byrow = TRUE)
param.array <- array(c(param1.matrix, param2.matrix), c(s, s, 2))
# Specify the semi-Markov model
semimarkov <- smmparametric(states = states, init = vect.init, ptrans = pij,
type.sojourn = "fij", distr = distr.matrix,
param = param.array)
seqs <- simulate(object = semimarkov, nsim = c(1000, 10000, 2000), seed = 100)
# Estimation of simulated sequences
est <- fitsmm(sequences = seqs, states = states, type.sojourn = "fij",
distr = distr.matrix)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.