Description Usage Arguments Details Value References See Also Examples
Maximum Likelihood Estimation of a semi-Markov chain starting from one or several sequences. This estimation can be parametric or non-parametric, non-censored, censored at the beginning and/or at the end of the sequence, with one or several trajectories. Several parametric distributions are considered (Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial).
1 2 3 4 5 6 7 8 9 |
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
type.sojourn |
Type of sojourn time (for more explanations, see Details). |
distr |
By default If the parametric estimation case is desired,
The distributions to be used in |
init.estim |
Optional.
|
cens.beg |
Optional. A logical value indicating whether or not sequences are censored at the beginning. |
cens.end |
Optional. A logical value indicating whether or not sequences are censored at the end. |
This function estimates a semi-Markov model in parametric or non-parametric case, taking into account the type of sojourn time and the censoring described in references. The non-parametric estimation concerns sojourn time distributions defined by the user. For the parametric estimation, several discrete distributions are considered (see below).
The difference between the Markov model and the semi-Markov model concerns the modeling of the sojourn time. With a Markov chain, the sojourn time distribution is modeled by a Geometric distribution (in discrete time). With a semi-Markov chain, the sojourn time can be any arbitrary distribution. In this package, the available distribution for a semi-Markov model are :
Uniform: f(x) = \frac{1}{n} for 1 ≤ x ≤ n. n is the parameter;
Geometric: f(x) = θ (1-θ)^{x - 1} for x = 1, 2,…,n, 0 < θ < 1, θ is the probability of success. θ is the parameter;
Poisson: f(x) = \frac{λ^x exp(-λ)}{x!} for x = 0, 1, 2,…,n, with λ > 0. λ is the parameter;
Discrete Weibull of type 1: f(x)=q^{(x-1)^{β}}-q^{x^{β}}, x = 1, 2,…,n, with 0 < q < 1, the first parameter and β > 0 the second parameter. (q, β) are the parameters;
Negative binomial: f(x)=\frac{Γ(x+α)}{Γ(α) x!} p^{α} (1 - p)^x, for x = 0, 1, 2,…,n, Γ is the Gamma function, α is the parameter of overdispersion and p is the probability of success, 0 < p < 1. (α, p) are the parameters;
Non-parametric.
We define :
the semi-Markov kernel q_{ij}(k) = P( J_{m+1} = j, T_{m+1} - T_{m} = k | J_{m} = i );
the transition matrix (p_{trans}(i,j))_{i,j} \in states of the embedded Markov chain J = (J_m)_m, p_{trans}(i,j) = P( J_{m+1} = j | J_m = i );
the initial distribution μ_i = P(J_1 = i) = P(Z_1 = i), i \in 1, 2, …, s;
the conditional sojourn time distributions (f_{ij}(k))_{i,j} \in states,\ k \in N ,\ f_{ij}(k) = P(T_{m+1} - T_m = k | J_m = i, J_{m+1} = j ),
f is specified by the argument param
in the parametric case
and by distr
in the non-parametric case.
The maximum likelihood estimation of the transition matrix of the embedded Markov chain is \widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}.
Five methods are proposed for the estimation of the initial distribution :
The Maximum Likelihood Estimator for the initial distribution. The formula is: \widehat{μ_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
The limit (stationary) distribution of the semi-Markov chain is used as a surrogate of the initial distribution.
The initial distribution is replaced by taking the frequencies of each state in the sequences.
The initial probability of each state is equal to 1 / s, with s, the number of states.
Note that q_{ij}(k) = p_{trans}(i,j) \ f_{ij}(k) in the general case (depending on the present state and on the next state). For particular cases, we replace f_{ij}(k) by f_{i.}(k) (depending on the present state i), f_{.j}(k) (depending on the next state j) and f_{..}(k) (depending neither on the present state nor on the next state).
In this package we can choose different types of sojourn time. Four options are available for the sojourn times:
depending on the present state and on the next state (fij
);
depending only on the present state (fi
);
depending only on the next state (fj
);
depending neither on the current, nor on the next state (f
).
If type.sojourn = "fij"
, distr
is a matrix of dimension (s, s)
(e.g., if the 1st element of the 2nd column is "pois"
, that is to say we
go from the first state to the second state following a Poisson distribution).
If type.sojourn = "fi"
or "fj"
, distr
must be a vector (e.g., if the
first element of vector is "geom"
, that is to say we go from (or to) the
first state to (or from) any state according to a Geometric distribution).
If type.sojourn = "f"
, distr
must be one of "unif"
, "geom"
, "pois"
,
"dweibull"
, "nbinom"
(e.g., if distr
is equal to "nbinom"
, that is
to say that the sojourn time when going from one state to another state
follows a Negative Binomial distribution).
For the non-parametric case, distr
is equal to "nonparametric"
whatever
type of sojourn time given.
If the sequence is censored at the beginning and/or at the end, cens.beg
must be equal to TRUE
and/or cens.end
must be equal to TRUE
.
All the sequences must be censored in the same way.
Returns an object of S3 class smmfit
(inheriting from the S3
class smm
and smmnonparametric class if distr = "nonparametric"
or smmparametric otherwise). The S3 class smmfit
contains:
All the attributes of the S3 class smmnonparametric or smmparametric depending on the type of estimation;
An attribute M
which is an integer giving the total length of
the set of sequences sequences
(sum of all the lengths of the list
sequences
);
An attribute loglik
which is a numeric value giving the value
of the log-likelihood of the specified semi-Markov model based on the
sequences
;
An attribute sequences
which is equal to the parameter
sequences
of the function fitsmm
(i.e. a list of sequences used to
estimate the Markov model).
V. S. Barbu, N. Limnios. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
smmnonparametric, smmparametric, simulate.smm, simulate.smmfit, plot.smm, plot.smmfit
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | states <- c("a", "c", "g", "t")
s <- length(states)
# Creation of the initial distribution
vect.init <- c(1 / 4, 1 / 4, 1 / 4, 1 / 4)
# Creation of the transition matrix
pij <- matrix(c(0, 0.2, 0.5, 0.3,
0.2, 0, 0.3, 0.5,
0.3, 0.5, 0, 0.2,
0.4, 0.2, 0.4, 0),
ncol = s, byrow = TRUE)
# Creation of the distribution matrix
distr.matrix <- matrix(c(NA, "pois", "geom", "nbinom",
"geom", NA, "pois", "dweibull",
"pois", "pois", NA, "geom",
"pois", "geom", "geom", NA),
nrow = s, ncol = s, byrow = TRUE)
# Creation of an array containing the parameters
param1.matrix <- matrix(c(NA, 2, 0.4, 4,
0.7, NA, 5, 0.6,
2, 3, NA, 0.6,
4, 0.3, 0.4, NA),
nrow = s, ncol = s, byrow = TRUE)
param2.matrix <- matrix(c(NA, NA, NA, 0.6,
NA, NA, NA, 0.8,
NA, NA, NA, NA,
NA, NA, NA, NA),
nrow = s, ncol = s, byrow = TRUE)
param.array <- array(c(param1.matrix, param2.matrix), c(s, s, 2))
# Specify the semi-Markov model
semimarkov <- smmparametric(states = states, init = vect.init, ptrans = pij,
type.sojourn = "fij", distr = distr.matrix,
param = param.array)
seqs <- simulate(object = semimarkov, nsim = c(1000, 10000, 2000), seed = 100)
# Estimation of simulated sequences
est <- fitsmm(sequences = seqs, states = states, type.sojourn = "fij",
distr = distr.matrix)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.