fitmm | R Documentation |
Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.
fitmm(sequences, states, k = 1, init.estim = "mle")
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
k |
Order of the Markov chain. |
init.estim |
Optional.
|
Let X_1, X_2, ..., X_n
be a trajectory of length n
of
the Markov chain X = (X_m)_{m \in N}
of order k = 1
with
transition matrix p_{trans}(i,j) = P(X_{m+1} = j | X_m = i)
. The
maximum likelihood estimation of the transition matrix is
\widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}
, where N_{ij}
is the number of transitions from state i
to state j
and
N_{i.}
is the number of transition from state i
to any state.
For k > 1
we have similar expressions.
The initial distribution of a k-th order Markov chain is defined as
\mu_i = P(X_1 = i)
. Five methods are proposed for the estimation
of the latter :
The Maximum Likelihood Estimator
for the initial distribution. The formula is:
\widehat{\mu_i} = \frac{Nstart_i}{L}
, where Nstart_i
is
the number of occurences of the word i
(of length k
) at
the beginning of each sequence and L
is the number of sequences.
This estimator is reliable when the number of sequences L
is high.
The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).
The initial distribution is estimated
by taking the frequencies of the words of length k
for all sequences.
The formula is \widehat{\mu_i} = \frac{N_i}{N}
, where N_i
is the number of occurences of the word i
(of length k
) in
the sequences and N
is the sum of the lengths of the sequences.
The initial distribution
is estimated by using the product of the frequencies of each state
(for all the sequences) in the word of length k
.
The initial probability of each state is
equal to 1 / s
, with s
, the number of states.
An object of class S3 mmfit
(inheriting from the S3 class mm).
The S3 class mmfit
contains:
All the attributes of the S3 class mm;
An attribute M
which is an integer giving the total length of
the set of sequences sequences
(sum of all the lengths of the list
sequences
);
An attribute logLik
which is a numeric value giving the value
of the log-likelihood of the specified Markov model based on the
sequences
;
An attribute sequences
which is equal to the parameter
sequences
of the function fitmm
(i.e. a list of sequences used to
estimate the Markov model).
mm, simulate.mm
states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)
# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)
seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)
est <- fitmm(sequences = seqs, states = states, k = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.