Description Usage Arguments Details Value See Also Examples
Maximum Likelihood Estimation of the transition matrix and initial distribution of a k-th order Markov chain starting from one or several sequences.
1 | fitmm(sequences, states, k = 1, init.estim = "mle")
|
sequences |
A list of vectors representing the sequences. |
states |
Vector of state space (of length s). |
k |
Order of the Markov chain. |
init.estim |
Optional.
|
Let X_1, X_2, ..., X_n be a trajectory of length n of the Markov chain X = (X_m)_{m \in N} of order k = 1 with transition matrix p_{trans}(i,j) = P(X_{m+1} = j | X_m = i). The maximum likelihood estimation of the transition matrix is \widehat{p_{trans}}(i,j) = \frac{N_{ij}}{N_{i.}}, where N_{ij} is the number of transitions from state i to state j and N_{i.} is the number of transition from state i to any state. For k > 1 we have similar expressions.
The initial distribution of a k-th order Markov chain is defined as μ_i = P(X_1 = i). Five methods are proposed for the estimation of the latter :
The Maximum Likelihood Estimator for the initial distribution. The formula is: \widehat{μ_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
The stationary distribution is used as a surrogate of the initial distribution. If the order of the Markov chain is more than one, the transition matrix is converted into a square block matrix in order to estimate the stationary distribution. This method may take time if the order of the Markov chain is high (more than three (3)).
The initial distribution is estimated
by taking the frequencies of the words of length k
for all sequences.
The formula is \widehat{μ_i} = \frac{N_i}{N}, where N_i
is the number of occurences of the word i (of length k) in
the sequences and N is the sum of the lengths of the sequences.
The initial distribution is estimated by using the product of the frequencies of each state (for all the sequences) in the word of length k.
The initial probability of each state is equal to 1 / s, with s, the number of states.
An object of class S3 mmfit
(inheriting from the S3 class mm).
The S3 class mmfit
contains:
All the attributes of the S3 class mm;
An attribute M
which is an integer giving the total length of
the set of sequences sequences
(sum of all the lengths of the list
sequences
);
An attribute loglik
which is a numeric value giving the value
of the log-likelihood of the specified Markov model based on the
sequences
;
An attribute sequences
which is equal to the parameter
sequences
of the function fitmm
(i.e. a list of sequences used to
estimate the Markov model).
mm, simulate.mm
1 2 3 4 5 6 7 8 9 10 11 12 | states <- c("a", "c", "g", "t")
s <- length(states)
k <- 2
init <- rep.int(1 / s ^ k, s ^ k)
p <- matrix(0.25, nrow = s ^ k, ncol = s)
# Specify a Markov model of order 2
markov <- mm(states = states, init = init, ptrans = p, k = k)
seqs <- simulate(object = markov, nsim = c(1000, 10000, 2000), seed = 150)
est <- fitmm(sequences = seqs, states = states, k = 2)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.