dirmult | R Documentation |
Consider allele frequencies from different subpopulations. The allele counts, X, (or equivalently allele frequencies) are expected to vary between subpopulation. This variability are sometimes refered to as identity-by-decent, but may be modelled as overdispersion due to intra-class correlation theta. The allele counts within each subpopulation is assumed to follow a multinomial distribution conditioned on the allele probabilities, π_1,...,π_{k-1}. When π follows a Dirichlet distribution the marginal distribution of X is Dirichlet-multinomial with parameters π and theta with density:
P(X=x) = (prod_{j=1}^k (1/x_j!) prod_{r=1}^{x_j}(π_j(1-theta) + (r-1)theta))/ ((1/n!) prod_{r=1}^{n}(1-θ + (r-1)theta)).
Using an alternative parameterization the density may be written as:
P(X=x) = (n!*Γ(gamma_+))/Γ(n+gamma_+) prod_{j=1}^k Γ(x_j + gamma_j)/(Γ(gamma_j)*x_j!),
where gamma_+=(1-theta)/theta and gamma_j=π_j*theta.
This formulation second parameterization is used in the iterations
since it converges much faster than the original parameterization.
The function dirmult
estimates the parameters
gamma in the Dirichlet-multinomial distribution and
transform these into
π_1,…,π_{k-1} and
theta.
dirmult(data, init, initscalar, epsilon=10^(-4), trace=TRUE, mode)
data |
A matrix or table with counts. Rows represent subpopulations and columns the different categories of the data. Zero rows or columns are automaticly removed. |
init |
Initial values for the gamma-vector. Default is empty implying the column-proportions are used as initial values. |
initscalar |
Initial value for (1-theta)/theta. Default value is (1-MoM)/MoM where MoM a the method of moment estimate. |
epsilon |
Convergence tolerance. On termination the difference between to succeeding log-likelihoods must be smaller than epsilon. |
trace |
Logical. If TRUE the parameter estimates and log-likelihood value is printed to the screen after each iteration, otherwise no out-put is produces while iterating. |
mode |
Takes values "obs" (default) or "exp" determining whether the observed or expected FIM should be used in the Fisher Scoring. All other arguments produces an error message, but the observed FIM is used in the iterations. |
Returns a list containing:
loglik |
The final log-likelihood value. |
ite |
Number of iterations used. |
gamma |
A vector of gamma estimates. |
pi |
A vector of π estimates. |
theta |
Estimated theta-value. |
dirmult.summary
data(us) fit <- dirmult(us[[1]],epsilon=10^(-4),trace=FALSE) dirmult.summary(us[[1]],fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.