dist | R Documentation |

An object that specifies the distribution to be fitted by the `MGLMfit` function, or the regression model to be fitted by the `MGLMreg` or `MGLMsparsereg` functions. Can be chosen from `"MN"`, `"DM"`, `"NegMN"`, or `"GDM"`.

A multinomial distribution models the counts of *d* possible outcomes.
The counts of categories are negatively correlated.
The density of a *d* category count vector *y* with parameter
*p=(p_1, …, p_d)* is

*
P(y|p) = C_{y_1, …, y_d}^{m} prod_{j=1}^{d} p_j^{y_j},
*

where *m = sum_{j=1}^d y_j*, *0 < p_j < 1*, and *sum_{j=1}^d p_j = 1*.
Here, *C_k^n*, often read as "*n* choose *k*", refers to the number of *k* combinations from a set of *n* elements, and *C_{y_1, …, y_d}^{m} = m! / (y_1! ⋯ y_d!)* is the multinomial coefficient.
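The density above can be checked numerically in base R. The following is a minimal sketch; `mn_logpmf` is a hypothetical helper name used only for illustration and is not part of MGLM. It evaluates the formula on the log scale and compares it with base R's `dmultinom`.

```r
# Multinomial log-density written directly from the formula above.
# `mn_logpmf` is a hypothetical helper, not an MGLM function.
mn_logpmf <- function(y, p) {
  m <- sum(y)
  # log multinomial coefficient: log(m!) - sum_j log(y_j!)
  log_coef <- lgamma(m + 1) - sum(lgamma(y + 1))
  log_coef + sum(y * log(p))
}

y <- c(2, 1, 3)            # counts for d = 3 categories, m = 6
p <- c(0.2, 0.3, 0.5)      # category probabilities, summing to 1

manual  <- exp(mn_logpmf(y, p))
builtin <- dmultinom(y, size = sum(y), prob = p)
agree   <- isTRUE(all.equal(manual, builtin))
```

Working on the log scale with `lgamma` avoids overflow in the factorials when *m* is large.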

The `MGLMreg` function with `dist="MN"` calculates the MLE of regression coefficients *β_j* of the multinomial logit model, which has link function *p_j = exp(Xβ_j) / (1 + sum_{j'=1}^{d-1} exp(Xβ_j'))*, *j=1,…,d-1*. The `MGLMsparsereg` function with `dist="MN"` fits a regularized multinomial logit model.
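As an illustration of the multinomial logit link, the sketch below maps a design matrix and a coefficient matrix to category probabilities in base R. The helper `mn_logit_probs` and the toy values of `X` and `B` are assumptions made for this example. Category *d* is treated as the reference, with probability *1 / (1 + sum exp(Xβ_j))*, which follows from the probabilities summing to one.

```r
# X: n-by-q design matrix; B: q-by-(d-1) matrix whose column j is beta_j.
# `mn_logit_probs` is a hypothetical helper, not an MGLM function.
mn_logit_probs <- function(X, B) {
  eta   <- X %*% B                     # n-by-(d-1) linear predictors
  denom <- 1 + rowSums(exp(eta))       # one denominator per observation
  # columns 1..d-1 follow the link; the last column is reference category d
  cbind(exp(eta) / denom, 1 / denom)
}

X <- cbind(1, c(-1, 0, 2))                      # intercept + one covariate, n = 3
B <- matrix(c(0.5, -0.2, 0.1, 0.3), nrow = 2)   # q = 2, d - 1 = 2
P <- mn_logit_probs(X, B)
row_totals <- rowSums(P)                        # each row should sum to 1
```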

When the multivariate count data exhibit over-dispersion, the traditional
multinomial model is insufficient. The Dirichlet multinomial distribution models the
probabilities of the categories by a Dirichlet distribution.
The density of a *d* category count vector *y*, with
parameter *α = (α_1, …, α_d)*,
*α_j > 0*, is

*
P(y|α) =
C_{y_1, …, y_d}^{m}
{Gamma(sum_{j'=1}^d α_j')} / {Gamma(sum_{j'=1}^d α_j' + sum_{j'=1}^d y_j')}
prod_{j=1}^d {Gamma(α_j+y_j)} / {Gamma(α_j)},
*

where *m = sum_{j=1}^d y_j*. Here, *C_k^n*, often read as "*n* choose *k*",
refers to the number of *k* combinations from a set of *n* elements.
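The Dirichlet multinomial density can be sketched in base R as follows. `dm_logpmf` is a hypothetical helper written for this illustration, not an MGLM function. The ratio of Gamma functions involving the sum of the *α_j* appears once, and the product runs over the per-category ratios. The example checks that the probabilities sum to one over all count vectors with *d = 3* and *m = 4*, and that a very large, proportionally scaled *α* approximately recovers the multinomial density.

```r
# Dirichlet multinomial log-density (hypothetical helper, base R only).
dm_logpmf <- function(y, alpha) {
  m <- sum(y)
  lgamma(m + 1) - sum(lgamma(y + 1)) +             # log multinomial coefficient
    lgamma(sum(alpha)) - lgamma(sum(alpha) + m) +  # single ratio in sum(alpha)
    sum(lgamma(alpha + y) - lgamma(alpha))         # per-category ratios
}

# Enumerate every count vector with d = 3 categories and m = 4 trials.
m    <- 4
grid <- expand.grid(y1 = 0:m, y2 = 0:m)
grid <- grid[grid$y1 + grid$y2 <= m, ]
Y    <- cbind(grid$y1, grid$y2, m - grid$y1 - grid$y2)

alpha <- c(0.5, 1.2, 2)
total <- sum(exp(apply(Y, 1, dm_logpmf, alpha = alpha)))  # should be close to 1

# With alpha = A * p and A very large, over-dispersion vanishes.
p      <- c(0.2, 0.3, 0.5)
y      <- c(2, 1, 1)
dm_big <- exp(dm_logpmf(y, 1e6 * p))
mn_val <- dmultinom(y, size = sum(y), prob = p)
```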

The `MGLMfit` function with `dist="DM"` calculates the maximum likelihood estimate (MLE) of *(α_1, …, α_d)*. The `MGLMreg` function with `dist="DM"` calculates the MLE of regression coefficients *β_j* of the Dirichlet multinomial regression model, which has link function *α_j = exp(Xβ_j)*, *j=1,…,d*. The `MGLMsparsereg` function with `dist="DM"` fits a regularized Dirichlet multinomial regression model.
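To make the log link concrete, the short sketch below computes the Dirichlet multinomial parameters implied by a design matrix and a coefficient matrix. The values of `X` and `B` are made up for illustration. The mean proportion of category *j* under the Dirichlet multinomial is *α_j / sum_{j'} α_j'*, which the last line computes per observation.

```r
# Toy design matrix (n = 3, q = 2) and coefficient matrix (q = 2, d = 3).
# These values are illustrative assumptions only.
X <- cbind(1, c(-1, 0, 2))
B <- matrix(c(0.5, -0.2, 0.1, 0.3, -0.4, 0.0), nrow = 2)

alpha     <- exp(X %*% B)             # n-by-d matrix, strictly positive
mean_prop <- alpha / rowSums(alpha)   # expected proportion of each category
```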

The more flexible generalized Dirichlet multinomial model can be used when the counts of categories have both positive and negative correlations.
The probability mass of a count vector *y* over *m* trials with parameter
*(α, β)=(α_1, …, α_{d-1}, β_1, …, β_{d-1})*,
*α_j, β_j > 0*, is

*
P(y|α,β)
=C_{y_1, …, y_d}^{m} prod_{j=1}^{d-1} {Gamma(α_j+y_j)Gamma(β_j+z_{j+1})Gamma(α_j+β_j)} / {Gamma(α_j)Gamma(β_j)Gamma(α_j+β_j+z_j)},
*

where *z_j = sum_{k=j}^d y_k* and *m = sum_{j=1}^d y_j*. Here, *C_k^n*, often read as "*n* choose *k*",
refers to the number of *k* combinations from a set of *n* elements.
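The generalized Dirichlet multinomial mass function can be sketched in base R in the same way. `gdm_logpmf` and `dm_logpmf` below are hypothetical helpers for illustration, not MGLM functions. The example checks two properties: the probabilities sum to one, and the model reduces to the Dirichlet multinomial when *β_j* equals the sum of the remaining Dirichlet multinomial parameters *α_{j+1} + … + α_d*.

```r
# Generalized Dirichlet multinomial log-mass (hypothetical helper).
# alpha and beta both have length d - 1.
gdm_logpmf <- function(y, alpha, beta) {
  d <- length(y)
  m <- sum(y)
  z <- rev(cumsum(rev(y)))            # z[j] = y[j] + ... + y[d]
  j <- seq_len(d - 1)
  lgamma(m + 1) - sum(lgamma(y + 1)) +
    sum(lgamma(alpha + y[j])    - lgamma(alpha) +
        lgamma(beta + z[j + 1]) - lgamma(beta) +
        lgamma(alpha + beta)    - lgamma(alpha + beta + z[j]))
}

# Dirichlet multinomial log-mass, used only for the comparison below.
dm_logpmf <- function(y, a) {
  m <- sum(y)
  lgamma(m + 1) - sum(lgamma(y + 1)) +
    lgamma(sum(a)) - lgamma(sum(a) + m) + sum(lgamma(a + y) - lgamma(a))
}

# All count vectors with d = 3 and m = 4.
m    <- 4
grid <- expand.grid(y1 = 0:m, y2 = 0:m)
grid <- grid[grid$y1 + grid$y2 <= m, ]
Y    <- cbind(grid$y1, grid$y2, m - grid$y1 - grid$y2)
total <- sum(exp(apply(Y, 1, gdm_logpmf,
                       alpha = c(0.7, 1.5), beta = c(2, 0.9))))

# Special case: beta_j = a_{j+1} + ... + a_d gives the Dirichlet multinomial.
a   <- c(0.5, 1.2, 2)
y   <- c(1, 0, 3)
gdm <- gdm_logpmf(y, alpha = a[1:2], beta = c(a[2] + a[3], a[3]))
dm  <- dm_logpmf(y, a)
```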

The `MGLMfit` function with `dist="GDM"` calculates the MLE of *(α, β)=(α_1, …, α_{d-1}, β_1, …, β_{d-1})*. The `MGLMreg` function with `dist="GDM"` calculates the MLE of regression coefficients *α_j, β_j* of the generalized Dirichlet multinomial regression model, which has link functions *α_j=exp(Xα_j)* and *β_j=exp(Xβ_j)*, *j=1, …, d-1*. The `MGLMsparsereg` function with `dist="GDM"` fits a regularized generalized Dirichlet multinomial regression model.

Both the multinomial distribution and Dirichlet multinomial distribution are good for
negatively correlated counts. When the counts of categories are positively
correlated, the negative multinomial distribution is preferred.
The probability mass function of a *d* category count vector *y* with parameter
*(p_1, …, p_{d+1}, β)*, *sum_{j=1}^{d+1} p_j = 1*, *p_j > 0*, *β > 0*, is

*
P(y|p,β) = C_{m}^{β+m-1} C_{y_1, …, y_d}^{m}
prod_{j=1}^d p_j^{y_j} p_{d+1}^β = (β)_m/(m!) C_{y_1, …, y_d}^{m} prod_{j=1}^d p_j^{y_j} p_{d+1}^β,
*

where *m = sum_{j=1}^d y_j* and *(β)_m = β(β+1)⋯(β+m-1)* is the rising factorial. Here, *C_k^n*, often read as "*n* choose *k*", refers to the number of *k* combinations from a set of *n* elements.
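A base R sketch of the negative multinomial mass function follows. `negmn_logpmf` is a hypothetical helper for illustration, not an MGLM function. The leading factor *C_m^{β+m-1}* is evaluated as *Gamma(β+m) / (Gamma(β) m!)*, which also covers non-integer *β*. The example checks this identity against `choose`, and checks that for *d = 1* the mass function coincides with the negative binomial as parameterized by `dnbinom`.

```r
# Negative multinomial log-mass (hypothetical helper, base R only).
# p has length d + 1; p[d + 1] is the probability raised to the power beta.
negmn_logpmf <- function(y, p, beta) {
  d <- length(y)
  m <- sum(y)
  lgamma(beta + m) - lgamma(beta) - lgamma(m + 1) +   # log of C_m^{beta+m-1}
    lgamma(m + 1) - sum(lgamma(y + 1)) +              # log multinomial coefficient
    sum(y * log(p[seq_len(d)])) + beta * log(p[d + 1])
}

# The leading factor: generalized binomial coefficient vs. Gamma functions.
beta <- 2.5
m    <- 4
lead_choose <- choose(beta + m - 1, m)
lead_gamma  <- gamma(beta + m) / (gamma(beta) * factorial(m))

# With a single category (d = 1) the model is a negative binomial.
negmn_d1 <- exp(negmn_logpmf(3, p = c(0.4, 0.6), beta = beta))
nb       <- dnbinom(3, size = beta, prob = 0.6)
```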

The `MGLMfit` function with `dist="NegMN"` calculates the MLE of *(p_1, …, p_{d+1}, β)*. The `MGLMreg` function with `dist="NegMN"` and `regBeta=FALSE` calculates the MLE of regression coefficients *(α_1,…,α_d, β)* of the negative multinomial regression model, which has link function *p_{d+1} = 1/(1 + sum_{j=1}^d exp(Xα_j))*, *p_j = exp(Xα_j) p_{d+1}*, *j=1, …, d*. When `dist="NegMN"` and `regBeta=TRUE`, the overdispersion parameter is linked to covariates via *β=exp(Xα_{d+1})*, and the function `MGLMreg` outputs an estimated matrix of *(α_1, …, α_{d+1})*. The `MGLMsparsereg` function with `dist="NegMN"` fits a regularized negative multinomial regression model.
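The two negative multinomial link variants can be sketched in base R as follows. The helper `negmn_link`, its argument `reg_beta`, and the toy matrices are assumptions made for this illustration and do not reproduce the internals of `MGLMreg`. With `reg_beta = FALSE` the coefficient matrix has *d* columns and only the probabilities are computed. With `reg_beta = TRUE` it has *d + 1* columns and the last column drives *β*.

```r
# X: n-by-q design matrix; A: q-by-d (or q-by-(d+1)) coefficient matrix.
# `negmn_link` is a hypothetical helper, not an MGLM function.
negmn_link <- function(X, A, reg_beta = FALSE) {
  d      <- if (reg_beta) ncol(A) - 1 else ncol(A)
  eta    <- X %*% A[, seq_len(d), drop = FALSE]     # n-by-d linear predictors
  p_last <- 1 / (1 + rowSums(exp(eta)))             # p_{d+1}
  out    <- list(p = cbind(exp(eta) * p_last, p_last))
  if (reg_beta) {
    out$beta <- as.vector(exp(X %*% A[, d + 1]))    # beta = exp(X alpha_{d+1})
  }
  out
}

X <- cbind(1, c(-1, 0, 2))                                  # n = 3, q = 2
A <- matrix(c(0.5, -0.2, 0.1, 0.3, -0.4, 0.2), nrow = 2)    # q = 2, 3 columns

fixed_beta <- negmn_link(X, A)                    # d = 3, beta not modeled
reg_beta   <- negmn_link(X, A, reg_beta = TRUE)   # d = 2, beta from column 3
```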

Yiwen Zhang and Hua Zhou

`MGLMfit`, `MGLMreg`, `MGLMsparsereg`, `dmn`, `ddirmn`, `dgdirmn`, `dnegmn`

MGLM documentation built on April 14, 2022, 1:07 a.m.
