dirichlet: Fitting a Dirichlet Distribution

View source: R/family.univariate.R

Description

Fits a Dirichlet distribution to a matrix of compositions.

Usage

dirichlet(link = "loglink", parallel = FALSE, zero = NULL, imethod = 1)

Arguments

link

Link function applied to each of the M (positive) shape parameters alpha_j. See Links for more choices. The default gives eta_j=log(alpha_j).

parallel, zero, imethod

See CommonVGAMffArguments for more information.

Details

In this help file the response is assumed to be an M-column matrix of positive values whose rows each sum to unity. Such data can be thought of as compositional data. There are M linear/additive predictors eta_j.

The Dirichlet distribution is commonly used to model compositional data, including applications in genetics. Suppose (Y_1,…,Y_M)^T is the response. Then it has a Dirichlet distribution if (Y_1,…,Y_{M-1})^T has density

(Gamma(alpha_+) / prod_{j=1}^M Gamma(alpha_j)) prod_{j=1}^M y_j^(alpha_j - 1)

where alpha_+= alpha_1 + … + alpha_M, alpha_j > 0, and the density is defined on the unit simplex

Delta_M = { (y_1,…,y_M)^T : y_1 > 0, …, y_M > 0, ∑_{j=1}^M y_j = 1 }.

One has E(Y_j) = alpha_j / alpha_{+}, which are returned as the fitted values. For this distribution Fisher scoring corresponds to Newton-Raphson.
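
As a rough illustration of the density and mean formulas, the following base R sketch evaluates the log-density for a single composition and computes the mean vector. The helper name ddirichlet_log and the shape values are hypothetical and chosen only for this example; they are not part of VGAM.

```r
# Hypothetical helper (not a VGAM function): Dirichlet log-density
# for a single composition y with shape parameters alpha.
ddirichlet_log <- function(y, alpha) {
  stopifnot(length(y) == length(alpha), all(y > 0), all(alpha > 0),
            isTRUE(all.equal(sum(y), 1)))
  # log Gamma(alpha_+) - sum_j log Gamma(alpha_j) + sum_j (alpha_j - 1) log y_j
  lgamma(sum(alpha)) - sum(lgamma(alpha)) + sum((alpha - 1) * log(y))
}

alpha <- c(0.5, 2, 1)                  # illustrative shape parameters
dirichlet_mean <- alpha / sum(alpha)   # E(Y_j) = alpha_j / alpha_+

# With alpha = (1, 1, 1) the density is constant on the simplex, equal to Gamma(3) = 2
flat_logdens <- ddirichlet_log(c(0.2, 0.3, 0.5), c(1, 1, 1))
```

Working on the log scale with lgamma avoids overflow when the shape parameters are large.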

The Dirichlet distribution can be motivated by considering independent random variables (G_1,…,G_M)^T, where G_j has a gamma distribution with shape parameter alpha_j and unit scale, i.e., density f(g_j) = g_j^(alpha_j - 1) e^(-g_j) / Gamma(alpha_j). The G_j are identically distributed only when all the alpha_j are equal. The Dirichlet distribution then arises when Y_j = G_j / (G_1 + ... + G_M).
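
The gamma construction can be checked by simulation. The sketch below uses only base R; the seed, sample size and shape values are arbitrary choices made for illustration.

```r
set.seed(1)               # arbitrary seed, for reproducibility only
alpha <- c(0.5, 2, 1)     # illustrative shape parameters
n <- 5000

# Column j holds n independent draws of G_j ~ Gamma(shape = alpha_j, rate = 1)
G <- sapply(alpha, function(a) rgamma(n, shape = a, rate = 1))

# Y_j = G_j / (G_1 + ... + G_M): divide each row by its total
Y <- G / rowSums(G)

row_sums     <- rowSums(Y)           # every row should sum to 1
sample_means <- colMeans(Y)          # empirical means of the Y_j
theory_means <- alpha / sum(alpha)   # alpha_j / alpha_+
```

The sample means should be close to alpha_j / alpha_+, up to Monte Carlo error.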

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

When fitted, the fitted.values slot of the object contains the M-column matrix of means.

Note

The response should be a matrix of positive values whose rows each sum to unity. For count data, which are similar in structure, a multinomial logit model (multinomial) is probably more appropriate. Another distribution similar to the Dirichlet is the Dirichlet-multinomial (see dirmultinomial).
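
Before fitting, it can help to confirm that the response meets these requirements. The base R sketch below rescales a hypothetical matrix of raw positive measurements to proportions and then checks the two conditions. The data are invented, and whether rescaling is appropriate depends on the application.

```r
# Hypothetical raw measurements: one row per sample, one column per component
raw <- matrix(c(2, 5, 3,
                1, 1, 2,
                4, 4, 2), ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("y1", "y2", "y3")))

# Rescale each row so that it sums to 1
comp <- raw / rowSums(raw)

# Check the requirements: strictly positive entries and unit row sums
ok <- all(comp > 0) &&
  isTRUE(all.equal(unname(rowSums(comp)), rep(1, nrow(comp))))
```

Rows containing exact zeros fail the positivity check and would need separate treatment.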

Author(s)

Thomas W. Yee

References

Lange, K. (2002). Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, 4th ed. Hoboken, NJ: John Wiley and Sons.

See Also

rdiric, dirmultinomial, multinomial, simplex.

Examples

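library(VGAM)  # provides dirichlet(), rdiric() and vglm()

# Simulate 1000 compositions with shape parameters exp(-1), exp(1), exp(0)
ddata <- data.frame(rdiric(n = 1000,
                           shape = exp(c(y1 = -1, y2 = 1, y3 = 0))))
# Intercept-only fit; trace = TRUE prints the coefficients at each iteration
fit <- vglm(cbind(y1, y2, y3)  ~ 1, dirichlet,
            data = ddata, trace = TRUE, crit = "coef")
Coef(fit)                  # estimated shape parameters alpha_j
coef(fit, matrix = TRUE)   # coefficients on the link (log) scale
head(fitted(fit))          # fitted means alpha_j / alpha_+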

Example output

Loading required package: stats4
Loading required package: splines
VGLM    linear loop  1 :  coefficients = 
-1.72866678,  0.22201176, -0.71633762
VGLM    linear loop  2 :  coefficients = 
-1.23828490,  0.71571846, -0.24381143
VGLM    linear loop  3 :  coefficients = 
-1.036549214,  0.923255526, -0.042140867
VGLM    linear loop  4 :  coefficients = 
-1.008370303,  0.952299424, -0.013287197
VGLM    linear loop  5 :  coefficients = 
-1.007885368,  0.952799851, -0.012783106
VGLM    linear loop  6 :  coefficients = 
-1.007885227,  0.952799998, -0.012782957
VGLM    linear loop  7 :  coefficients = 
-1.007885227,  0.952799998, -0.012782957
   shape1    shape2    shape3 
0.3649900 2.5929598 0.9872984 
            loge(shape1) loge(shape2) loge(shape3)
(Intercept)    -1.007885       0.9528  -0.01278296
          y1        y2      y3
1 0.09251383 0.6572362 0.25025
2 0.09251383 0.6572362 0.25025
3 0.09251383 0.6572362 0.25025
4 0.09251383 0.6572362 0.25025
5 0.09251383 0.6572362 0.25025
6 0.09251383 0.6572362 0.25025
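
The three pieces of output above are linked by the default log link and the mean formula. The base R sketch below reproduces that link by hand, using the rounded intercepts copied from the printed coefficient matrix. It does not call VGAM, and the variable names are chosen only for illustration.

```r
# Intercepts on the log scale, copied from the coef(fit, matrix = TRUE) output
eta <- c(shape1 = -1.007885, shape2 = 0.9528, shape3 = -0.01278296)

alpha_hat   <- exp(eta)                     # inverse of the default log link
fitted_mean <- alpha_hat / sum(alpha_hat)   # E(Y_j) = alpha_j / alpha_+
```

Up to rounding, alpha_hat should agree with the Coef(fit) values and fitted_mean with each row of the fitted values, which are identical across rows because the model is intercept-only.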

VGAM documentation built on Jan. 16, 2021, 5:21 p.m.