# Categorical: Categorical distribution In extraDistr: Additional Univariate and Multivariate Distributions

## Description

Probability mass function, distribution function, quantile function and random generation for the categorical distribution.

## Usage

```r
dcat(x, prob, log = FALSE)

pcat(q, prob, lower.tail = TRUE, log.p = FALSE)

qcat(p, prob, lower.tail = TRUE, log.p = FALSE, labels)

rcat(n, prob, labels)

rcatlp(n, log_prob, labels)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x, q` | vector of quantiles. |
| `prob, log_prob` | vector of length m, or m-column matrix of non-negative weights (or their logarithms in `log_prob`). |
| `log, log.p` | logical; if TRUE, probabilities p are given as log(p). |
| `lower.tail` | logical; if TRUE (default), probabilities are P[X ≤ x]; otherwise, P[X > x]. |
| `p` | vector of probabilities. |
| `labels` | if provided, a labeled `factor` vector is returned. The number of labels needs to be the same as the number of categories (number of columns in `prob`). |
| `n` | number of observations. If `length(n) > 1`, the length is taken to be the number required. |

## Details

Probability mass function

Pr(X = k) = w[k]/sum(w)

Cumulative distribution function

Pr(X <= k) = sum(w[1:k])/sum(w)
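
As a small illustration of the two formulas above, the following base R sketch computes the probability mass function and the cumulative distribution function directly from a vector of unnormalized weights. The weight values and the variable names `w`, `pmf` and `cdf` are illustrative assumptions, and the code does not call any extraDistr function.

```r
# Unnormalized, non-negative weights for m = 4 categories (illustrative values)
w <- c(2, 4, 3, 1)

# Pr(X = k) = w[k] / sum(w)
pmf <- w / sum(w)

# Pr(X <= k) = sum(w[1:k]) / sum(w)
cdf <- cumsum(w) / sum(w)

print(pmf)
print(cdf)
```

Because the weights are divided by their sum, multiplying `w` by any positive constant leaves both `pmf` and `cdf` unchanged.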

It is possible to sample from a categorical distribution parametrized by a vector of unnormalized log-probabilities α[1],...,α[m] without leaving the log space by employing the Gumbel-max trick (Maddison, Tarlow and Minka, 2014). If g[1],...,g[m] are independent samples from the Gumbel distribution with cumulative distribution function F(g) = exp(-exp(-g)), then k = argmax(g[i]+α[i]) is a draw from the categorical distribution parametrized by the vector of probabilities p[1],...,p[m], such that p[i] = exp(α[i])/sum(exp(α)). This is implemented in the `rcatlp` function, parametrized by the vector of log-probabilities `log_prob`.
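
The Gumbel-max trick described above can be sketched in base R as follows. This is a minimal illustration only, not the package's implementation of `rcatlp`: the helper name `gumbel_max_sample` is hypothetical, and the example log-probabilities are arbitrary. Gumbel noise is generated by inverting the cumulative distribution function, which gives g = -log(-log(u)) for uniform u.

```r
# Hypothetical helper (not part of extraDistr): Gumbel-max sampling in base R
gumbel_max_sample <- function(n, log_prob) {
  m <- length(log_prob)
  # Standard Gumbel noise via the inverse CDF:
  # F(g) = exp(-exp(-g))  =>  g = -log(-log(u)), u ~ Uniform(0, 1)
  g <- matrix(-log(-log(runif(n * m))), nrow = n, ncol = m)
  # Add alpha[i] to column i, then take the row-wise argmax
  scores <- sweep(g, 2, log_prob, `+`)
  max.col(scores, ties.method = "first")
}

set.seed(1)
# Unnormalized log-probabilities: the constant shift does not change the result
alpha <- log(c(0.2, 0.5, 0.3)) + 7
draws <- gumbel_max_sample(1e5, alpha)

# Empirical category frequencies, to compare with exp(alpha) / sum(exp(alpha))
freq <- as.numeric(prop.table(table(factor(draws, levels = 1:3))))
print(round(freq, 3))
```

With 1e5 draws, the empirical frequencies should be close to the normalized probabilities 0.2, 0.5 and 0.3, even though `alpha` was never exponentiated or normalized inside the sampler.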

## References

Maddison, C. J., Tarlow, D., & Minka, T. (2014). A* sampling. [In:] Advances in Neural Information Processing Systems (pp. 3086-3094). https://arxiv.org/abs/1411.0030

## Examples

```r
# Generating 10 random draws from categorical distribution
# with k=3 categories occurring with equal probabilities
# parametrized using a vector

rcat(10, c(1/3, 1/3, 1/3))

# or with k=5 categories parametrized using a matrix of probabilities
# (generated from Dirichlet distribution)

p <- rdirichlet(10, c(1, 1, 1, 1, 1))
rcat(10, p)

x <- rcat(1e5, c(0.2, 0.4, 0.3, 0.1))
plot(prop.table(table(x)), type = "h")
lines(0:5, dcat(0:5, c(0.2, 0.4, 0.3, 0.1)), col = "red")

p <- rdirichlet(1, rep(1, 20))
x <- rcat(1e5, matrix(rep(p, 2), nrow = 2, byrow = TRUE))
xx <- 0:21
plot(prop.table(table(x)))
lines(xx, dcat(xx, p), col = "red")

xx <- seq(0, 21, by = 0.01)
plot(ecdf(x))
lines(xx, pcat(xx, p), col = "red", lwd = 2)

pp <- seq(0, 1, by = 0.001)
plot(ecdf(x))
lines(qcat(pp, p), pp, col = "red", lwd = 2)
```

extraDistr documentation built on Sept. 7, 2020, 5:09 p.m.