Categorical: Categorical distribution

Description

Probability mass function, distribution function, quantile function and random generation for the categorical distribution.

Usage

dcat(x, prob, log = FALSE)

pcat(q, prob, lower.tail = TRUE, log.p = FALSE)

qcat(p, prob, lower.tail = TRUE, log.p = FALSE, labels)

rcat(n, prob, labels)

rcatlp(n, log_prob, labels)

Arguments

x, q

vector of quantiles.

prob, log_prob

vector of length m, or m-column matrix of non-negative weights (or their logarithms in log_prob).

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X ≤ x]; otherwise, P[X > x].

p

vector of probabilities.

labels

if provided, a labeled factor vector is returned. The number of labels must equal the number of categories (the number of columns in prob).

n

number of observations. If length(n) > 1, the length is taken to be the number required.

Details

Probability mass function

Pr(X = k) = w[k]/sum(w)

Cumulative distribution function

Pr(X <= k) = sum(w[1:k])/sum(w)
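
As a minimal sketch of the two formulas above, the following base R code computes the probability mass function and the cumulative distribution function directly from a vector of unnormalized weights. The weight values are hypothetical and chosen only for illustration; no extraDistr functions are used.

```r
# Hypothetical unnormalized weights for m = 4 categories (illustrative values).
w <- c(2, 4, 3, 1)

# Probability mass function: Pr(X = k) = w[k] / sum(w)
pmf <- w / sum(w)

# Cumulative distribution function: Pr(X <= k) = sum(w[1:k]) / sum(w)
cdf <- cumsum(w) / sum(w)

print(pmf)
print(cdf)
```

Given the formulas above, dcat(1:4, w) and pcat(1:4, w) would be expected to return the same values as pmf and cdf.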

It is possible to sample from a categorical distribution parametrized by a vector of unnormalized log-probabilities α[1],...,α[m] without leaving log space by using the Gumbel-max trick (Maddison, Tarlow and Minka, 2014). If g[1],...,g[m] are independent samples from the Gumbel distribution with cumulative distribution function F(g) = exp(-exp(-g)), then k = argmax(g[i]+α[i]) is a draw from the categorical distribution with probabilities p[1],...,p[m], where p[i] = exp(α[i])/sum(exp(α)). This is implemented in the rcatlp function, which takes the vector of log-probabilities log_prob.
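
The Gumbel-max trick can be sketched in a few lines of base R. This is an illustration of the idea only, not the rcatlp implementation: the helper gumbel_max_draw and the example values of alpha are hypothetical, and the Gumbel noise is generated by inverse transform from the cumulative distribution function F(g) = exp(-exp(-g)).

```r
# Hypothetical helper (not part of extraDistr): one Gumbel-max draw per call.
gumbel_max_draw <- function(alpha) {
  # Standard Gumbel noise via inverse transform: F(g) = exp(-exp(-g))
  # gives g = -log(-log(u)) for u ~ Uniform(0, 1).
  g <- -log(-log(runif(length(alpha))))
  # The index of the largest perturbed log-weight is the sampled category.
  which.max(g + alpha)
}

set.seed(1)
# Unnormalized log-probabilities: the constant shift does not change the result.
alpha <- log(c(0.2, 0.4, 0.3, 0.1)) + 5
draws <- replicate(20000, gumbel_max_draw(alpha))

# Empirical frequencies should be close to exp(alpha) / sum(exp(alpha)).
freq <- tabulate(draws, nbins = length(alpha)) / length(draws)
target <- exp(alpha) / sum(exp(alpha))
print(round(rbind(freq, target), 3))
```

Because only the argmax of g + alpha is needed, the weights are never exponentiated or normalized during sampling; exp(alpha) appears here only to build the comparison target.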

References

Maddison, C. J., Tarlow, D., & Minka, T. (2014). A* sampling. In: Advances in Neural Information Processing Systems (pp. 3086-3094). https://arxiv.org/abs/1411.0030

Examples

library(extraDistr)

# Generate 10 random draws from a categorical distribution
# with k = 3 categories occurring with equal probabilities,
# parametrized using a vector

rcat(10, c(1/3, 1/3, 1/3))

# or with k=5 categories parametrized using a matrix of probabilities
# (generated from Dirichlet distribution)

p <- rdirichlet(10, c(1, 1, 1, 1, 1))
rcat(10, p)

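# Compare empirical frequencies of 1e5 draws (vertical bars)
# with the probability mass function (red line)
x <- rcat(1e5, c(0.2, 0.4, 0.3, 0.1))
plot(prop.table(table(x)), type = "h")
lines(0:5, dcat(0:5, c(0.2, 0.4, 0.3, 0.1)), col = "red")

# Same comparison for 20 categories, with prob passed as a
# 2-row matrix whose rows both equal p
p <- rdirichlet(1, rep(1, 20))
x <- rcat(1e5, matrix(rep(p, 2), nrow = 2, byrow = TRUE))
xx <- 0:21
plot(prop.table(table(x)))
lines(xx, dcat(xx, p), col = "red")

# Empirical CDF of the draws against the distribution function
xx <- seq(0, 21, by = 0.01)
plot(ecdf(x))
lines(xx, pcat(xx, p), col = "red", lwd = 2)

# Empirical CDF against the quantile function (axes swapped)
pp <- seq(0, 1, by = 0.001)
plot(ecdf(x))
lines(qcat(pp, p), pp, col = "red", lwd = 2)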

extraDistr documentation built on Sept. 7, 2020, 5:09 p.m.