bayesPiVector: Bayesian estimation of a multinomial proportion vector...
In tmcd82070/evoab: Evidence of Absence (EoA)


This routine assumes you have an observation from a multinomial distribution with K classes (K >= 2, parameter x) and have assumed that the multinomial distribution's proportion vector ("pi vector") follows a Dirichlet prior distribution. Under these assumptions, this routine estimates the mean, variance, and mode of the proportion vector's posterior distribution.

bayesPiVector(x, pseudoCounts = rep(1, length(x))/length(x))

`x`	An integer vector containing the number of observed 'successes' in each category of the multinomial. Total number of trials is `sum(x)`. The number of categories is K = `length(x)`.
`pseudoCounts`	A vector of real-valued "pseudo counts" for the K categories in the problem. This is sometimes called the "concentration" parameter.

Computations are elementary because the Dirichlet(a1, a2, ..., aK) prior is conjugate for the multinomial. Nearly every text on Bayesian estimation shows that, given values for x and pseudoCounts, the posterior distribution of the multinomial's proportion vector is,

Dirichlet(x1+a1, x2+a2, ..., xK+aK).

Hence, the Bayes point estimator of the multinomial's proportions is,

phat_i = (xi+ai) / sum(xi + ai),

which is the mean of the posterior. The standard error (posterior standard deviation) of each proportion is,

se.phat_i = sqrt((xi+ai)*(A-xi-ai)/(A^2*(A+1))),

where A = sum(xi + ai). If (xi+ai)>1 for all i, the mode of the posterior for the proportion vector is,

(xi+ai-1)/(A-K).

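The default value for pseudoCounts is Dirichlet(1/K, 1/K, ..., 1/K). For K = 2 this coincides with the Jeffreys prior. The Jeffreys prior is proportional to the square root of the determinant of Fisher's information and, for a multinomial with K categories, is equal to Dirichlet(1/2, 1/2, ..., 1/2), so for K > 2 the default differs from the Jeffreys prior.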
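The formulas above can be checked by hand with a few lines of base R. The following is a minimal sketch, not the package's implementation: the helper name `dirichletPosteriorSummary` and the `alpha` column are hypothetical, and the standard error uses the standard Dirichlet variance alpha_i*(A - alpha_i)/(A^2*(A + 1)) with alpha_i = xi + ai.

```r
# Hypothetical helper (not part of evoab): posterior summaries for a
# multinomial observation x under a Dirichlet(a) prior.
dirichletPosteriorSummary <- function(x, a = rep(1, length(x)) / length(x)) {
  stopifnot(length(x) == length(a), all(x >= 0), all(a > 0))
  K <- length(x)
  alpha <- x + a                                      # posterior Dirichlet parameters
  A <- sum(alpha)
  phat <- alpha / A                                   # posterior mean
  se <- sqrt(alpha * (A - alpha) / (A^2 * (A + 1)))   # posterior standard deviation
  # The interior mode exists only when every posterior parameter exceeds 1
  mode <- if (all(alpha > 1)) (alpha - 1) / (A - K) else rep(NA_real_, K)
  data.frame(phat = phat, phat.mode = mode, se.phat = se, alpha = alpha)
}

# One success and five failures with pseudo counts of 0.5 each
res <- dirichletPosteriorSummary(c(1, 5), c(0.5, 0.5))
print(res)
```

With these inputs the posterior is Dirichlet(1.5, 5.5), so the mean is (1.5/7, 5.5/7) and the mode is (0.1, 0.9).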

A data frame with number of rows equal to length(x) containing the Bayesian point estimates for the proportion in each category. The data frame has the following columns:

phat : the Bayes point estimates equal to the mean vector of the posterior distribution. This column sums to 1.0.
phat.mode : if xi+ai > 1 for all i, this column contains the mode vector of the posterior, i.e., the vector of proportions with maximum posterior density. If any xi+ai <= 1, phat.mode = NA.
se.phat : the standard error vector of the posterior distribution.
psuedoCounts : the vector of pseudoCounts associated with the Dirichlet posterior. This vector can be used to accumulate counts over multiple calls.
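Accumulating counts over multiple calls works because the Dirichlet posterior from one batch can serve as the prior for the next. The sketch below illustrates this with plain vector arithmetic rather than with bayesPiVector itself; it assumes the returned pseudo counts are the posterior parameters xi + ai, and all count values are made up for illustration.

```r
a  <- c(0.5, 0.5)   # prior pseudo counts (illustrative)
x1 <- c(1, 5)       # first batch of counts (illustrative)
x2 <- c(0, 7)       # second batch of counts (illustrative)

post1 <- a + x1           # posterior parameters after batch 1
post2 <- post1 + x2       # batch 1 posterior used as the prior for batch 2
postAll <- a + (x1 + x2)  # single update on the pooled counts

# Sequential and pooled updating give the same posterior mean
phatSeq <- post2 / sum(post2)
phatAll <- postAll / sum(postAll)
print(rbind(sequential = phatSeq, pooled = phatAll))
```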

Trent McDonald

agrestiCoullPhat

bayesPiVector(c(1,5), c(.5,.5))  # Jeffreys prior
bayesPiVector(c(1,5), c(1, 1)) # flat prior

# When prior data is available:
x.prior <- 5
n.prior <- 100
bayesPiVector(c(1,5), c(x.prior+0.5, n.prior-x.prior+0.5))

# Simulation: point est bias and ci coverage
trueP <- c(0.01, 0.04, 0.95)
n <- 20
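x <- rmultinom(1000, n, trueP)  # K x 1000 matrix, one multinomial draw per column
baPhat <- apply(x, 2, function(xi) bayesPiVector(xi, pseudoCounts=rep(1,3)/3)$phat)
muBA <- rowMeans(baPhat)  # mean point estimate per category; compare to trueP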
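The coverage half of the simulation could be sketched as follows, without calling bayesPiVector. This is one possible approach under stated assumptions, not the package's method: it uses equal-tailed 95% intervals from the marginal posterior of each proportion, which is Beta(xi+ai, A-xi-ai) when the joint posterior is Dirichlet. The seed and the 95% level are arbitrary choices.

```r
set.seed(42)                       # arbitrary seed, for reproducibility only
trueP <- c(0.01, 0.04, 0.95)
n <- 20
a <- rep(1, 3) / 3                 # same pseudo counts as the simulation above
x <- rmultinom(1000, n, trueP)     # 3 x 1000 matrix of counts

alpha <- x + a                     # posterior parameters; a recycles down each column
A <- n + sum(a)                    # posterior total is identical for every draw

# Equal-tailed 95% limits from the marginal Beta posteriors
lo <- matrix(qbeta(0.025, alpha, A - alpha), nrow = 3)
hi <- matrix(qbeta(0.975, alpha, A - alpha), nrow = 3)

covered <- (lo <= trueP) & (trueP <= hi)  # trueP recycles down each column
coverage <- rowMeans(covered)             # proportion of intervals containing trueP
print(coverage)
```

Each element of `coverage` estimates how often the interval for that category contains the true proportion. With n = 20 and proportions this close to 0 and 1, coverage may differ noticeably from the nominal 95%.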