G_moments: 1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes

G_momentsR Documentation

1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes

Description

Calculate the a priori expected number of clusters (G_expected) or the variance of the number of clusters (G_variance) under a PYP or DP prior for a sample of size N at given values of the concentration parameter alpha and optionally also the Pitman-Yor discount parameter. Useful for soliciting sensible priors (or fixed values) for alpha or discount under the "IMFA" and "IMIFA" methods for mcmc_IMIFA. Additionally, for a given sample size N and given expected number of clusters EG, G_calibrate elicits a value for the concentration parameter alpha or the discount parameter.

Usage

G_expected(N,
           alpha,
           discount = 0,
           MPFR = TRUE)

G_variance(N,
           alpha,
           discount = 0,
           MPFR = TRUE)

G_calibrate(N,
            EG,
            alpha = NULL,
            discount = 0,
            MPFR = TRUE,
            ...)

Arguments

N

The sample size.

alpha

The concentration parameter. Must be specified (though not for G_calibrate) and must be strictly greater than -discount. The case alpha=0 is accommodated. When discount is negative alpha must be a positive integer multiple of abs(discount). See Details for behaviour for G_calibrate.

discount

The discount parameter for the Pitman-Yor process. Must be less than 1, but typically lies in the interval [0, 1). Defaults to 0 (i.e. the Dirichlet process). When discount is negative alpha must be a positive integer multiple of abs(discount). See Details for behaviour for G_calibrate.

MPFR

Logical indicating whether the high-precision libraries Rmpfr and gmp are invoked, at the expense of run-time. Defaults to TRUE and must be TRUE for G_expected when alpha=0 or G_variance when discount is non-zero. For G_calibrate, it is strongly recommended to use MPFR=TRUE when discount is non-zero and strictly necessary when alpha=0 is supplied. See Note.

EG

The prior expected number of clusters. Must exceed 1 and be less than N.

...

Additional arguments passed to uniroot, e.g. maxiter.

Details

All arguments are vectorised. Users can also consult G_priorDensity in order to solicit sensible priors.

For G_calibrate, only one of alpha or discount can be supplied, and the function elicits a value for the opposing parameter which achieves the desired expected number of clusters EG for the given sample size N. By default, a value for alpha subject to discount=0 (i.e. the Dirichlet process) is elicited. Note that alpha may not be a positive integer multiple of discount as it should be if discount is negative. See Examples below.

Value

The expected number of clusters under the specified prior conditions (G_expected), or the variance of the number of clusters (G_variance), or the concentration parameter alpha or discount parameter achieving a particular expected number of clusters (G_calibrate).

Note

G_variance requires use of the Rmpfr and gmp libraries for non-zero discount values. G_expected requires these libraries only for the alpha=0 case. These libraries are strongly recommended (but they are not required) for G_calbirate when discount is non-zero, but they are required when alpha=0 is supplied. Despite the high precision arithmetic used, the functions can still be unstable for large N and/or extreme values of alpha and/or discount. See the argument MPFR.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prunster, I., and Ruggiero, M. (2015) Are Gibbs-type priors the most natural generalization of the Dirichlet process?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212-229.

Yamato, H. and Shibuya, M. (2000) Moments of some statistics of Pitman sampling formula, Bulletin of Informatics and Cybernetics, 32(1): 1-10.

See Also

G_priorDensity, Rmpfr, uniroot

Examples

# Certain examples require the use of the Rmpfr library
suppressMessages(require("Rmpfr"))

G_expected(N=50, alpha=19.23356, MPFR=FALSE)
G_variance(N=50, alpha=19.23356, MPFR=FALSE)

G_expected(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=FALSE)
G_variance(N=50, alpha=c(19.23356, 12.21619, 1),
           discount=c(0, 0.25, 0.7300045), MPFR=c(FALSE, TRUE, TRUE))

# Examine the growth rate of the DP
DP   <- sapply(c(1, 5, 10), function(i) G_expected(1:200, alpha=i, MPFR=FALSE))
matplot(DP, type="l", xlab="N", ylab="G")

# Examine the growth rate of the PYP
PY <- sapply(c(0.25, 0.5, 0.75), function(i) G_expected(1:200, alpha=1, discount=i))
matplot(PY, type="l", xlab="N", ylab="G")

# Other special cases of the PYP are also facilitated
G_expected(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))
G_variance(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))

# Elicit values for alpha under a DP prior
G_calibrate(N=50, EG=25)

# Elicit values for alpha under a PYP prior
# G_calibrate(N=50, EG=25, discount=c(-27.1401/100, 0.25, 0.7300045))
# Elicit values for discount under a PYP prior
# G_calibrate(N=50, EG=25, alpha=c(12.21619, 1, 0), maxiter=2000)

Keefe-Murphy/IMIFA documentation built on Jan. 31, 2024, 2:15 p.m.