
View source: R/FullConditionals.R

bnpControl {IMIFA}  R Documentation

Control settings for the Bayesian Nonparametric priors for infinite mixture models (or shrinkage priors for overfitted mixtures)

Description

Supplies a list of arguments for use in mcmc_IMIFA pertaining to the use of the Bayesian Nonparametric Pitman-Yor / Dirichlet process priors with the infinite mixture models "IMFA" and "IMIFA". Certain arguments related to the Dirichlet concentration parameter for the overfitted mixtures "OMFA" and "OMIFA" can be supplied in this manner also.

Usage

bnpControl(learn.alpha = TRUE,
           alpha.hyper = c(2L, 4L),
           discount = 0,
           learn.d = TRUE,
           d.hyper = c(1L, 1L),
           ind.slice = TRUE,
           rho = 0.75,
           trunc.G = NULL,
           kappa = 0.5,
           IM.lab.sw = TRUE,
           thresh = FALSE,
           exchange = FALSE,
           zeta = NULL,
           tune.zeta = list(...),
           ...)

Arguments

learn.alpha
For the "IMFA" and "IMIFA" methods:

A logical indicating whether the Pitman-Yor / Dirichlet process concentration parameter is to be learned (defaults to TRUE), or remain fixed for the duration of the chain. If being learned, a Ga(a, b) prior is assumed for alpha; updates take place via Gibbs sampling when discount is zero and via Metropolis-Hastings when discount > 0. If not being learned, alpha must be supplied.

In the special case of discount < 0, alpha must be supplied as a positive integer multiple of abs(discount); in this instance, learn.alpha is forced to TRUE and alpha is updated deterministically, remaining the appropriate positive integer multiple of abs(discount) as the number of components changes.

For the "OMFA" and "OMIFA" methods:

A logical indicating whether the Dirichlet concentration parameter is to be learned (defaults to TRUE) or remain fixed for the duration of the chain. If being learned, a Ga(a, b * G) prior is assumed for alpha, where G is the number of mixture components given by range.G, and updates take place via Metropolis-Hastings. If not being learned, alpha must be supplied.

alpha.hyper
For the "IMFA" and "IMIFA" methods:

A vector of length 2 giving hyperparameters for the prior on the Pitman-Yor / Dirichlet process concentration parameter alpha. If isTRUE(learn.alpha), these are the shape and rate parameters of a Gamma distribution. Defaults to Ga(2, 4). Choosing a larger rate is particularly important, as it encourages clustering. The prior is shifted to have support on (-discount, Inf) when a non-zero discount is supplied and remains fixed (i.e. learn.d=FALSE), or when learn.d=TRUE.

For the "OMFA" and "OMIFA" methods:

A vector of length 2 giving hyperparameters a and b for the prior on the Dirichlet concentration parameter alpha. If isTRUE(learn.alpha), these are shape and rate parameters of a Gamma distribution. Defaults to Ga(2, 4). Note that the supplied rate will be multiplied by range.G, to encourage clustering, such that the form of the prior is Ga(a, b * G).
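To see why the rate is multiplied by range.G, note that the prior mean of a Ga(a, b * G) distribution is a / (b * G), so larger values of G pull alpha towards zero and thereby encourage the emptying of superfluous components. A base-R sketch with the default a = 2 and b = 4 (no package code involved):

```r
## Prior mean of alpha under the Ga(a, b * G) prior used for the
## overfitted "OMFA"/"OMIFA" methods: larger G => smaller prior mean.
a <- 2; b <- 4
G <- c(2, 5, 10, 25)
prior_mean <- a / (b * G)  # mean of a Gamma(shape=a, rate=b*G) distribution
round(prior_mean, 3)       # decreases as G grows
```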

discount

The discount parameter used when generalising the Dirichlet process to the Pitman-Yor process. Defaults to 0, but typically must lie in the interval [0, 1). If greater than zero, any alpha value strictly greater than -discount can be supplied. By default, Metropolis-Hastings steps are invoked for updating this parameter via learn.d.

The special case of discount < 0 is allowed, in which case learn.d=FALSE is forced and alpha must be supplied as a positive integer multiple of abs(discount). Fixing discount > 0.5 is discouraged (see learn.alpha).

learn.d

Logical indicating whether the discount parameter is to be updated via Metropolis-Hastings (defaults to TRUE, unless discount is supplied as a negative value).

d.hyper

Hyperparameters for the Beta(a,b) prior on the discount parameter. Defaults to Beta(1,1), i.e. Uniform(0,1).

ind.slice

Logical indicating whether the independent slice-efficient sampler is to be employed (defaults, typically, to TRUE). If FALSE, the dependent slice-efficient sampler is employed, whereby the slice sequence \xi_1,\ldots,\xi_g is equal to the decreasingly ordered mixing proportions. When thresh and/or exchange are set to TRUE (see below), this argument is forced to FALSE.

rho

Parameter controlling the rate of geometric decay for the independent slice-efficient sampler, such that \xi_g=(1-\rho)\rho^{g-1}. Must lie in the interval [0, 1). Higher values are associated with better mixing but longer run times. Defaults to 0.75, but 0.5 is an interesting special case which guarantees that the slice sequence \xi_1,\ldots,\xi_g is equal to the expectation of the decreasingly ordered mixing proportions. Only relevant when ind.slice is TRUE.
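The geometric decay is easy to sketch in base R (slice_seq below is a hypothetical helper for illustration, not part of IMIFA): the sequence is strictly decreasing in g and its partial sums approach one.

```r
## Slice sequence xi_g = (1 - rho) * rho^(g - 1) for the independent
## slice-efficient sampler; purely illustrative base R.
slice_seq <- function(rho, G) (1 - rho) * rho^(seq_len(G) - 1)

xi <- slice_seq(rho = 0.75, G = 10)
all(diff(xi) < 0)            # strictly decreasing in g
sum(slice_seq(0.75, 1000))   # partial sums approach 1
```

Smaller rho makes the sequence decay faster, so fewer components need to be sampled at each iteration, at the cost of the poorer mixing noted above.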

trunc.G

The maximum number of allowable and storable clusters under the "IMIFA" and "IMFA" models. The number of active clusters to be sampled at each iteration is adaptively truncated, with trunc.G as an upper limit for storage reasons. Defaults to max(min(N-1, 50), range.G) and must satisfy range.G <= trunc.G < N. Note that large values of trunc.G may lead to memory capacity issues.

kappa

The spike-and-slab prior distribution on the discount hyperparameter is assumed to be a mixture with a point-mass at zero and a continuous Beta(a,b) distribution. kappa gives the weight of the point-mass at zero (the 'spike'). Must lie in the interval [0,1]. Defaults to 0.5. Only relevant when isTRUE(learn.d). A value of 0 ensures non-zero (i.e. Pitman-Yor) discount values at all times, while a value of 1 forces the discount to remain fixed at zero (i.e. the Dirichlet process). Note that kappa will default to exactly 0 if alpha<=0 and learn.alpha=FALSE.
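The spike-and-slab mixture can be sketched directly in base R (r_discount is a hypothetical illustration, not the package's sampler): with probability kappa the discount is exactly zero, otherwise it is drawn from the Beta slab.

```r
## Draw n discount values from the spike-and-slab prior:
## point-mass at 0 with weight kappa; Beta(a, b) slab otherwise.
r_discount <- function(n, kappa = 0.5, a = 1, b = 1) {
  spike <- runif(n) < kappa
  ifelse(spike, 0, rbeta(n, a, b))
}

set.seed(1)
d <- r_discount(10000)
mean(d == 0)  # close to kappa = 0.5
```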

IM.lab.sw

Logical indicating whether the two forced label switching moves are to be implemented (defaults to TRUE) when running one of the infinite mixture models. Note: when exchange=TRUE (see below), this argument is instead forced to FALSE.

thresh

Logical indicating whether the threshold of Fall and Barat (2014) should be incorporated into the slice sampler. See the reference for details. This is an experimental feature (defaults to FALSE) and can work with or without exchange below. Setting thresh=TRUE is not recommended unless both learn.alpha and learn.d are FALSE. Setting thresh to TRUE also forces ind.slice to FALSE (see above).

exchange

Logical indicating whether the exchangeable slice sampler of Fall and Barat (2014) should be used instead. See the reference for details. This argument can work with or without thresh=TRUE above, though it is also an experimental argument and thus defaults to FALSE. When TRUE, the arguments ind.slice and IM.lab.sw (see above) are both forced to FALSE.

zeta
For the "IMFA" and "IMIFA" methods:

Tuning parameter controlling the acceptance rate of the random-walk proposal for the Metropolis-Hastings steps when learn.alpha=TRUE, where 2 * zeta gives the full width of the uniform proposal distribution. These steps are only invoked when either discount is non-zero and fixed or learn.d=TRUE, otherwise alpha is learned by Gibbs updates. Must be strictly positive (if invoked). Defaults to 2.

For the "OMFA" and "OMIFA" methods:

Tuning parameter controlling the standard deviation of the log-normal proposal for the Metropolis-Hastings steps when learn.alpha=TRUE. Must be strictly positive (if invoked). Defaults to 0.75.
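The two proposal types that zeta governs can be sketched as follows (propose_unif and propose_lnorm are hypothetical names for illustration; this is not the package's internal code):

```r
## "IMFA"/"IMIFA": uniform random-walk proposal of full width 2 * zeta
propose_unif  <- function(alpha, zeta = 2) alpha + runif(1, -zeta, zeta)

## "OMFA"/"OMIFA": log-normal proposal with standard deviation zeta
## on the log scale, keeping the proposed alpha strictly positive
propose_lnorm <- function(alpha, zeta = 0.75) exp(rnorm(1, log(alpha), zeta))

set.seed(1)
propose_unif(1)   # always lies in (1 - zeta, 1 + zeta) = (-1, 3)
propose_lnorm(1)  # always strictly positive
```

Larger zeta values yield bolder proposals and hence lower acceptance rates, which is what the tune.zeta adaptation below exploits.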

tune.zeta

A list with the following named arguments, used for tuning zeta (which is either the width of the uniform proposal for the "IMFA" or "IMIFA" methods or the standard deviation of the log-normal proposal for the "OMFA" or "OMIFA" methods) for alpha, via diminishing Robbins-Monro type adaptation, when the alpha parameter is learned via Metropolis-Hastings steps:

heat

The initial adaptation intensity/step-size, such that larger values lead to larger updates. Must be strictly greater than zero. Defaults to 1 if not supplied but other elements of tune.zeta are.

lambda

Iteration rescaling parameter which controls the speed at which adaptation diminishes, such that lower values cause the contribution of later iterations to diminish more slowly. Must lie in the interval (0.5, 1]. Defaults to 1 if not supplied but other elements of tune.zeta are.

target

The target acceptance rate. Must lie in the interval [0, 1]. Defaults to 0.441, which is optimal for univariate targets, if not supplied but other elements of tune.zeta are.

start.zeta

The iteration at which diminishing adaptation begins. Defaults to 100.

stop.zeta

The iteration at which diminishing adaptation is to stop completely. Defaults to Inf, such that diminishing adaptation is never explicitly made to stop. Must be greater than start.zeta.

At least one tune.zeta argument must be supplied for diminishing adaptation to be invoked. tune.zeta arguments are only relevant when learn.alpha is TRUE (and, for the "IMFA" and "IMIFA" methods, when either of the following is also true: the discount remains fixed at a non-zero value, or learn.d is TRUE and kappa < 1). Since Gibbs steps are invoked for updating alpha when discount == 0 under the "IMFA" or "IMIFA" methods, adaptation occurs according to a running count of the number of iterations with non-zero sampled discount values for those methods. As such, when a mix of Gibbs and MH updates is used, this tuning targets the desired acceptance rate for the MH steps only; i.e. acceptances under the Gibbs framework will inflate the overall acceptance rate further.

If diminishing adaptation is invoked, the posterior mean zeta will be stored. Since caution is advised when employing adaptation, note that acceptance rates between 10% and 50% are generally considered adequate.
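A generic diminishing Robbins-Monro update of this kind can be sketched in base R (adapt_zeta and its exact functional form are assumptions for illustration, not necessarily IMIFA's internal rule): log(zeta) is nudged towards the target acceptance rate with step sizes heat / t^lambda that shrink as the iteration t grows.

```r
## Diminishing Robbins-Monro adaptation of zeta (illustrative sketch):
## multiply zeta by exp(step * (accepted - target)), where the step
## size heat / t^lambda decays with the iteration number t.
adapt_zeta <- function(zeta, t, accepted, heat = 1, lambda = 1,
                       target = 0.441, start.zeta = 100, stop.zeta = Inf) {
  if (t < start.zeta || t > stop.zeta) return(zeta)  # outside adaptation window
  zeta * exp(heat / t^lambda * (accepted - target))
}

adapt_zeta(2, t = 200, accepted = 1)  # grows after an acceptance
adapt_zeta(2, t = 200, accepted = 0)  # shrinks after a rejection
adapt_zeta(2, t =  50, accepted = 1)  # unchanged before start.zeta
```

Because lambda must lie in (0.5, 1], the step sizes satisfy the usual Robbins-Monro conditions, so the adaptation diminishes and the chain's ergodicity is preserved.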

...

Catches unused arguments.

Details

The crucial concentration parameter alpha is documented within the main mcmc_IMIFA function, and is relevant to all of the "IMIFA", "IMFA", "OMIFA", and "OMFA" methods.

All arguments here are relevant to the "IMFA" and "IMIFA" methods, but the following are also related to the "OMFA" and "OMIFA" methods, and may behave differently in those instances: learn.alpha, alpha.hyper, zeta, and tune.zeta.

Value

A named list in which the names are the names of the arguments related to the BNP prior(s) and the values are the values supplied to the arguments.

Note

Certain supplied arguments will be subject to further checks within mcmc_IMIFA. G_priorDensity and G_moments can help with soliciting sensible DP/PYP priors.

Under the "IMFA" and "IMIFA" methods, a Pitman-Yor process prior is specified by default. A Dirichlet process prior can be easily invoked when the discount is fixed at 0 and learn.d=FALSE. The normalized stable process can also be specified as a prior distribution, as a special case of the Pitman-Yor process, when alpha remains fixed at 0 and learn.alpha=FALSE (provided the discount is fixed at a strictly positive value or learn.d=TRUE). The special case of the Pitman-Yor process with negative discount is also allowed as an experimental feature for which caution is advised, though learn.d and learn.alpha are forced to FALSE and TRUE, respectively, in this instance.

Author(s)

Keefe Murphy - <keefe.murphy@mu.ie>

References

Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.

Kalli, M., Griffin, J. E. and Walker, S. G. (2011) Slice sampling mixture models, Statistics and Computing, 21(1): 93-105.

Fall, M. D. and Barat, E. (2014) Gibbs sampling methods for Pitman-Yor mixture models, hal-00740770v2.

See Also

mcmc_IMIFA, G_priorDensity, G_moments, mixfaControl, mgpControl, storeControl

Examples

bnpctrl <- bnpControl(learn.d=FALSE, ind.slice=FALSE, alpha.hyper=c(3, 3))

# data(olive)
# sim   <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, BNP=bnpctrl)

# Alternatively specify these arguments directly
# sim   <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, learn.d=FALSE,
#                     ind.slice=FALSE, alpha.hyper=c(3, 3))

Keefe-Murphy/IMIFA documentation built on Jan. 31, 2024, 2:15 p.m.