sampleFmm: Sample the posterior pdf
In USGS-R/GcClust: Clustering geochemical data with HMC sampling

sampleFmm

R Documentation

Sample the posterior pdf

Description

Sample the posterior probability density function for the Bayesian formulation of the finite mixture model. The model implementation and the sampling are performed with package rstan.

Usage

sampleFmm(transData, nPCs, sm, priorParams, nWuSamples = 500,
  nPwuSamples = 500, nChainsPerCore = 2, nCores = parallel::detectCores(),
  procDir = ".")

Arguments

`transData`	List containing the transformed geochemical concentrations and related information. This list is return by function `transformGcData`, for which the documentation includes a complete description of container `transData`.
`nPCs`	Number of principal components that are used in the finite mixture model (See Details).
`sm`	S4 class stanmodel containing the compiled, stan model.
`priorParams`	Parameters for the prior pdfs (See Details)
`nWuSamples`	Number of warm-up samples in each chain (See details).
`nPwuSamples`	Number of post warm-up samples in each chain (See details).
`nChainsPerCore`	Number of chains that each core computes (See Details).
`nCores`	Number of central processing unit (cpu) cores that are used (See Details).
`procDir`	Directory into which the results are written (See Details).

Details

The parameters in the finite mixture model are estimated from the robust principal components, which are stored as a matrix within transData. The number of principal components, nPCs, means that principal components 1, 2, ..., nPCs are used in the finite mixture model. That is, all principal components greater than nPCs are not used.

Argument priorParams is a vector with 4 elements.

The first element pertains to prior pdf for theta, which is the proportion in the finite mixture model. The prior pdf is a beta pdf, which has two shape parameters. The shape parameters must be equal so that the beta pdf is symmetric with respect to 0.5. Also, the shape parameters must be greater than 1, so that the pdf has a peak at 0.5 and equals 0 at both 0 and 1. Both shape parameters are specified by the single value, priorParams[1].
The second element pertains to the prior pdf for the elements of mu1 and mu2, which are the mean vectors in the finite mixture model. The prior pdf is a normal pdf, for which the mean is 0 and the standard deviation is specified by priorParams[2]. Of course, priorParams[2] must be greater than 0.
The third element pertains to the prior pdf for the elements of tau1 and tau2, which are the standard deviation vectors in the finite mixture model. The prior pdf is a truncated Cauchy pdf: The Cauchy pdf before truncation has a center of 0 and a scale specified by priorParams[3]. The truncation point is 0, which removes negative values of the random variable. Of course, priorParams[3] must be greater than 0.
The fourth element pertains to the prior pdf for Omega1 and Omega2, which are the correlation matrices in the finite mixture model. The prior pdf is the LKJ correlation distribution, which has one shape parameter. When the shape parameter is greater than 1, the LKJ correlation distribution has a mode corresponding to the identity matrix; as the shape parameter increases, the LKJ correlation distribution becomes increasingly concentrated about this mode. The chosen shape parameter always should be greater than 1. The shape parameter is specified by priorParams[4].

Appropriate values for the elements of priorParams are selected by examining the distribution of the principal components.

The sampling of the posterior probability density function (pdf) has two parts: warm-up and post warmup. The number of samples during warm-up is nWuSamples, and the number of samples during post warm-up is nPwuSamples.

The sampling is performed in parallel on multiple central processing unit (cpu) cores. Of course, the number of requested cores nCores must be less than or equal to the actual number. The parallel computations are organized in the following manner: nCores cores are requested from the operating system; each core computes nChainsPerCore individual chains, and each chain is written to its own file in directory procDir.

The sampling method is Hamiltonian Monte Carlo sampling. If the sampling has not converged using the default value for argument nWuSamples, then this argument should be increased. A suitable value might be 2500.

Value

The returned values may be conveniently divided into two groups. The first group comprises the stanfit objects that contain the sampling chains. There is one stanfit object for each chain; the format of the stanfit object is described in the rstan documentation. The samples within an object are of the following model parameters:

Parameter	Description
theta	Proportion of population associated with pdf 1.
mu1	Mean vector for pdf 1.
mu2	Mean vector for pdf 2.
tau1	Standard deviation vector for pdf 1.
tau2	Standard deviation vector for pdf 2.
L_Omega1	Cholesky decomposition of the correlation matrix for pdf 1.
L_Omega2	Cholesky decomposition of the correlation matrix for pdf 2.
log_lik	The logarithm of the likelihood function.

Each stanfit object is written to its own file in directory procDir. A file name has the form "RawSamples?-?.dat" for which the first question mark is replaced by the core number and the the second question mark replaced by the chain number for the specified core. The important point is that the file name is unique.

The second group is a list with 4 elements that is returned by function sampleFmm:

`nChains`	Number of chains.
`nWuSamples`	Number of warm-up samples in each chain.
`nPwuSamples`	Number of post warm-up samples in each chain.
`fileNames`	Vector containing the names of the files with the `stanfit` objects.

The total number of samples per chain equals nWuSamples plus nPwuSamples. Only the post warm-up samples are used for inference.

References

Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B., 2014, Bayesian data analysis (3rd ed.), CRC Press.

Marin, J.M., Mengersen, K. and Robert, C.P., 2005, Bayesian modelling and inference on mixtures of distributions, in Handbook of Statistics 25, D. Dey and C.R. Rao (eds.), Elsevier-Sciences.

Stan Development Team, 2015, Stan Modeling Language - Users Guide and Reference Manual, Version 2.6.0, available on line at http://mc-stan.org/ (last accessed October 2015).

Examples

## Not run: 
tmp <- normalizePath(path.package("GcClust"))
load(paste(tmp, "\\stan\\MixtureModel.bin", sep=""))

samplePars <- sampleFmm(transData, nPCs, sm, priorParams )

## End(Not run)

USGS-R/GcClust documentation built on April 17, 2023, 8:08 p.m.