View source: R/GcClusterFunctions.R
sampleFmm | R Documentation |
Sample the posterior probability density function for the Bayesian formulation of the finite mixture model. The model implementation and the sampling are performed with package rstan.
sampleFmm(transData, nPCs, sm, priorParams, nWuSamples = 500,
nPwuSamples = 500, nChainsPerCore = 2, nCores = parallel::detectCores(),
procDir = ".")
transData |
List containing the transformed geochemical
concentrations and related information.
This list is return by function |
nPCs |
Number of principal components that are used in the finite mixture model (See Details). |
sm |
S4 class stanmodel containing the compiled, stan model. |
priorParams |
Parameters for the prior pdfs (See Details) |
nWuSamples |
Number of warm-up samples in each chain (See details). |
nPwuSamples |
Number of post warm-up samples in each chain (See details). |
nChainsPerCore |
Number of chains that each core computes (See Details). |
nCores |
Number of central processing unit (cpu) cores that are used (See Details). |
procDir |
Directory into which the results are written (See Details). |
The parameters in the finite mixture model are estimated
from the robust principal components, which are stored as a matrix within
transData
. The number of principal components, nPCs
,
means that
principal components 1, 2, ..., nPCs are used in the finite mixture model.
That is, all principal components greater than nPCs are not used.
Argument priorParams is a vector with 4 elements.
The first element pertains to prior pdf for theta, which is the proportion in the finite mixture model. The prior pdf is a beta pdf, which has two shape parameters. The shape parameters must be equal so that the beta pdf is symmetric with respect to 0.5. Also, the shape parameters must be greater than 1, so that the pdf has a peak at 0.5 and equals 0 at both 0 and 1. Both shape parameters are specified by the single value, priorParams[1].
The second element pertains to the prior pdf for the elements of mu1 and mu2, which are the mean vectors in the finite mixture model. The prior pdf is a normal pdf, for which the mean is 0 and the standard deviation is specified by priorParams[2]. Of course, priorParams[2] must be greater than 0.
The third element pertains to the prior pdf for the elements of tau1 and tau2, which are the standard deviation vectors in the finite mixture model. The prior pdf is a truncated Cauchy pdf: The Cauchy pdf before truncation has a center of 0 and a scale specified by priorParams[3]. The truncation point is 0, which removes negative values of the random variable. Of course, priorParams[3] must be greater than 0.
The fourth element pertains to the prior pdf for Omega1 and Omega2, which are the correlation matrices in the finite mixture model. The prior pdf is the LKJ correlation distribution, which has one shape parameter. When the shape parameter is greater than 1, the LKJ correlation distribution has a mode corresponding to the identity matrix; as the shape parameter increases, the LKJ correlation distribution becomes increasingly concentrated about this mode. The chosen shape parameter always should be greater than 1. The shape parameter is specified by priorParams[4].
Appropriate values for the elements of priorParams
are selected
by examining the distribution of the principal components.
The sampling of the posterior probability density function (pdf)
has two parts: warm-up and post warmup. The number of samples
during warm-up is nWuSamples
, and the number of samples during post
warm-up is nPwuSamples
.
The sampling is performed in parallel on multiple central
processing unit (cpu) cores. Of course, the number of requested cores
nCores
must be less than or equal to the actual number. The
parallel computations are organized in the following manner:
nCores
cores are requested from the operating system; each core
computes nChainsPerCore
individual chains, and each chain is written
to its own file in directory procDir
.
The sampling method is Hamiltonian Monte Carlo sampling. If the sampling
has not converged using the default value for argument nWuSamples
,
then this argument should be increased. A suitable value might be 2500.
The returned values may be conveniently divided into two groups.
The first group comprises the stanfit
objects that contain the
sampling chains. There is one stanfit
object for each chain; the
format of the stanfit
object is described in the rstan documentation.
The samples within an object are of the following model parameters:
Parameter | Description |
theta | Proportion of population associated with pdf 1. |
mu1 | Mean vector for pdf 1. |
mu2 | Mean vector for pdf 2. |
tau1 | Standard deviation vector for pdf 1. |
tau2 | Standard deviation vector for pdf 2. |
L_Omega1 | Cholesky decomposition of the correlation matrix for pdf 1. |
L_Omega2 | Cholesky decomposition of the correlation matrix for pdf 2. |
log_lik | The logarithm of the likelihood function. |
Each stanfit
object is written to its own file in
directory procDir
.
A file name has the form "RawSamples?-?.dat" for which the first
question mark is replaced by the core number and the the second question
mark replaced by the chain number for the specified core.
The important point is that the file name is unique.
The second group is a list with 4 elements that is returned by function
sampleFmm
:
nChains |
Number of chains. |
nWuSamples |
Number of warm-up samples in each chain. |
nPwuSamples |
Number of post warm-up samples in each chain. |
fileNames |
Vector containing the names of the files
with the |
The total number of samples per chain equals nWuSamples
plus
nPwuSamples
. Only the post warm-up samples are used for inference.
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B., 2014, Bayesian data analysis (3rd ed.), CRC Press.
Marin, J.M., Mengersen, K. and Robert, C.P., 2005, Bayesian modelling and inference on mixtures of distributions, in Handbook of Statistics 25, D. Dey and C.R. Rao (eds.), Elsevier-Sciences.
Stan Development Team, 2015, Stan Modeling Language - Users Guide and Reference Manual, Version 2.6.0, available on line at http://mc-stan.org/ (last accessed October 2015).
## Not run:
tmp <- normalizePath(path.package("GcClust"))
load(paste(tmp, "\\stan\\MixtureModel.bin", sep=""))
samplePars <- sampleFmm(transData, nPCs, sm, priorParams )
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.