View source: R/psmMergeSplit.R
psmMergeSplit | R Documentation |
Merge-split proposals for conjugate "Chinese Restaurant Process" (CRP) mixture models using sequentially-allocated elements. Allocation is performed with weights derived from a previously-calculated pairwise similarity matrix, and optionally complemented with "restricted Gibbs" scans as discussed in Jain & Neal (2004).
psmMergeSplit( partition, psm, logPosteriorPredictiveDensity = function(i, subset) 0, t = 1, mass = 1, discount = 0, nUpdates = 1L, selectionWeights = NULL )
partition |
A numeric vector of cluster labels representing the current partition. |
psm |
A matrix of previously-calculated pairwise similarity probabilities for each pair of data indices. |
logPosteriorPredictiveDensity |
A function taking an index i (as a numeric vector of length one) and a subset of integers subset, and returning the natural logarithm of p( y_i | y_subset ), i.e., that item's contribution to the log integrated likelihood given a subset of the other items. The default value "turns off" the likelihood, resulting in prior simulation (rather than posterior simulation). |
t |
A non-negative integer indicating the number of restricted Gibbs scans to perform for each merge/split proposal. |
mass |
A specification of the mass (concentration) parameter in the CRP
prior. Must be greater than the |
discount |
A numeric value on the interval [0,1) corresponding to the discount parameter in the two-parameter CRP prior. |
nUpdates |
An integer giving the number of merge-split proposals before returning. This has the effect of thinning the Markov chain. |
selectionWeights |
A matrix or data frame whose first two columns are the unique pairs of data indices, along with a column of weights representing how likely each pair is to be selected at the beginning of each merge-split update. |
A numeric vector giving the updated partition encoded using cluster labels.
The acceptance rate
of the Metropolis-Hastings proposals, i.e. the number of accepted proposals
divided by nUpdates
.
Jain, S., & Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of computational and Graphical Statistics, 13(1), 158-182.
# Neal (2000) model and data nealData <- c(-1.48, -1.40, -1.16, -1.08, -1.02, 0.14, 0.51, 0.53, 0.78) mkLogPosteriorPredictiveDensity <- function(data = nealData, sigma2 = 0.1^2, mu0 = 0, sigma02 = 1) { function(i, subset) { posteriorVariance <- 1 / ( 1/sigma02 + length(subset)/sigma2 ) posteriorMean <- posteriorVariance * ( mu0/sigma02 + sum(data[subset])/sigma2 ) posteriorPredictiveSD <- sqrt(posteriorVariance + sigma2) dnorm(data[i], posteriorMean, posteriorPredictiveSD, log=TRUE) } } logPostPredict <- mkLogPosteriorPredictiveDensity() nSamples <- 1100L nBurn <- 100 partitions <- matrix(0, nrow=nSamples, ncol=length(nealData)) # initial draws to inform similarity matrix for ( i in 2:nBurn ) { partitions[i,] <- nealAlgorithm3(partitions[i-1,], logPostPredict, mass = 1, nUpdates = 1) } # Generate pairwise similarity matrix from initial draws psm.mat <- psm(partitions[1:nBurn,]) accept <- 0 for ( i in (nBurn+1):nSamples ) { ms <- psmMergeSplit(partitions[i-1,], psm.mat, logPostPredict, t = 1, mass = 1.0, nUpdates = 1) partitions[i,] <- ms$partition accept <- accept + ms$accept } accept / (nSamples - nBurn) # post burn-in M-H acceptance rate nSubsets <- apply(partitions, 1, function(x) length(unique(x))) mean(nSubsets) sum(acf(nSubsets)$acf)-1 # Autocorrelation time entropy <- apply(partitions, 1, partitionEntropy) plot.ts(entropy)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.