evolMonteCarloClustering: evolutionary Monte Carlo clustering algorithm
In EMCC: Evolutionary Monte Carlo (EMC) Methods for Clustering

Description Usage Arguments Details Value Note Author(s) References Examples

View source: R/doEMCC.R

Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.

Below sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of the levelsSaveSampFor.

evolMonteCarloClustering(nIters,              
                         temperLadder,              
                         startingVals,              
                         logTarDensFunc,            
                         MHMergeProb       = 0.5,  
                         moveProbsList     = NULL,  
                         moveNTimesList    = NULL,  
                         levelsSaveSampFor = NULL,  
                         saveFitness       = FALSE, 
                         verboseLevel      = 0,     
                         ...)

`nIters`	`integer` > 0.
`temperLadder`	`double` vector with all positive entries, in decreasing order.
`startingVals`	`double` matrix of dimension `temperLadderLen` x `sampDim` or vector of length `sampDim`, in which case the same starting values are used for every temperature level.
`logTarDensFunc`	`function` of two arguments `(draw, ...)` that returns the target density evaluated in the log scale.
`MHMergeProb`	`double` in (0, 1). See details below for the use of this argument.
`moveProbsList`	named `list` of probabilities adding upto 1.
`moveNTimesList`	named `list` of integers >= 0.
`levelsSaveSampFor`	`integer` vector with positive entries.
`saveFitness`	`logical`.
`verboseLevel`	`integer`, a value >= 2 produces a lot of output.
`...`	optional arguments to be passed to `logTarDensFunc`.

The EMCC algorithm: The evolutionary Monte Carlo clustering (EMCC; Goswami and Liu, 2007) algorithm is composed of the following moves:

MH	Metropolis-Hastings or mutation
SCSC_ONE_NEW	sub-cluster swap crossover: one new
SCSC_TWO_NEW	sub-cluster swap crossover: two new
SCRC	sub-cluster reallocation crossover
RE	(random) exchange

The current function could be used to run the EMCC algorithm by specifying what moves to employ using the following variables.

moveProbsList and moveNTimesList: The allowed names for components of moveProbsList and moveNTimesList come from the abbreviated names of the moves above. For example, the following specifications are valid:

moveProbsList = list(MH           = 0.5,
                     SCSC_TWO_NEW = 0.25,
                     SCRC         = 0.25)

moveNTimesList = list(MH           = 1, 
                      SCSC_TWO_NEW = floor(temperLadderLen / 2),
                      SCRC         = floor(temperLadderLen / 2),
                      RE           = temperLadderLen)

MHMergeProb

In the MH or the mutation step, each of the sampDim-many objects are proposed to either merge with an existing cluster or split to form its own cluster with probability MHMergeProb and (1 - MHMergeProb), respectively (see Goswami and Liu, 2007).

levelsSaveSampFor

By default, samples are saved and returned for temperature level temperLadderLen. The levelsSaveSampFor could be used to save samples from other temperature levels as well (e.g., levelsSaveSampFor = 1:temperLadderLen saves samples from all levels).

saveFitness

The term fitness refers to the function H(x), where the target density of interest is given by:

g(x) \propto \exp[ -H(x) / τ_{min} ]

H(x) is also known as the energy function. By default, the fitness values are not saved, but one can do so by setting saveFitness = TRUE.

This function returns a list with the following components:

`draws`	`array` of dimension `nIters` x `sampDim` x `levelsSaveSampForLen`, if `saveFitness = FALSE`. If `saveFitness = TRUE`, then the returned array is of dimension `nIters` x `(sampDim + 1)` x `levelsSaveSampForLen`; i.e., each of the `levelsSaveSampForLen` matrices contain the fitness values in their last column.
`acceptRatios`	`matrix` of the acceptance rates for various moves used.
`detailedAcceptRatios`	`list` of matrices with detailed summary of the acceptance rates for various moves used.
`nIters`	the `nIters` argument.
`temperLadder`	the `temperLadder` argument.
`startingVals`	the `startingVals` argument.
`moveProbsList`	the `moveProbsList` argument.
`moveNTimesList`	the `moveNTimesList` argument.
`levelsSaveSampFor`	the `levelsSaveSampFor` argument.
`time`	the time taken by the run.

The effect of leaving the default value NULL for some of the arguments above are as follows:

`moveProbsList`	`list(MH = 0.5, RC = 0.25, 'SCSC_TWO_NEW' = 0.25)`.
`moveNTimesList`	`list(MH = 1, RC = mm, 'SCSC_TWO_NEW' = mm, RE = nn)`,
	where `mm <- floor(nn / 2)` and `nn <- temperLadderLen`.
`levelsSaveSampFor`	`temperLadderLen`.

Gopi Goswami goswami@stat.harvard.edu

Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.

## The following example is a simple stochastic optimization problem,
## the set up is same as that of findMaxTemper and placeTempers. Here
## no "heating up" is necessary, and hence the maximum temprature is
## the coldest one, namely, 0.5.
##
## However, we run evolMonteCarloClustering on this example with a
## temperature ladder that is the output of placeTempers, which
## assumes that the maximum temperature is 5.
KMeansObj  <- KMeansFuncGenerator1(-97531)
samplerObj <-
    with(KMeansObj,
     {
         temperLadder    <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684,
                              0.7900778, 0.6496648, 0.5135825, 0.5000000)
         nLevels         <- length(temperLadder)
         sampDim         <- nrow(yy)
         startingVals    <- sample(c(0, 1),
                                   size    = nLevels * sampDim,
                                   replace = TRUE)
         startingVals    <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         moveProbsList   <- list(MH             = 0.4,
                                 RC             = 0.3,
                                 'SCSC_TWO_NEW' = 0.3)
         mm              <- floor(nLevels / 2)
         moveNTimesList  <- list(MH             = 1,
                                 RC             = mm,
                                 'SCSC_TWO_NEW' = mm,
                                 RE             = nLevels)
         evolMonteCarloClustering(nIters            = 100,
                                  temperLadder      = temperLadder,
                                  startingVals      = startingVals,
                                  logTarDensFunc    = logTarDensFunc,
                                  moveProbsList     = moveProbsList,
                                  moveNTimesList    = moveNTimesList,
                                  levelsSaveSampFor = seq_len(nLevels),
                                  saveFitness       = TRUE,
                                  verboseLevel      = 1)
     })
print(samplerObj)
print(names(samplerObj))
with(c(samplerObj, KMeansObj),
 {
     print(acceptRatios)
     print(detailedAcceptRatios)
     print(dim(draws))
     fitnessCol <- ncol(draws[ , , 1])     
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })