evolMonteCarloClustering: evolutionary Monte Carlo clustering algorithm

Description Usage Arguments Details Value Note Author(s) References Examples

View source: R/doEMCC.R

Description

Given a possibly multi-modal and multi-dimensional clustering target density function and a temperature ladder this function produces samples from the target using the evolutionary Monte Carlo clustering (EMCC) algorithm.

Below sampDim refers to the dimension of the sample space, temperLadderLen refers to the length of the temperature ladder, and levelsSaveSampForLen refers to the length of the levelsSaveSampFor.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
evolMonteCarloClustering(nIters,              
                         temperLadder,              
                         startingVals,              
                         logTarDensFunc,            
                         MHMergeProb       = 0.5,  
                         moveProbsList     = NULL,  
                         moveNTimesList    = NULL,  
                         levelsSaveSampFor = NULL,  
                         saveFitness       = FALSE, 
                         verboseLevel      = 0,     
                         ...)                       

Arguments

nIters

integer > 0.

temperLadder

double vector with all positive entries, in decreasing order.

startingVals

double matrix of dimension temperLadderLen x sampDim or vector of length sampDim, in which case the same starting values are used for every temperature level.

logTarDensFunc

function of two arguments (draw, ...) that returns the target density evaluated in the log scale.

MHMergeProb

double in (0, 1). See details below for the use of this argument.

moveProbsList

named list of probabilities adding upto 1.

moveNTimesList

named list of integers >= 0.

levelsSaveSampFor

integer vector with positive entries.

saveFitness

logical.

verboseLevel

integer, a value >= 2 produces a lot of output.

...

optional arguments to be passed to logTarDensFunc.

Details

The EMCC algorithm

The evolutionary Monte Carlo clustering (EMCC; Goswami and Liu, 2007) algorithm is composed of the following moves:

MH Metropolis-Hastings or mutation
SCSC_ONE_NEW sub-cluster swap crossover: one new
SCSC_TWO_NEW sub-cluster swap crossover: two new
SCRC sub-cluster reallocation crossover
RE (random) exchange

The current function could be used to run the EMCC algorithm by specifying what moves to employ using the following variables.

moveProbsList and moveNTimesList

The allowed names for components of moveProbsList and moveNTimesList come from the abbreviated names of the moves above. For example, the following specifications are valid:

1
2
3
4
moveProbsList = list(MH           = 0.5,
                     SCSC_TWO_NEW = 0.25,
                     SCRC         = 0.25)
      
1
2
3
4
5
moveNTimesList = list(MH           = 1, 
                      SCSC_TWO_NEW = floor(temperLadderLen / 2),
                      SCRC         = floor(temperLadderLen / 2),
                      RE           = temperLadderLen)           
      
MHMergeProb

In the MH or the mutation step, each of the sampDim-many objects are proposed to either merge with an existing cluster or split to form its own cluster with probability MHMergeProb and (1 - MHMergeProb), respectively (see Goswami and Liu, 2007).

levelsSaveSampFor

By default, samples are saved and returned for temperature level temperLadderLen. The levelsSaveSampFor could be used to save samples from other temperature levels as well (e.g., levelsSaveSampFor = 1:temperLadderLen saves samples from all levels).

saveFitness

The term fitness refers to the function H(x), where the target density of interest is given by:

g(x) \propto \exp[ -H(x) / τ_{min} ]

H(x) is also known as the energy function. By default, the fitness values are not saved, but one can do so by setting saveFitness = TRUE.

Value

This function returns a list with the following components:

draws

array of dimension nIters x sampDim x levelsSaveSampForLen, if saveFitness = FALSE. If saveFitness = TRUE, then the returned array is of dimension nIters x (sampDim + 1) x levelsSaveSampForLen; i.e., each of the levelsSaveSampForLen matrices contain the fitness values in their last column.

acceptRatios

matrix of the acceptance rates for various moves used.

detailedAcceptRatios

list of matrices with detailed summary of the acceptance rates for various moves used.

nIters

the nIters argument.

temperLadder

the temperLadder argument.

startingVals

the startingVals argument.

moveProbsList

the moveProbsList argument.

moveNTimesList

the moveNTimesList argument.

levelsSaveSampFor

the levelsSaveSampFor argument.

time

the time taken by the run.

Note

The effect of leaving the default value NULL for some of the arguments above are as follows:

moveProbsList list(MH = 0.5, RC = 0.25, 'SCSC_TWO_NEW' = 0.25).
moveNTimesList list(MH = 1, RC = mm, 'SCSC_TWO_NEW' = mm, RE = nn),
where mm <- floor(nn / 2) and nn <- temperLadderLen.
levelsSaveSampFor temperLadderLen.

Author(s)

Gopi Goswami goswami@stat.harvard.edu

References

Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
## The following example is a simple stochastic optimization problem,
## the set up is same as that of findMaxTemper and placeTempers. Here
## no "heating up" is necessary, and hence the maximum temprature is
## the coldest one, namely, 0.5.
##
## However, we run evolMonteCarloClustering on this example with a
## temperature ladder that is the output of placeTempers, which
## assumes that the maximum temperature is 5.
KMeansObj  <- KMeansFuncGenerator1(-97531)
samplerObj <-
    with(KMeansObj,
     {
         temperLadder    <- c(5.0000000, 1.5593974, 1.1028349, 0.9220684,
                              0.7900778, 0.6496648, 0.5135825, 0.5000000)
         nLevels         <- length(temperLadder)
         sampDim         <- nrow(yy)
         startingVals    <- sample(c(0, 1),
                                   size    = nLevels * sampDim,
                                   replace = TRUE)
         startingVals    <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         moveProbsList   <- list(MH             = 0.4,
                                 RC             = 0.3,
                                 'SCSC_TWO_NEW' = 0.3)
         mm              <- floor(nLevels / 2)
         moveNTimesList  <- list(MH             = 1,
                                 RC             = mm,
                                 'SCSC_TWO_NEW' = mm,
                                 RE             = nLevels)
         evolMonteCarloClustering(nIters            = 100,
                                  temperLadder      = temperLadder,
                                  startingVals      = startingVals,
                                  logTarDensFunc    = logTarDensFunc,
                                  moveProbsList     = moveProbsList,
                                  moveNTimesList    = moveNTimesList,
                                  levelsSaveSampFor = seq_len(nLevels),
                                  saveFitness       = TRUE,
                                  verboseLevel      = 1)
     })
print(samplerObj)
print(names(samplerObj))
with(c(samplerObj, KMeansObj),
 {
     print(acceptRatios)
     print(detailedAcceptRatios)
     print(dim(draws))
     fitnessCol <- ncol(draws[ , , 1])     
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })

EMCC documentation built on May 29, 2017, 1:03 p.m.