The evolutionary Monte Carlo clustering (EMCC) algorithm needs a temperature ladder. This function finds the maximum temperature for constructing the ladder.
Below, sampDim refers to the dimension of the sample space,
temperLadderLen refers to the length of the temperature ladder,
and levelsSaveSampForLen refers to the length of
levelsSaveSampFor. Note that this function calls
evolMonteCarloClustering, so some of the arguments below
have the same name and meaning as the corresponding ones for
evolMonteCarloClustering. See Details below for an
explanation of the arguments.
findMaxTemper(nIters,
              statsFuncList,
              startingVals,
              logTarDensFunc,
              temperLadder      = NULL,
              temperLimits      = NULL,
              ladderLen         = 10,
              scheme            = 'exponential',
              schemeParam       = 0.5,
              cutoffDStats      = 1.96,
              cutoffESS         = 50,
              guideMe           = TRUE,
              levelsSaveSampFor = NULL,
              saveFitness       = FALSE,
              doFullAnal        = TRUE,
              verboseLevel      = 0,
              ...)
nIters |
statsFuncList |
startingVals |
logTarDensFunc |
temperLadder |
temperLimits |
ladderLen |
scheme |
schemeParam |
cutoffDStats |
cutoffESS |
guideMe |
levelsSaveSampFor |
saveFitness |
doFullAnal |
verboseLevel |
... | optional arguments to be passed to
This function is based on the method to find the temperature range introduced in section 4.1 of Goswami and Liu (2007).
statsFuncList: The user specifies this list of functions, each of which is known to be sensitive to the presence of modes. For example, if both dimensions 1 and 3 (i.e., objects 1 and 3) are sensitive to the presence of modes, then one could use:
coord1 <- function (xx) { xx[1] }
coord3 <- function (xx) { xx[3] }
statsFuncList <- list(coord1, coord3)
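The same pattern extends to any set of coordinates. The sketch below builds such a list with a small closure factory; makeCoordFunc is a hypothetical helper introduced here for illustration and is not part of the package.

```r
## Hypothetical helper (not part of the package): returns a function that
## extracts coordinate ii from a sample vector xx.
makeCoordFunc <- function (ii)
{
    force(ii)                       # fix ii at creation time
    function (xx) { xx[ii] }
}

## Build the list for coordinates 1 and 3, as in the text above.
statsFuncList <- lapply(c(1, 3), makeCoordFunc)

## Evaluate every statistic on a toy sample (made-up cluster labels).
xx        <- c(2, 0, 1, 1)
statsVals <- sapply(statsFuncList, function (ff) ff(xx))
print(statsVals)
```

Using force(ii) avoids the lazy-evaluation pitfall in which every generated function would otherwise risk referring to the same index.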
temperLadder: This is the temperature ladder needed for
the first stage preliminary run. One can either specify a
temperature ladder via temperLadder or specify
temperLimits, ladderLen, scheme and
schemeParam. For details on the latter set of parameters,
see below. Note that temperLadder overrides
temperLimits, ladderLen, scheme and
schemeParam.
temperLimits: temperLimits = c(lowerLimit,
upperLimit) is a two-tuple of positive numbers, where the
lowerLimit is usually 1 and upperLimit is a number
in [100, 1000]. If stochastic optimization (via sampling) is the
goal, then lowerLimit is taken to be in [0, 1].
ladderLen, scheme and schemeParam: These
three parameters are required (along with temperLimits) if
temperLadder is not provided. We recommend taking
ladderLen in [15, 30]. The allowed choices for
scheme and schemeParam are:
scheme      | schemeParam
=========== | ===========
linear      | NA
log         | NA
geometric   | NA
mult-power  | NA
add-power   | >= 0
reciprocal  | NA
exponential | >= 0
tangent     | >= 0
We recommend using scheme = 'exponential' and
schemeParam in [0.3, 0.5].
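To make the roles of temperLimits and ladderLen concrete, here is a minimal sketch of one way to fill in a ladder between two limits, using equal spacing on the log scale. The function geometricLadderSketch is hypothetical; the package's own formulas for each scheme are not reproduced on this page and may differ from this illustration. The ladder is ordered from hottest to coldest, as in the Examples.

```r
## Hypothetical illustration only: equal spacing on the log scale between
## the two limits. This is NOT claimed to match any of the package's schemes.
geometricLadderSketch <- function (temperLimits, ladderLen)
{
    stopifnot(length(temperLimits) == 2,
              all(temperLimits > 0),
              ladderLen >= 2)
    ## Hottest temperature first, coldest last.
    exp(seq(log(max(temperLimits)),
            log(min(temperLimits)),
            length.out = ladderLen))
}

temperLadder <- geometricLadderSketch(temperLimits = c(1, 100), ladderLen = 15)
print(round(temperLadder, 3))
```

A ladder built this way can be supplied through the temperLadder argument, in which case temperLimits, ladderLen, scheme and schemeParam are overridden.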
cutoffDStats: This cutoff comes from Normal_1(0,
1), the standard normal distribution (Goswami and Liu, 2007); the
default value 1.96 is a conservative cutoff. Note that if you have more
than one statistic in statsFuncList, which is usually the
case, using this cutoff may result in different suggested maximum
temperatures (as can be seen by calling the print function
on the result of findMaxTemper). A conservative
recommendation is that you choose the maximum of the suggested
temperatures as the final maximum temperature for use in
placeTempers and later in parallelTempering or
evolMonteCarlo.
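The two points above can be sketched in a few lines of base R: the default 1.96 is the 0.975 quantile of the standard normal distribution, and the conservative choice takes the maximum over the per-statistic suggestions. The suggested temperatures below are made-up numbers for illustration, not output from findMaxTemper.

```r
## The default cutoff is the 0.975 quantile of Normal(0, 1), about 1.96.
cutoffDStats <- qnorm(0.975)
print(round(cutoffDStats, 2))

## Hypothetical suggested maximum temperatures, one per statistic
## (illustrative values only, not produced by the package).
suggestedMaxTempers <- c(coord1 = 5, coord3 = 10)

## Conservative recommendation: take the largest suggestion.
finalMaxTemper <- max(suggestedMaxTempers)
print(finalMaxTemper)
```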
cutoffESS: A cutoff for the effective sample size (ESS) of the underlying Markov chain ergodic estimator and the importance sampling estimators.
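For intuition about what an ESS cutoff guards against, the sketch below uses a common ESS approximation for importance sampling weights, (sum of weights)^2 / (sum of squared weights). This is an assumption for illustration: the page does not state which ESS formula the package uses, and essFromWeights is a hypothetical helper.

```r
## Hypothetical helper: a common ESS approximation for importance weights.
## The package's own ESS computation may differ.
essFromWeights <- function (ww)
{
    stopifnot(all(ww >= 0), sum(ww) > 0)
    sum(ww)^2 / sum(ww^2)
}

cutoffESS <- 50

## Equal weights: every draw counts fully.
essEqual <- essFromWeights(rep(1, 100))

## One dominant weight: effectively a single draw.
essDegenerate <- essFromWeights(c(1, rep(0, 99)))

print(c(essEqual, essDegenerate))
print(c(essEqual, essDegenerate) >= cutoffESS)
```

With 100 draws, the equal-weight case passes a cutoff of 50 while the degenerate case does not, which is the kind of situation the cutoff is meant to flag.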
guideMe: If guideMe = TRUE, then the function
suggests modifications to the settings for a
re-run, in case there are problems with the underlying MCMC run.
doFullAnal: If doFullAnal = TRUE, then the
search for the maximum temperature is conducted among all
the levels of the temperLadder. If this switch is
turned off, the search for the maximum temperature is done in a greedy
(and faster) manner: the search stops as soon as every
statistic in statsFuncList has found some maximum
temperature. Note that the greedy search may return a much higher
(and hence sub-optimal) maximum temperature than needed, so it is
not recommended.
levelsSaveSampFor: This is passed to
evolMonteCarloClustering for the underlying MCMC run.
This function returns a list with the following components:
temperLadder | the temperature ladder used for the underlying MCMC run.
DStats | the D-statistic (Goswami and Liu, 2007) values used to find the maximum temperature.
cutoffDStats | the
nIters | the post burn-in
levelsSaveSampFor | the
draws |
startingVals | the
intermediate statistics | a bunch of intermediate statistics used in the computation of
time | the time taken by the run.
The effect of leaving the default value NULL for some of the
arguments above is as follows:
temperLadder | valid temperLimits, ladderLen, scheme and schemeParam are provided, which are used to construct the temperLadder.
temperLimits | a valid temperLadder is provided.
levelsSaveSampFor | temperLadderLen.
Gopi Goswami goswami@stat.harvard.edu
Gopi Goswami and Jun S. Liu (2007). On learning strategies for evolutionary Monte Carlo. Statistics and Computing, 17:1:23-38.
Gopi Goswami, Jun S. Liu and Wing H. Wong (2007). Evolutionary Monte Carlo Methods for Clustering. Journal of Computational and Graphical Statistics, 16:4:855-876.
placeTempers, evolMonteCarloClustering
## The following example is a simple stochastic optimization problem,
## and thus it does not require any "heating up", and hence the
## maximum temperature turns out to be the coldest one, i.e., 0.5.
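## Sum of the adjacency matrix whose (i, j) entry is TRUE when
## objects i and j carry the same cluster label.
adjMatSum <-
    function (xx)
{
    xx     <- as.integer(xx)
    adjMat <- outer(xx, xx, function (id1, id2) { id1 == id2 })
    sum(adjMat)
}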
modeSensitive1 <-
    function (xx)
{
    with(partitionRep(xx),
     {
         rr   <- 1 + seq_along(clusterLabels)
         freq <- sapply(clusters, length)
         oo   <- order(freq, decreasing = TRUE)
         sum(sapply(clusters[oo], sum) * log(rr))
     })
}
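## Shannon entropy of the cluster-label proportions.
entropy <-
    function (xx)
{
    yy <- table(as.vector(xx, mode = "numeric"))
    zz <- yy / length(xx)
    -sum(zz * log(zz))
}
## Proportion of objects that fall in the most frequent cluster label.
maxProp <-
    function (xx)
{
    yy <- table(as.vector(xx, mode = "numeric"))
    oo <- order(yy, decreasing = TRUE)
    yy[oo][1] / length(xx)
}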
statsFuncList <- list(adjMatSum, modeSensitive1, entropy, maxProp)
KMeansObj <- KMeansFuncGenerator1(-97531)
maxTemperObj <-
    with(KMeansObj,
     {
         temperLadder <- c(20, 10, 5, 1, 0.5)
         nLevels      <- length(temperLadder)
         sampDim      <- nrow(yy)
         startingVals <- sample(c(0, 1),
                                size    = nLevels * sampDim,
                                replace = TRUE)
         startingVals <- matrix(startingVals, nrow = nLevels, ncol = sampDim)
         findMaxTemper(nIters            = 50,
                       statsFuncList     = statsFuncList,
                       temperLadder      = temperLadder,
                       startingVals      = startingVals,
                       logTarDensFunc    = logTarDensFunc,
                       levelsSaveSampFor = seq_len(nLevels),
                       doFullAnal        = TRUE,
                       saveFitness       = TRUE,
                       verboseLevel      = 1)
     })
print(maxTemperObj)
print(names(maxTemperObj))
with(c(maxTemperObj, KMeansObj),
 {
     fitnessCol <- ncol(draws[ , , 1])
     sub        <- paste('uniform prior on # of clusters: DU[',
                         priorMinClusters, ', ',
                         priorMaxClusters, ']', sep = '')
     for (ii in rev(seq_along(levelsSaveSampFor))) {
         main <- paste('EMCC (MAP) clustering (temper = ',
                       round(temperLadder[levelsSaveSampFor[ii]], 3), ')',
                       sep = '')
         MAPRow <- which.min(draws[ , fitnessCol, ii])
         clusterPlot(clusterInd        = draws[MAPRow, -fitnessCol, ii],
                     data              = yy,
                     main              = main,
                     sub               = sub,
                     knownClusterMeans = knownClusterMeans)
     }
 })