twDEMCBlockInt: twDEMCBlockInt
In twDEMC: parallel DEMC

Description Usage Arguments Details Value Author(s) See Also Examples

Parallelized differential Evolution Markov Chain with blocked parameter update

twDEMCBlockInt(pops = list(list(parms = list(), nGen = 12, TProp = NULL, 
    X = NULL, logDenCompX = NULL, spaceInd = 1, upperParBounds = numeric(0), 
    lowerParBounds = numeric(0),
 splits = numeric(0))), dInfos = list(list(fLogDen = NULL, 
    argsFLogDen = list(), compPosDen, intResComp = vector("integer", 
        0), maxLogDen = c(), fDiscrProp = NULL, argsFDiscrProp = list())), 
    blocks = list(list(compPos = dInfos[[1]]$compPosDen, fUpdateBlock = NULL, 
        argsFUpdate = list(), intermediateId = character(0), 
        dInfoPos = 1)), stateCurrent = list(, xC = c(), logDenCompC = c(), 
        parUpdateDenC = c()), TSpec = matrix(1, ncol = 2, nrow = 1, 
        dimnames = list(NULL, c("T0", "TEnd"))), m0 = numeric(0), 
    nGen = integer(0), spacePop = 1:nPop, controlTwDEMC = list(), 
    debugSequential = FALSE, remoteDumpfileBasename = NULL, doRecordProposals = FALSE, 
    progressOutput = ".")

`pops`	list of population infos for each population, each info is a list with components parms list of matrices (nState x nParm x nChain) initial states for each population see details and `initZtwDEMCNormal`. nGen scalar integer: number of generations, if given overwrites `pops[[i]]$nGen` TProp numeric vector (nResComp) temperature proportions of result components. It can also be given as character vector with names of result components, however, be aware that htis fails if several logDen may return components of the same name X numeric matrix (nParm x nChainPop) initial state logDenCompX numeric matrix (nComp x nChain): logDen components of initial state X, see details spaceInd the space replicate that this population belongs to upperParBounds named numeric vectors: giving upper parameter bounds: lowerBound < par <= upperBound. For exploring subspaces of the limiting distribution, see details lowerParBounds similar to upperParBounds: sample > bound splits named numeric vector of splitting points, used to remerge divided subspaces
`dInfos`	named list of used density functions. Each entry is a list with components fLogDen `function(theta, ...)` calculates a vector of logDensities corresponding to different data streams of parameter vector theta , or `function(theta, logDenCompX, metropolisStepTemp, ...)` argsFLogDen further arguments passed to fLogDen intResComp integer or character vector: indices or names of results components of fLogDen that are used for internal Metropolis decisions maxLogDen integer vector (nRespComp): maximum logDensity (=usually -1/2* number of observations) for each result component, used in control of overfitting fDiscrProp function applied to proposal, e.g. to round proposals to to discrete possible values function(theta,...) argsFDiscrProp further arguments to fDiscrProp list describing the logDensities, see `twDEMCBlockInt`
`blocks`	list of parameter blocks, each block is a list with entries compPos names or index of the parameter components to be updated fUpdateBlock function to update the parameters. It must return a list with first named components `accepted`, `xC`, and optionally `intermediate` as described in `updateBlockTwDEMC`. If not specified (default), then `updateBlockTwDEMC` is used and `dInfoPos` must be specified argsFUpdate further arguments passed to fUpdate intermediateId string: identifier in list of intermediate results that are shared between block udpate functions. See details. dInfoPos name or position to `fLogDenInfo` if using Metropolis update. Several blocks may share the same density but update different parameters
`stateCurrent`
`TSpec`	numeric matrix (nResComp x 2): specifying Initial and End Temperature of each fLogDen result component. If only one row is specified, the Temperature is taken for all result components
`m0`	scalar integer: number of samples in initial population, if length==0 will be calculated from number of chains and number of parameters
`nGen`	scalar integer: number of generations, if given overwrites `pops[[i]]$nGen`
`spacePop`	the space replicate that each population belongs to. By default assume only one population per space, overridden by entry in pops
`controlTwDEMC`	control parameters influencing the update and the convergens of the chains (see details)
`debugSequential`	if TRUE apply is used instead of sfApply, for easier debugging
`remoteDumpfileBasename`	the basename of a dumpfile that is created on error on remote process (see example section)
`doRecordProposals`	if TRUE then an array of each proposal together with the results of fLogDen are recorded and returned in component Y
`progressOutput`	output after each thinning interval

This is the central method for applying a Differential Evolution Markov Chain given a function of an unnormalized probability density. It is invoked usually by (twDEMCBlock.array or twDEMCBlock.twDEMCPops)

recognized control parameters in controlTwDEMC:

thin: thinning interval, Default: 4
F: related to multiplicative error (F2=F/sqrt(2*Npar), see eps.mult, Default: 2.38
pSnooker: probability of a snooker update (others parallel updates), Default 0.1
pGamma1: probability of jumping to state of another chain (different modes), Default 0.1
epsMult: >0 gives d-dimensional variation around gamma. It adds scaled uncorrelated noise to the proposal. Default: 0.2 , Its advantage over eps.add is that its effect scales with the differences of vectors in the population whereas eps.add does not. if the variance of a dimensions is close to 0, eps.mult gives smaller changes. , A uniformly distributed error, i.e. F2*runif(1+-epsMult*prop) multiplied to difference vector from parallel update
epsAdd: >0 is needed to ensure that all positions in the space can be reached. For targets without gaps, it can set small or even to 0. Default 0 , sd of normally distributed error added to proposal by parallel or snooker update.
pAcceptWindowWidth: number of thinning intervals back over which the acceptance rate is calculated Default 8
probUpDir: probability of direction between two states of increasing Density, Default 0.5 , Increasing this during burin may accelerate convergence
initialAcceptanceRate: numeric matrix (nBlock x nChains) initially assumed acceptance rate. Default 0.25 , Used to calculate the number of generations backwards to sample from
DRgamma: factor for reducing step length [0..1) in delayed rejection step, 0 means no DR step, Default 0
minPCompAcceptTempDecr: if acceptance rate drops below minPCompAcceptTempDecr+0.02 times this level, employ delayed rejection (DR), Default 0.15
pIndStep: assume state to be independent, after on average about those number of accepted steps, Default 1.5
nPastGen: factor for determining the number of recent past states to sample during burnin. Default 10 It is multiplied by the number of parameters. Past generations are calculated by deviding by the number of chains per population
returnIntermediates: set to FALSE if intermediate result is large and transferring it between slaves slows down calculation
useConditionalProposal: if set to TRUE, poposal steps are generated conditional on the state of the other blocks (experimental)

Initial state: X: If initial state X is not specified, the last column (generation) of Z is utilized. If in addition to X, logDenX is specified as a numeric vector, the fLogDen will not be avaluated for the initial state of the chains. All the results of fLogDen for the initial state must be finite.

Acceptance rate: The acceptance rate is tracked for each chain across ctrl$pAcceptWindowWidth thinning intervals. , If acceptance rates drops to low values, this might be because of bad localization ,i.e the requirement to use states from more distant past. In order to improve localization, less parameters or more chains per population are required.

Overfitting control

Cost, i.e. -2*logDensity, below the number of observations indicates an overfitting to the data stream.

By supplying a maximum logDensity for one result component with dInfo$maxLogDen = -1/2*nObs one can activate the control for this overfitting. This works only, however, if there are enough observations. When a result component is based on less than about 30 observartions, one should give a maximum Densitiy of 0.

With overfitting control, there is an upper bound bound for the logDensity of one result compoent- Calculated higher logDensities indicate overfitting and are decreases to the maxLogDen value.

subspace localized sampling

The covariance matix for new proposals might look very different in different locations of the parameter space. For example, overall CO2 efflux might be composed of two parts microbial respiration and inorganic CO2 production. that are relevant in different parts of the parameter space (governed by another parameter). The mirobial growth rate will be very sensitive in the respiration part and is allowed to to only small jumps, while it might vary large in the inorganic part. Hence, it might be beneficial to use different jumping proposal and different population samples in different subspaces.

In order to support such supspace-localized sampling XX

When using parallel execution on multiple cores, the transfer of very big intermediates can cause performance issues. Use argument controlTwDEMC$returnIntermediate=FALSE to ensure that intemediates are stored only within process boundaries. When executing the next interval or the next round of updates of all blocks, the intermediate will then be the empty list initially.

intermediate results

Different density functions can share intermediate results, so that these do not need to be recomputed. For example two densities can be based on the same model predictions for given parameters, but compare different subsets of the predictions against different observations. If for a given parameter vector, the model output has been computed in one density, it does not need to be recomputed in the second density function.

This is an advanced option. Care must be taken that the intermediate is really the same between densities. And care must be taken that intermediate is recalculated if the parameters change on which the intermediate is based on.

Description of both block update in argument blocks (or argument dInfos if default Metropolis is used) should specify the same identifier string in argument dInfo$intermediateId. The densities or block update functions should return the model output in attribute "intermediate" with the result vector of logDensity components. twDEMC ensures that this result is provided with argument intermediate in another call to the density or block update function. With a new parameter proposal in Metropolis update, an empty list will be given with this argument.

For an example see test case ofMultiIntermediate in file unitTests/runittwDEMC.R. Using densities denSparsePrior and denRichPrior.

a list with entries of populations, each entry is a list

`thin`	thinning interval that has been used
`dInfos`	list of information on densities (argument `dInfos`)
`blocks`	list of information on blocks (argument `blocks`)
`YPos`	list of column positions in pops[i]$Y, a list with entries accepted integer vector of positions of acceptance indication of block at given index resLogDen0 integer scalar: postion before first column of results of fLogDen
`pops`	info on each population. A list with entries: upperParBounds upper parameter bounds for sampling lowerParBounds lower parameter bounds for sampling splits named numeric vector: splitting history spaceInd the space replicate that the population belongs to parms numeric array (steps x parms x chains): collected states, including the initial states temp numeric array (nSample+1, nResComp): temperature, i.e. cost reduction factor in each step for each datastream pAccept acceptance rate of chains (nStep x nChainPop) resLogDen numeric array (steps x resComps x chains): results components of fLogDen of blocks logDen numberic array (steps x iDen x chains): results summed over blocks Y numeric matrix (steps x nParam+nBlock+nResComp x chains): parms, isAccepted for each block, and all fLogDen result components for each proposal

Thomas Wutzler

calcDEMCTemp logDenGaussian

    # see unit test 
    # twUtestF("twDEMCPops","distinctLogDen")
data(twLinreg1); attach( twLinreg1 ) 

# tracing error in remote session:
# Provide argument \code{remoteDumpfileBasename="dumpFile"}
# Then a dumpfile is created on remote error by \code{\link{sfRemoteWrapper}}.
# In order to trace the density function, the following can be done
if( FALSE ){
	.remoteDumpfileBasename="dumpfile"
	.remoteDumpfile <- paste(.remoteDumpfileBasename,".rda",sep="")
	load(.remoteDumpfile)
	debugger(get(.remoteDumpfileBasename))
	# choose last step (18)
	#require(debug)
	fDen <- remoteFunArgs$argsUpdateBlocksTwDEMC$argsFUpdateBlocks[[1]]$fLogDen
	mtrace(fDen); remoteFunArgs$argsUpdateBlocksTwDEMC$argsFUpdateBlocks[[1]]$fLogDen <- fDen
	do.call(remoteFun, c(remoteFunArgs, list(...)))	
	# go()
	# qqq()
}