library(knitr) opts_chunk$set(out.extra='style="display:block; margin: auto"' #, fig.align="center" , fig.width=4.3, fig.height=3.2, dev.args=list(pointsize=10) ) knit_hooks$set(spar = function(before, options, envir) { if (before){ par( las=1 ) #also y axis labels horizontal par(mar=c(2.0,3.3,0,0)+0.3 ) #margins par(tck=0.02 ) #axe-tick length inside plots par(mgp=c(1.1,0.2,0) ) #positioning of axis title, axis labels, axis } })
This vignette details the sampling of different subsets, i.e. blocks, of the parameter vector by different log-Density functions. The background is described in Wutzler & Carvalhais 2014 (doi: 10.1002/2014jg002650).
if(!exists("isDevelopMode")) library(blockDemac) set.seed(0815) # for reproducable results
In the model two streams of predictions are generated, each using the same two parameters. The variance in the first stream is dominated by the first parameter a
, while the variance in the second stream is dominated by the second parameter b
.
modTwTwoDenEx1 <- function( theta ##<< model parameters a and b , xSparse ##<< numeric vector of Sparse input/output relationship , xRich ##<< numeric vector of rich input/output relationship , thresholdCovar=0 ##<< model structural deficiency ){ list( y1 = as.numeric(theta[1]*xSparse + theta[2]*mean(xRich)/10) ,y2 = as.numeric(theta[1]*xSparse[1] + theta[2]*pmax(0,xRich-thresholdCovar) )) }
The synthetic data is generated by running the model. The second stream,y2
, with covariate xRich
has 100 times as many records as the first stream,y1
, with covariate xSparse
.
xSparse <- c(1,runif(9,min=0.5,max=1.5)) # only 10 observations xRich <- runif(1000,min=.7,max=1) # 1000 observations thetaTrue <- c( a=1, b=2 ) theta0 <- thetaTrue covarTheta0 <- diag(0.2*thetaTrue) obsTrue <- modTwTwoDenEx1( thetaTrue, xSparse=xSparse,xRich=xRich , thresholdCovar=0.3) sdObsTrue <- sdObs <- list( y1=mean(obsTrue$y1)*0.06 ,y2=mean(obsTrue$y2)*0.02 ) obs <- list( y1 = obsTrue$y1 + rnorm(length(obsTrue$y1),sd=sdObsTrue$y1) ,y2 = obsTrue$y2 + rnorm(length(obsTrue$y2),sd=sdObsTrue$y2) )
For simplicity we assume no prior and Gaussian distribution of model-data residuals. We specify one probability density function for the likelihood of the sparse data stream and another for function for the likelihood of the rich data stream.
denSparse <- function( theta, intermediates, logDensityComponents ,xSparse, xRich, obs, sdObs, ... ){ pred <- modTwTwoDenEx1(theta, xSparse=xSparse, xRich=xRich, ...) misfit <- obs$y1 - pred$y1 structure( -1/2 * sum((misfit/sdObs$y1)^2), names='y1' ) } logDensityComponentsSparse0 <- denSparse( theta0, list(), numeric(0), xSparse, xRich, obs, sdObsTrue ) denRich <- function( theta, intermediates, logDensityComponents ,xSparse, xRich, obs, sdObs, ... ){ pred <- modTwTwoDenEx1(theta, xSparse=xSparse, xRich=xRich, ...) misfit <- obs$y2 - pred$y2 structure( -1/2 * sum((misfit/sdObs$y2)^2), names='y2' ) } logDensityComponentsRich0 <- denRich( theta0, list(), numeric(0), xSparse, xRich, obs, sdObsTrue )
Both densities rely on the same forward model run pred <- modTwTwoDenEx1(theta...)
. If such a run is expensive, one can make both densities depend on the same intermediate and save some of the runs, as described in the vignette for Sampling observation uncertainty.
Since variation in the sparse data stream is dominated by parameter a
we let this parameter be updated by likelihood of the sparse data stream. Similarly, block for parameter b
is updated by the likelihood of the rich data stream.
argsFLogDensity = list(xSparse=xSparse, xRich=xRich, obs=obs, sdObs=sdObs) sparseBlockSpec <- blockSpec("a",names(theta0), new("MetropolisBlockUpdater", fLogDensity = denSparse, argsFLogDensity = argsFLogDensity, logDensityComponentNames = names(logDensityComponentsSparse0) ) ) richBlockSpec <- blockSpec("b",names(theta0), new("MetropolisBlockUpdater", fLogDensity = denRich, argsFLogDensity = argsFLogDensity, logDensityComponentNames = names(logDensityComponentsRich0) ) ) blockSpecs = list( sparse=sparseBlockSpec, rich=richBlockSpec )
Invoking the with several blocks, is the same as with using one single block, as described in the vignette on the simple Model.
sampler <- newPopulationSampler( blockSpecs, theta0, covarTheta0, nPopulation=1L ) sampler <- setupAndSample(sampler, nSample=60L) sampleLogs <- getSampleLogs(sampler) plot( asMcmc(sampleLogs), smooth=FALSE ) stacked <- stackChainsInPopulation(subsetTail(getSampleLogs(sampler),0.8)) sample1 <- t(getParametersForPopulation(stacked, 1L)[,,1L]) summary(sample1)
The two data streams are conflicting giving the somewhat biased model. Therefore, the two pdfs dSparse
and dRich
peak at different locations in parameter space. The color code of the following figure ranks the samples by different criteria.
# append the within block sums across temperated logDensityComponents ss <- cbind( sample1, t(getLogDensitiesForPopulation(stacked,1L)[,,1L])) oldPar <- par(mfrow=c(1,5), par(mar=c(2.0,3.3,2.0,0)+0.3 ) ) colh <- heat.colors(nrow(ss)) plot( b ~ a, data=as.data.frame(ss[ rev(order(ss[,"sparse"])),]) , col=colh, main="rank sparse" ) plot( b ~ a, data=as.data.frame(ss[ rev(order(ss[,"rich"])),]) , col=colh, main="rank rich" ) plot( b ~ a, data=as.data.frame(ss[ rev(order(rowSums(ss[,c("sparse","rich")]))),]) , col=colh, main="rank sum" ) plot( b ~ a, data=as.data.frame(ss[ rev(order(rowSums(apply(ss[,c("sparse","rich")],2,rank)) )),]) , col=colh, main="sum ranks" ) ranks <- computeSampleRanksForPopulation(stacked, 1L)[,1L] plot( b ~ a, data=as.data.frame(ss[ranks,]) , col=colh, main="min ranks" )
The right plot shows that the sparse data stream effectively constrains parameter a
. The second plot shows that the rich data stream constrains a combination of both parameters with best parameters having very low values of a
.
The third plot shows that the influcence of the sparse data stream is negligible in the sum of the two densities.
Only with the last two plots, the color code roughly matches the density of the sample.
Instead of ranking the sum of the densities, the sum or the minimum of both density ranks is a good criterion for determining the best parameters based on given set of densities. The last criterion is implemented in function computeSampleRanksForPopulation
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.