swNoData
In bayesImageS: Bayesian Methods for Image Segmentation using a Potts Model

Gibbs sampling was originally designed by @Geman1984 for drawing updates from the Gibbs distribution, hence the name. However, single-site Gibbs sampling exhibits poor mixing due to the posterior correlation between the pixel labels. Thus it is very slow to converge when the correlation (controlled by the inverse temperature $\beta$) is high.

The algorithm of @Swendsen1987 addresses this problem by forming clusters of neighbouring pixels, then updating all of the labels within a cluster to the same value. When simulating from the prior, such as a Potts model without an external field, this algorithm is very efficient.

The SW function in the PottsUtils package is implemented in a combination of R and C. The swNoData function in bayesImageS is implemented using RcppArmadillo, which gives it a speed advantage. It is worth noting that the intention of bayesImageS is not to replace PottsUtils. Rather, an efficient Swendsen-Wang algorithm is used as a building block for implementations of ABC [@Grelaud2009], path sampling [@Gelman1998], and the exchange algorithm [@Murray2006]. These other algorithms will be covered in future posts.

There are two things that we want to keep track of in this simulation study: the speed of the algorithm and the distribution of the summary statistic. We will be using system.time(..) to measure both CPU and elapsed (wall clock) time taken for the same number of iterations, for a range of inverse temperatures:

beta <- seq(0,2,by=0.1)
tmMx.PU <- tmMx.bIS <- matrix(nrow=length(beta),ncol=2)
rownames(tmMx.PU) <- rownames(tmMx.bIS) <- beta
colnames(tmMx.PU) <- colnames(tmMx.bIS) <- c("user","elapsed")

We will discard the first 100 iterations as burn-in and keep the remaining 500.

iter <- 600
burn <- 100
samp.PU <- samp.bIS <- matrix(nrow=length(beta),ncol=iter-burn)

The distribution of pixel labels can be summarised by the sufficient statistic of the Potts model:

$$ S(z) = \sum_{i \sim \ell \in \mathscr{N}} \delta(z_i, z_\ell) $$

where $i \sim \ell \in \mathscr{N}$ are all of the pairs of neighbours in the lattice (ie. the cliques) and $\delta(u,v)$ is 1 if $u = v$ and 0 otherwise (the Kronecker delta function). swNoData returns this automatically, but with SW we will need to use the function sufficientStat to calculate the sufficient statistic for the labels.

library(bayesImageS)

mask <- matrix(1,50,50)
neigh <- getNeighbors(mask, c(2,2,0,0))
block <- getBlocks(mask, 2)
edges <- getEdges(mask, c(2,2,0,0))

n <- sum(mask)
k <- 2
bcrit <- log(1 + sqrt(k))
maxSS <- nrow(edges)

for (i in 1:length(beta)) {
  if (requireNamespace("PottsUtils", quietly = TRUE)) {
    tm <- system.time(result <- PottsUtils::SW(iter,n,k,edges,beta=beta[i]))
    tmMx.PU[i,"user"] <- tm["user.self"]
    tmMx.PU[i,"elapsed"] <- tm["elapsed"]
    res <- sufficientStat(result, neigh, block, k)
    samp.PU[i,] <- res$sum[(burn+1):iter]
    print(paste("PottsUtils::SW",beta[i],tm["elapsed"],median(samp.PU[i,])))
   } else {
      print("PottsUtils::SW unavailable on this platform.")
   }


  # bayesImageS
  tm <- system.time(result <- swNoData(beta[i],k,neigh,block,iter))
  tmMx.bIS[i,"user"] <- tm["user.self"]
  tmMx.bIS[i,"elapsed"] <- tm["elapsed"]
  samp.bIS[i,] <- result$sum[(burn+1):iter]
  print(paste("bayesImageS::swNoData",beta[i],tm["elapsed"],median(samp.bIS[i,])))
}

Here is the comparison of elapsed times between the two algorithms (in seconds):

summary(tmMx.PU)
summary(tmMx.bIS)
boxplot(tmMx.PU[,"elapsed"],tmMx.bIS[,"elapsed"],ylab="seconds elapsed",names=c("SW","swNoData"))

On average, swNoData using RcppArmadillo [@Eddelbuettel2013] is seven times faster than SW.

s_z <- c(samp.PU,samp.bIS)
s_x <- rep(beta,times=iter-burn)
s_a <- rep(1:2,each=length(beta)*(iter-burn))
s.frame <- data.frame(s_z,c(s_x,s_x),s_a)
names(s.frame) <- c("stat","beta","alg")
s.frame$alg <- factor(s_a,labels=c("SW","swNoData"))
  if (requireNamespace("lattice", quietly = TRUE)) {
    lattice::xyplot(stat ~ beta | alg, data=s.frame)
  }
plot(c(s_x,s_x),s_z,pch=s_a,xlab=expression(beta),ylab=expression(S(z)))
abline(v=bcrit,col="red")

The overlap between the two distributions is almost complete, although it is a bit tricky to verify that statistically. The relationship between $beta$ and $S(z)$ is nonlinear and heteroskedastic.

rowMeans(samp.bIS) - rowMeans(samp.PU)
apply(samp.PU, 1, sd)
apply(samp.bIS, 1, sd)
s.frame$beta <- factor(c(s_x,s_x))
  if (requireNamespace("PottsUtils", quietly = TRUE)) {
    s.fit <- aov(stat ~ alg + beta, data=s.frame)
    summary(s.fit)
    TukeyHSD(s.fit,which="alg")
  }