Advanced features

knitr::opts_chunk$set(fig.dim=c(9, 9), out.width=600, out.height=600)

Life expectancy in the US

Before we jump into coding, let's start by loading the package and the data that we will be using; the lifeexpect database. This data set was simulated using official statistics and research on life expectancy in the US (see ?lifeexpect for more details). Each row corresponds to the observed age of an individual at the time of disease.

library(fmcmc)
data(lifeexpect)
head(lifeexpect) # Taking a look at the data

The data is characterized by the following model:

\begin{equation} y_i \sim \mbox{N}\left(\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i, \sigma^2\right) \end{equation}

So the logposterior can be written as:

\begin{equation} \sum_i \log\phi\left(\frac{y_i - (\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i)}{\sigma}\right) \end{equation}

Where $y_i$ is the age of the i-th individual. Using R, we could write the logposterior as follows:

logpost <- function(p, D) {
  sum(dnorm(
    D$age - (p[1] + p[2] * D$smoke + p[3] * D$female),
    sd = p[4],
    log = TRUE
  ))
}

Operating within the loop

In some cases, the user would like to go beyond what MCMC() does. In those cases, we can directly access the environment in which the main loop of the MCMC-call is being executed, using the function ith_step().

With ith_step(), we can access the environment containing the existing elements while the MCMC loop occurs. Among these, we have: i (the step number), logpost (a vector storing the trace of the unnormalized logposterior), draws, (a matrix storing the kernel's proposed states), etc. The complete list of available objects is available either in the manual or when printing the function:

# This will show the available objects
ith_step

For example, sometimes accidents happen, and your computing environment could crash (R, your PC, your server, etc.). It could be a good idea to keep track of the current state of the chain. A way to do this is printing out the state of the process every n-th step.

Using the lifeexpect data, let's rewrite the logpost() function using ith_step(). We will print the latest accepted state every 1,000 steps:

logpost2 <- function(p, D) {

  # Getting the number of step
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s <- ith_step()$ans[i - 1L,]

    cat("Step: ", i, " state: c(\n  ", paste(s, collapse = ", "), "\n)\n", sep = "")

  }

  # Just returning the previous value
  logpost(p, D)

}

Note that the posterior distribution, i.e., accepted states, is stored in the matrix ans within the MCMC loop. Let's use the Robust Adaptive Metropolis Kernel to fit this model. Since we need to estimate the standard error, we can set a lower-bound for the variables. For the starting point, let's use the vector [70, 0, 0, sd(age)] (more than a good guess!):

# Generating kernel
kern <- kernel_ram(warmup = 1000, lb = c(-100,-100,-100,.001))

# Running MCMC
ans0 <- MCMC(
  initial  = c(70, 0, 0, sd(lifeexpect$age)),
  fun      = logpost2, 
  nsteps   = 10000,    
  kernel   = kern, 
  seed     = 555,
  D        = lifeexpect,
  progress = FALSE
  )

The ith_step() makes MCMC very easy to tailor. Now what happens when we deal with multiple chains?

Using ith_step() with multiple chains

Using the function ith_step() could be of real help when dealing with multiple chains in a single run. In such a case, we can use the variable chain_id that can be found with ith_step(). From the previous example:

logpost3 <- function(p, D) {

  # Getting the number of step
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s <- ith_step()$ans[i - 1L,]

    chain <- ith_step("chain_id")

    cat("Step: ", i, " chain: ", chain, " state: c(\n  ",
        paste(s, collapse = ",\n  "), "\n)\n", sep = ""
        )

  }

  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost3, # The new version of logpost includes chain
  nsteps    = 1000,    
  kernel    = kern,     # Reusing the kernel
  thin      = 1,       
  nchains   = 2L,       # Two chains, two different prints     
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
  )

Using ith_state() increases the computational burden of the process. Yet, since most of the load lies on the objective function itself, the additional time can be neglected.

Saving information as it happens

Another thing the user may need to do is storing data as the MCMC algorithm runs. In such cases, you can use the set_userdata() function, which, as the name suggests, will store the required data.

For a simple example, suppose we wanted to store the proposed state, we could do it in the following way:

logpost4 <- function(p, D) {

  # Timestamp
  set_userdata(
    p1 = p[1],
    p2 = p[2],
    p3 = p[3]
  )

  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost4, # The new version of logpost includes chain
  nsteps    = 1000,    
  kernel    = kern,     # Reusing the kernel
  thin      = 10,       # We are adding thinning
  nchains   = 2L,       # Two chains, two different prints     
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
  )

In this case, since nchains == 2, MCMC will store a list of length two with the user data. To retrieve the generated data frame, we can call the function get_userdata(). We can also inspect the MCMC_OUTPUT as follows:

print(MCMC_OUTPUT)
str(get_userdata())


Try the fmcmc package in your browser

Any scripts or data that you put into this service are public.

fmcmc documentation built on Aug. 30, 2023, 1:09 a.m.