knitr::opts_chunk$set(fig.dim=c(9, 9), out.width=600, out.height=600)
Before we jump into coding, let's start by loading the package and the data that we will be using: the lifeexpect dataset. This data set was simulated using official statistics and research on life expectancy in the US (see ?lifeexpect for more details). Each row corresponds to the observed age of an individual at the time of death.
library(fmcmc)
data(lifeexpect)
head(lifeexpect) # Taking a look at the data
The data is characterized by the following model:
\begin{equation} y_i \sim \mbox{N}\left(\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i, \sigma^2\right) \end{equation}
So the logposterior can be written as:
\begin{equation} \sum_i \log\left\{\frac{1}{\sigma}\phi\left(\frac{y_i - (\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i)}{\sigma}\right)\right\} \end{equation}
where $\phi(\cdot)$ is the standard normal density and $y_i$ is the age of the $i$-th individual. Using R, we could write the logposterior as follows:
logpost <- function(p, D) {
  sum(dnorm(
    D$age - (p[1] + p[2] * D$smoke + p[3] * D$female),
    sd  = p[4],
    log = TRUE
  ))
}
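As a quick sanity check (an aside on our part, not something the original code requires), the function should return a finite log-density when evaluated at a sensible parameter vector, for example:

# Optional sanity check: should return a finite (negative) number
logpost(c(70, 0, 0, sd(lifeexpect$age)), D = lifeexpect)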
In some cases, the user would like to go beyond what MCMC() does. In those cases, we can directly access the environment in which the main loop of the MCMC() call is being executed, using the function ith_step().

With ith_step(), we can access the environment containing the existing elements while the MCMC loop occurs. Among these, we have: i (the step number), logpost (a vector storing the trace of the unnormalized logposterior), and draws (a matrix storing the kernel's proposed states), among others. The complete list of available objects can be found in the manual or by printing the function:
# This will show the available objects
ith_step
For example, sometimes accidents happen and your computing environment crashes (R, your PC, your server, etc.). In such cases, it is a good idea to keep track of the current state of the chain. One way to do this is to print the state of the process every $n$-th step.
Using the lifeexpect data, let's rewrite the logpost() function using ith_step(). We will print the latest accepted state every 1,000 steps:
logpost2 <- function(p, D) {

  # Getting the step number
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s <- ith_step()$ans[i - 1L, ]

    cat(
      "Step: ", i, " state: c(\n ",
      paste(s, collapse = ", "),
      "\n)\n", sep = ""
    )

  }

  # Just returning the previous value
  logpost(p, D)

}
Note that the posterior distribution, i.e., the accepted states, is stored in the matrix ans within the MCMC loop. Let's use the Robust Adaptive Metropolis kernel to fit this model. Since we also need to estimate the standard deviation, we can set a lower bound for the parameters so that it stays strictly positive. For the starting point, let's use the vector c(70, 0, 0, sd(age)) (more than a good guess!):
# Generating kernel
kern <- kernel_ram(warmup = 1000, lb = c(-100, -100, -100, .001))

# Running MCMC
ans0 <- MCMC(
  initial  = c(70, 0, 0, sd(lifeexpect$age)),
  fun      = logpost2,
  nsteps   = 10000,
  kernel   = kern,
  seed     = 555,
  D        = lifeexpect,
  progress = FALSE
)
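Before tailoring things further, we could glance at the estimates; assuming, as usual with fmcmc, that the returned object is a coda mcmc object, the standard summary method applies (this check is ours, not part of the original walk-through):

# Optional: posterior summaries via coda's summary method
summary(ans0)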
The ith_step() function makes MCMC() very easy to tailor. Now, what happens when we deal with multiple chains?
Using the function ith_step() could be of real help when dealing with multiple chains in a single run. In such a case, we can use the variable chain_id that can be found with ith_step(). From the previous example:
logpost3 <- function(p, D) {

  # Getting the step number
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s     <- ith_step()$ans[i - 1L, ]
    chain <- ith_step("chain_id")

    cat(
      "Step: ", i, " chain: ", chain, " state: c(\n ",
      paste(s, collapse = ",\n "),
      "\n)\n", sep = ""
    )

  }

  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost3, # The new version of logpost includes chain
  nsteps    = 1000,
  kernel    = kern,     # Reusing the kernel
  thin      = 1,
  nchains   = 2L,       # Two chains, two different prints
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
)
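With two chains available, one optional follow-up (ours, not the vignette's) is to check convergence with coda's Gelman-Rubin diagnostic, which works directly on the returned set of chains:

# Optional: Gelman-Rubin convergence diagnostic across the two chains
coda::gelman.diag(ans1)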
Using ith_step() increases the computational burden of the process. Yet, since most of the load lies on the objective function itself, the additional time is usually negligible.
Another thing the user may need to do is to store data as the MCMC algorithm runs. In such cases, you can use the set_userdata() function, which, as the name suggests, will store the required data.
For a simple example, suppose we wanted to store the proposed state; we could do it in the following way:
logpost4 <- function(p, D) {

  # Storing the proposed state
  set_userdata(
    p1 = p[1],
    p2 = p[2],
    p3 = p[3]
  )

  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost4, # The new version of logpost stores the proposed state
  nsteps    = 1000,
  kernel    = kern,  # Reusing the kernel
  thin      = 10,    # We are adding thinning
  nchains   = 2L,    # Two chains
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
)
In this case, since nchains == 2, MCMC() will store a list of length two with the user data. To retrieve the generated data frames, we can call the function get_userdata(). We can also inspect the MCMC_OUTPUT object as follows:
print(MCMC_OUTPUT)
str(get_userdata())
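Assuming the list structure described above (one data frame per chain), we could also pull out a single chain's user data; for instance, a hypothetical peek at the first chain:

# Optional: first chain's user data (assumes a list of data frames)
head(get_userdata()[[1]])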