dc.parfit: Parallel model fitting with data cloning
In datacloning/dclone: Data Cloning and MCMC Tools for Maximum Likelihood Methods

dc.parfit

R Documentation

Parallel model fitting with data cloning

Description

Iterative model fitting on parallel workers with different numbers of clones.

Usage

dc.parfit(cl, data, params, model, inits, n.clones,
    multiply=NULL, unchanged=NULL,
    update = NULL, updatefun = NULL, initsfun = NULL,
    flavour = c("jags", "bugs", "stan"), n.chains = 3,
    partype=c("balancing", "parchains", "both"),
    return.all=FALSE, check.nclones=TRUE, ...)

Arguments

`cl`	A cluster object created by `makeCluster`, or an integer, see `parDosa` and `evalParallelArgument`.
`data`	A named list (or environment) containing the data.
`params`	Character vector of parameters to be sampled. It can be a list of 2 vectors, 1st element is used as parameters to monitor, the 2nd is used as parameters to use in calculating the data cloning diagnostics. (`partype = "both"` currently cannot handle `params` as list.)
`model`	Character string (name of the model file), a function containing the model, or a or `custommodel` object (see Examples).
`inits`	Optional specification of initial values in the form of a list or a function (see Initialization at `jags.model`). If missing, will be treated as `NULL` and initial values will be generated automatically. If this is a function, it must be self containing, i.e. not having references to R objects outside of the function, or the objects should be exported with `clusterExport` before calling `dc.parfit`.
`n.clones`	An integer vector containing the numbers of clones to use iteratively.
`multiply`	Numeric or character index for list element(s) in the `data` argument to be multiplied by the number of clones instead of repetitions.
`unchanged`	Numeric or character index for list element(s) in the `data` argument to be left unchanged.
`update`	Numeric or character index for list element(s) in the `data` argument that has to be updated by `updatefun` in each iterations. This usually is for making priors more informative, and enhancing convergence. This argument is ignored if size balancing is used (default), and not ignored when multiple parallel chains are used.
`updatefun`	A function to use for updating `data[[update]]`. It should take an 'mcmc.list' object as 1st argument, 2nd argument can be the number of clones. This argument is ignored if size balancing is used (default), and not ignored when multiple parallel chains are used.
`initsfun`	A function to use for generating initial values, `inits` are updated by the object returned by this function from the second iteration. If initial values are not dependent on the previous iteration, this should be `NULL`, otherwise, it should take an 'mcmc.list' object as 1st argument, 2nd argument can be the number of clones. This feature is useful if latent nodes are provided in `inits` so it also requires to be cloned for subsequent iterations. The 1st argument of the `initsfun` function is ignored if `partype != "parchains"` but the function must have a first argument regardless, see Examples.
`flavour`	If `"jags"`, the function `jags.fit` is called. If `"bugs"`, the function `bugs.fit` is called (available with `partype = "balancing"` only). If `"stan"`, the function `stan.fit` is called. See Details.
`partype`	Type of parallel workload distribution, see Details.
`n.chains`	Number of chains to generate.
`return.all`	Logical. If `TRUE`, all the MCMC list objects corresponding to the sequence `n.clones` are returned for further inspection (this only works with `partype = "parchains"`). Otherwise only the MCMC list corresponding to highest number of clones is returned with summary statistics for the rest.
`check.nclones`	Logical, whether to check and ensure that values of `n.clones` are unique and increasing. `check.nclones = FALSE` means that `n.clones` is used as is, thus it is possible to supply repeated values but still use the update functionality.
`...`	Other values supplied to `jags.fit`, or `bugs.fit`, depending on the `flavour` argument.

Details

The dc.parfit is a parallel computing version of dc.fit. After parallel computations, temporary objects passed to workers and the dclone package is cleaned up. It is not guaranteed that objects already on the workers and independently loaded packages are not affected. Best to start new instances beforehand.

partype="balancing" distributes each model corresponding to values in n.clones as jobs to workers according to size balancing (see parDosa). partype="parchains" makes repeated calls to jags.parfit for each value in n.clones. partype="both" also calls jags.parfit but each chain of each cloned model is distributed as separate job to the workers.

The vector n.clones is used to determine size balancing. If load balancing is also desired besides of size balancing (e.g. due to unequal performance of the workers, the option "dclone.LB" should be set to TRUE (by using options("dclone.LB" = TRUE)). By default, the "dclone.LB" option is FALSE for reproducibility reasons.

Some arguments from dc.fit are not available in parallel version (update, updatefun, initsfun) when size balancing is used (partype is "balancing" or "both"). These arguments are evaluated only when partype="parchains".

Size balancing is recommended if n.clones is a relatively long vector, while parallel chains might be more efficient when n.clones has few elements. For efficiency reasons, a combination of the two (partype="both") is preferred if cluster size allows it.

Additionally loaded JAGS modules (e.g. "glm") need to be loaded to the workers.

Value

An object inheriting from the class 'mcmc.list'.

Author(s)

Peter Solymos

References

Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.

Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.

Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf

Examples

## Not run: 
set.seed(1234)
n <- 20
x <- runif(n, -1, 1)
X <- model.matrix(~x)
beta <- c(2, -1)
mu <- crossprod(t(X), beta)
Y <- rpois(n, exp(mu))
glm.model <- function() {
    for (i in 1:n) {
        Y[i] ~ dpois(lambda[i])
        log(lambda[i]) <- inprod(X[i,], beta[1,])
    }
    for (j in 1:np) {
        beta[1,j] ~ dnorm(0, 0.001)
    }
}
dat <- list(Y=Y, X=X, n=n, np=ncol(X))
k <- 1:3
## sequential version
dcm <- dc.fit(dat, "beta", glm.model, n.clones=k, multiply="n",
    unchanged="np")
## parallel version
cl <- makePSOCKcluster(3)
pdcm1 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="balancing")
pdcm2 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="parchains")
pdcm3 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="both")
summary(dcm)
summary(pdcm1)
summary(pdcm2)
summary(pdcm3)
stopCluster(cl)
## multicore type forking
if (.Platform$OS.type != "windows") {
mcdcm1 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="balancing")
mcdcm2 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="parchains")
mcdcm3 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
    multiply="n", unchanged="np",
    partype="both")
}

## Using WinBUGS/OpenBUGS
library(R2WinBUGS)
data(schools)
dat <- list(J = nrow(schools), y = schools$estimate,
    sigma.y = schools$sd)
bugs.model <- function(){
       for (j in 1:J){
         y[j] ~ dnorm (theta[j], tau.y[j])
         theta[j] ~ dnorm (mu.theta, tau.theta)
         tau.y[j] <- pow(sigma.y[j], -2)
       }
       mu.theta ~ dnorm (0.0, 1.0E-6)
       tau.theta <- pow(sigma.theta, -2)
       sigma.theta ~ dunif (0, 1000)
     }
inits <- function(){
    list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100),
         sigma.theta=runif(1, 0, 100))
}
param <- c("mu.theta", "sigma.theta")
cl <- makePSOCKcluster(2)
if (.Platform$OS.type == "windows") {
sim2 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
    flavour="bugs", program="WinBUGS", multiply="J",
    n.iter=2000, n.thin=1)
summary(sim2)
}
sim3 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
    flavour="bugs", program="brugs", multiply="J",
    n.iter=2000, n.thin=1)
summary(sim3)
library(R2OpenBUGS)
sim4 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
    flavour="bugs", program="openbugs", multiply="J",
    n.iter=2000, n.thin=1)
summary(sim4)
stopCluster(cl)

## simulation for Poisson GLMM with inits
set.seed(1234)
n <- 5
beta <- c(2, -1)
sigma <- 0.1
alpha <- rnorm(n, 0, sigma)
x <- runif(n)
X <- model.matrix(~x)
linpred <- crossprod(t(X), beta) + alpha
Y <- rpois(n, exp(linpred))
## JAGS model as a function
jfun1 <- function() {
    for (i in 1:n) {
        Y[i] ~ dpois(lambda[i])
        log(lambda[i]) <- alpha[i] + inprod(X[i,], beta)
        alpha[i] ~ dnorm(0, 1/sigma^2)
    }
    for (j in 1:np) {
        beta[j] ~ dnorm(0, 0.001)
    }
    sigma ~ dlnorm(0, 0.001)
}
## data
jdata <- list(n = n, Y = Y, X = X, np = NCOL(X))
## inits with latent variable and parameters
ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X)))
## model arg is necessary as 1st arg,
## but not used when partype!=parchains
ifun <-
function(model, n.clones) {
    list(alpha=dclone(rep(0,n), n.clones),
        beta=c(0,0))
}
## make cluster
cl <- makePSOCKcluster(2)
## pass global n variable used in ifun to workers
tmp <- clusterExport(cl, "n")
## fit the model
jmod2 <- dc.parfit(cl, jdata, c("beta", "sigma"), jfun1, ini,
    n.clones = 1:2, multiply = "n", unchanged = "np",
    initsfun=ifun, partype="balancing")
stopCluster(cl)

## Using Stan
if (require(rstan)) {
    model <- custommodel("data {
          int<lower=0> N;
          vector[N] y;
          vector[N] x;
        }
        parameters {
          real alpha;
          real beta;
          real<lower=0> sigma;
        }
        model {
          alpha ~ normal(0,10);
          beta ~ normal(0,10);
          sigma ~ cauchy(0,5);
          for (n in 1:N)
            y[n] ~ normal(alpha + beta * x[n], sigma);
        }")
    N <- 100
    alpha <- 1
    beta <- -1
    sigma <- 0.5
    x <- runif(N)
    y <- rnorm(N, alpha + beta * x, sigma)
    dat <- list(N=N, y=y, x=x)
    params <- c("alpha", "beta", "sigma")
    ## compile on 1st time only
    fit0 <- stan.fit(dat, params, model)
    if (.Platform$OS.type != "windows") {
        ## utilize compiled fit0
        dcfit <- dc.parfit(cl=2, dat, params, model, n.clones=1:2,
            flavour="stan", multiply="N", fit=fit0)
        summary(dcfit)
        stan.model(dcfit)
        dcdiag(dcfit)
    }
}

## End(Not run)

datacloning/dclone documentation built on Sept. 29, 2024, 3:21 p.m.