dc.parfit | R Documentation |
Iterative model fitting on parallel workers with different numbers of clones.
dc.parfit(cl, data, params, model, inits, n.clones,
multiply=NULL, unchanged=NULL,
update = NULL, updatefun = NULL, initsfun = NULL,
flavour = c("jags", "bugs", "stan"), n.chains = 3,
partype=c("balancing", "parchains", "both"),
return.all=FALSE, check.nclones=TRUE, ...)
cl |
A cluster object created by |
data |
A named list (or environment) containing the data. |
params |
Character vector of parameters to be sampled.
It can be a list of 2 vectors, 1st element
is used as parameters to monitor, the 2nd is used
as parameters to use in calculating the data cloning
diagnostics. ( |
model |
Character string (name of the model file), a function containing the
model, or a or |
inits |
Optional specification of initial values in the form of a list
or a function (see Initialization at |
n.clones |
An integer vector containing the numbers of clones to use iteratively. |
multiply |
Numeric or character index for list element(s) in the |
unchanged |
Numeric or character index for list element(s) in the |
update |
Numeric or character index for list element(s) in the |
updatefun |
A function to use for updating |
initsfun |
A function to use for generating initial values, |
flavour |
If |
partype |
Type of parallel workload distribution, see Details. |
n.chains |
Number of chains to generate. |
return.all |
Logical. If |
check.nclones |
Logical, whether to check and ensure that values of |
... |
Other values supplied to |
The dc.parfit
is a parallel computing version of
dc.fit
.
After parallel computations, temporary objects passed to workers and
the dclone package is cleaned up. It is not guaranteed
that objects already on the workers and independently loaded packages
are not affected. Best to start new instances beforehand.
partype="balancing"
distributes each model corresponding
to values in n.clones
as jobs to workers according to size
balancing (see parDosa
). partype="parchains"
makes repeated calls
to jags.parfit
for each value in n.clones
.
partype="both"
also calls jags.parfit
but
each chain of each cloned model is distributed as separate job to the
workers.
The vector n.clones
is used to determine size balancing.
If load balancing is also desired
besides of size balancing (e.g. due to unequal performance of
the workers, the option "dclone.LB"
should be set to TRUE
(by using options("dclone.LB" = TRUE)
).
By default, the "dclone.LB"
option is FALSE
for reproducibility reasons.
Some arguments from dc.fit
are not available in parallel
version (update
, updatefun
, initsfun
)
when size balancing is used
(partype
is "balancing"
or "both"
).
These arguments are evaluated only when partype="parchains"
.
Size balancing is recommended if n.clones
is a relatively long
vector, while parallel chains might be more efficient when
n.clones
has few elements.
For efficiency reasons, a combination of the two
(partype="both"
) is preferred if cluster size allows it.
Additionally loaded JAGS modules (e.g. "glm"
) need to be loaded
to the workers.
An object inheriting from the class 'mcmc.list'.
Peter Solymos
Lele, S.R., B. Dennis and F. Lutscher, 2007. Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods. Ecology Letters 10, 551–563.
Lele, S. R., K. Nadeem and B. Schmuland, 2010. Estimability and likelihood inference for generalized linear mixed models using data cloning. Journal of the American Statistical Association 105, 1617–1625.
Solymos, P., 2010. dclone: Data Cloning in R. The R Journal 2(2), 29–37. URL: https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf
Sequential version: dc.fit
.
Optimizing the number of workers: clusterSize
,
plotClusterSize
.
Underlying functions: jags.fit
,
bugs.fit
.
## Not run:
set.seed(1234)
n <- 20
x <- runif(n, -1, 1)
X <- model.matrix(~x)
beta <- c(2, -1)
mu <- crossprod(t(X), beta)
Y <- rpois(n, exp(mu))
glm.model <- function() {
for (i in 1:n) {
Y[i] ~ dpois(lambda[i])
log(lambda[i]) <- inprod(X[i,], beta[1,])
}
for (j in 1:np) {
beta[1,j] ~ dnorm(0, 0.001)
}
}
dat <- list(Y=Y, X=X, n=n, np=ncol(X))
k <- 1:3
## sequential version
dcm <- dc.fit(dat, "beta", glm.model, n.clones=k, multiply="n",
unchanged="np")
## parallel version
cl <- makePSOCKcluster(3)
pdcm1 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="balancing")
pdcm2 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="parchains")
pdcm3 <- dc.parfit(cl, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="both")
summary(dcm)
summary(pdcm1)
summary(pdcm2)
summary(pdcm3)
stopCluster(cl)
## multicore type forking
if (.Platform$OS.type != "windows") {
mcdcm1 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="balancing")
mcdcm2 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="parchains")
mcdcm3 <- dc.parfit(3, dat, "beta", glm.model, n.clones=k,
multiply="n", unchanged="np",
partype="both")
}
## Using WinBUGS/OpenBUGS
library(R2WinBUGS)
data(schools)
dat <- list(J = nrow(schools), y = schools$estimate,
sigma.y = schools$sd)
bugs.model <- function(){
for (j in 1:J){
y[j] ~ dnorm (theta[j], tau.y[j])
theta[j] ~ dnorm (mu.theta, tau.theta)
tau.y[j] <- pow(sigma.y[j], -2)
}
mu.theta ~ dnorm (0.0, 1.0E-6)
tau.theta <- pow(sigma.theta, -2)
sigma.theta ~ dunif (0, 1000)
}
inits <- function(){
list(theta=rnorm(nrow(schools), 0, 100), mu.theta=rnorm(1, 0, 100),
sigma.theta=runif(1, 0, 100))
}
param <- c("mu.theta", "sigma.theta")
cl <- makePSOCKcluster(2)
if (.Platform$OS.type == "windows") {
sim2 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
flavour="bugs", program="WinBUGS", multiply="J",
n.iter=2000, n.thin=1)
summary(sim2)
}
sim3 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
flavour="bugs", program="brugs", multiply="J",
n.iter=2000, n.thin=1)
summary(sim3)
library(R2OpenBUGS)
sim4 <- dc.parfit(cl, dat, param, bugs.model, n.clones=1:2,
flavour="bugs", program="openbugs", multiply="J",
n.iter=2000, n.thin=1)
summary(sim4)
stopCluster(cl)
## simulation for Poisson GLMM with inits
set.seed(1234)
n <- 5
beta <- c(2, -1)
sigma <- 0.1
alpha <- rnorm(n, 0, sigma)
x <- runif(n)
X <- model.matrix(~x)
linpred <- crossprod(t(X), beta) + alpha
Y <- rpois(n, exp(linpred))
## JAGS model as a function
jfun1 <- function() {
for (i in 1:n) {
Y[i] ~ dpois(lambda[i])
log(lambda[i]) <- alpha[i] + inprod(X[i,], beta)
alpha[i] ~ dnorm(0, 1/sigma^2)
}
for (j in 1:np) {
beta[j] ~ dnorm(0, 0.001)
}
sigma ~ dlnorm(0, 0.001)
}
## data
jdata <- list(n = n, Y = Y, X = X, np = NCOL(X))
## inits with latent variable and parameters
ini <- list(alpha=rep(0,n), beta=rep(0, NCOL(X)))
## model arg is necessary as 1st arg,
## but not used when partype!=parchains
ifun <-
function(model, n.clones) {
list(alpha=dclone(rep(0,n), n.clones),
beta=c(0,0))
}
## make cluster
cl <- makePSOCKcluster(2)
## pass global n variable used in ifun to workers
tmp <- clusterExport(cl, "n")
## fit the model
jmod2 <- dc.parfit(cl, jdata, c("beta", "sigma"), jfun1, ini,
n.clones = 1:2, multiply = "n", unchanged = "np",
initsfun=ifun, partype="balancing")
stopCluster(cl)
## Using Stan
if (require(rstan)) {
model <- custommodel("data {
int<lower=0> N;
vector[N] y;
vector[N] x;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
alpha ~ normal(0,10);
beta ~ normal(0,10);
sigma ~ cauchy(0,5);
for (n in 1:N)
y[n] ~ normal(alpha + beta * x[n], sigma);
}")
N <- 100
alpha <- 1
beta <- -1
sigma <- 0.5
x <- runif(N)
y <- rnorm(N, alpha + beta * x, sigma)
dat <- list(N=N, y=y, x=x)
params <- c("alpha", "beta", "sigma")
## compile on 1st time only
fit0 <- stan.fit(dat, params, model)
if (.Platform$OS.type != "windows") {
## utilize compiled fit0
dcfit <- dc.parfit(cl=2, dat, params, model, n.clones=1:2,
flavour="stan", multiply="N", fit=fit0)
summary(dcfit)
stan.model(dcfit)
dcdiag(dcfit)
}
}
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.