(1) Multi-core computing using the parallel package
One can easily run parallel chains on multiple cores using the parallel
package. This is conveniently done by writing a wrapper around BGLR.
BGLR.wrap=function(task,seeds,...){
seed=seeds[task]
set.seed(seed)
fm=BGLR(saveAt=paste0(task,'_'),...)
return(list(fm=fm,task=task,seed=seed))
}
Now we can run multiple chains in parallel.
library(parallel)
library(BGLR)
seeds=c(100,110,120)
data(wheat)
X=scale(wheat.X)
y=wheat.Y[,1]
ETA=list(list(X=X,model='BRR'))
fmList=mclapply(FUN=BGLR.wrap,seeds=seeds,X=1:3,mc.cores=2,nIter=6000,burnIn=1000,verbose=F,y=y,ETA=ETA)
fmList[[1]]$fm$varE
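Once the chains have finished, it can help to put their results side by side. The following is a minimal sketch, assuming each element of fmList has the structure returned by BGLR.wrap above (a list with elements fm, task and seed). The helper name summarizeChains is hypothetical and is not part of BGLR.

```r
# Hypothetical helper (not part of BGLR): tabulates the task, the seed and the
# estimated residual variance (fm$varE) of each chain.
summarizeChains <- function(fmList) {
  # mclapply returns objects of class "try-error" for tasks that failed;
  # those are dropped before tabulating.
  ok <- !vapply(fmList, inherits, logical(1), what = "try-error")
  fmList <- fmList[ok]
  data.frame(
    task = sapply(fmList, function(z) z$task),
    seed = sapply(fmList, function(z) z$seed),
    varE = sapply(fmList, function(z) z$fm$varE)
  )
}

# Usage with the object created above:
# summarizeChains(fmList)
```

Estimates that agree closely across seeds are reassuring; large differences suggest that the chains need more iterations.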
Using a similar approach we can conduct a cross-validation in parallel.
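A minimal sketch of such a cross-validation follows. It assumes the wheat data and the BRR model used above. The helpers makeFolds, BGLR.cv and runWheatCV are hypothetical names introduced for illustration, as are the number of folds and the 'cv_' file prefix. Each task masks the phenotypes of one fold (sets them to NA), fits the model, and returns the predictions for the masked records.

```r
library(parallel)

# Hypothetical helper: randomly assigns n records to nFolds folds.
makeFolds <- function(n, nFolds, seed = 1) {
  set.seed(seed)
  sample(rep(seq_len(nFolds), length.out = n))
}

# Hypothetical wrapper: fits the model with one fold masked and returns
# the predictions for the records in that fold.
BGLR.cv <- function(fold, y, folds, ...) {
  tst <- which(folds == fold)
  yNA <- y
  yNA[tst] <- NA  # BGLR predicts the records whose phenotype is missing
  fm <- BGLR::BGLR(y = yNA, saveAt = paste0('cv_', fold, '_'), ...)
  list(fold = fold, tst = tst, yHat = fm$yHat[tst])
}

# Hypothetical driver: runs the folds in parallel and returns the correlation
# between phenotypes and cross-validation predictions.
runWheatCV <- function(nFolds = 5, mc.cores = 2) {
  data(wheat, package = 'BGLR', envir = environment())
  y <- wheat.Y[, 1]
  ETA <- list(list(X = scale(wheat.X), model = 'BRR'))
  folds <- makeFolds(length(y), nFolds)
  cvList <- mclapply(X = seq_len(nFolds), FUN = BGLR.cv, y = y, folds = folds,
                     ETA = ETA, nIter = 6000, burnIn = 1000, verbose = FALSE,
                     mc.cores = mc.cores)
  yHatCV <- rep(NA, length(y))
  for (z in cvList) yHatCV[z$tst] <- z$yHat
  cor(y, yHatCV)
}

# Usage (requires BGLR to be installed):
# runWheatCV()
```

Note that mclapply relies on forking, so on Windows mc.cores must be set to 1.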
(2) Parallel computation in clusters
In a high-performance computing cluster (HPCC), one can easily run multiple jobs in parallel on different nodes of the cluster.
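One common pattern is to submit the same R script several times and let each job pick its own seed and output prefix from a job index. The sketch below assumes the cluster uses SLURM job arrays, which set the environment variable SLURM_ARRAY_TASK_ID; other schedulers use different variables. The helper getTask and the 'chain_' file prefix are hypothetical names used only for illustration.

```r
# Hypothetical helper: reads the job index from an environment variable.
# The variable name is an assumption about the scheduler (here SLURM job
# arrays); the default is used when the variable is not set.
getTask <- function(var = 'SLURM_ARRAY_TASK_ID', default = 1L) {
  val <- Sys.getenv(var, unset = NA)
  if (is.na(val) || val == '') default else as.integer(val)
}

task <- getTask()
seeds <- c(100, 110, 120)
seed <- seeds[task]  # one seed per job; task is expected to be 1, 2 or 3

# Each job then fits one chain and writes to its own set of files:
# library(BGLR)
# data(wheat)
# set.seed(seed)
# fm <- BGLR(y = wheat.Y[, 1],
#            ETA = list(list(X = scale(wheat.X), model = 'BRR')),
#            nIter = 6000, burnIn = 1000, verbose = FALSE,
#            saveAt = paste0('chain_', task, '_'))
# save(fm, file = paste0('chain_', task, '_fm.RData'))
```

Using the job index in saveAt keeps the jobs from writing to the same files.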
In high-dimensional models (e.g., hundreds of thousands of predictors) the burn-in period can be long. To overcome this problem we developed an experimental version (BGLR2) which allows users to:
- save the internal environment to a file
- run BGLR using a saved environment.
These tools can be used to efficiently collect large numbers of samples at different nodes of an HPCC. The following example illustrates some of the features of BGLR2. Relative to BGLR, these are some of the additional arguments:
- saveEnv (TRUE/FALSE): if TRUE, a binary file containing a snapshot of the environment right at the end of the sampler is generated.
- BGLR_ENV (character): a path and the name of a file containing a snapshot of a BGLR environment. If provided, this environment is used to run the sampler. Since this environment contains all the elements of the model specification (y, ETA, etc.), these arguments do not need to be provided. However, a few arguments (saveAt, nIter, burnIn, thin, rmExistingFiles) are overwritten by the call; that is, BGLR uses the values provided in the call and not the ones saved in the environment.
- newChain (TRUE/FALSE): if FALSE, the chain is continued (with the seed saved in BGLR_ENV) and samples are appended to the already existing files. Otherwise, a new chain is generated; in this case the starting values are the last ones collected in the run that generated BGLR_ENV, but a new seed and new output files are generated.
(1) Saving a snapshot of the environment at the end of the sampler.
rm(list=ls())
dir.create('~/testBGLR2')
setwd('~/testBGLR2')
library(devtools)
install_git('https://github.com/gdlc/BGLR-R')
library(BGLR)
data(wheat)
X=wheat.X[,1:100]
set.seed(1203)
fm1a=BGLR2(y=wheat.Y[,1],ETA=list(list(X=X,model='BayesB',saveEffects=TRUE)),
saveEnv=TRUE,saveAt='firstRun_',nIter=12000,burnIn=2000,thin=1)
list.files()
Let's now recover BGLR from sleep and run additional iterations
fm1b=BGLR2(BGLR_ENV='firstRun_BGLR_ENV.RData',nIter=10000,thin=1,burnIn=0,newChain=FALSE)
list.files()
varE1=scan('firstRun_varE.dat')
# Note that the file now contains 22000 samples (12000 + 10000)
We can now check whether the two-step run above is equivalent to a single chain.
set.seed(1203)
fm2=BGLR(y=wheat.Y[,1],ETA=list(list(X=X,model='BayesB',saveEffects=TRUE)),saveAt='secondRun_',nIter=22000,burnIn=2000,thin=1)
c(fm1b$varE,fm2$varE)
plot(fm1b$yHat,fm2$yHat)
plot(scan('firstRun_mu.dat'),scan('secondRun_mu.dat'))
Note that you can also start a new chain (with a different seed) using the saved environment. This allows, for instance, running parallel chains that all start from the values stored in a saved environment, which may itself come from a run used as burn-in.
fm1c=BGLR2(BGLR_ENV='firstRun_BGLR_ENV.RData',nIter=10000,thin=1,burnIn=0,newChain=TRUE,saveAt='thirdRun_')
list.files()
varE1=scan('firstRun_varE.dat')[-c(1:12000)]
varE3=scan('thirdRun_varE.dat')
plot(varE1,varE3)