Nothing
#' Outbreaker: disease outbreak reconstruction using genetic data
#'
#' \code{outbreaker} is a tool for the reconstruction of disease outbreaks
#' using pathogens genome sequences. It relies on a probabilistic model of
#' disease transmission which takes the genetic diversity, collection dates,
#' duration of pathogen colonization and time interval between cases into
#' account. It is embedded in a Bayesian framework which allows to estimate the
#' distributions of parameters of interest. It currently allows to estimate:
#' \itemize{ \item transmission trees \item dates of infection \item missing
#' cases in a chain of transmission \item mutation rates \item imported cases
#' \item (indirectly) effective reproduction numbers }
#'
#' The function \code{outbreaker} is the basic implementation of the model.
#' \code{outbreaker.parallel} allows to run several independent MCMC in
#' parallel across different cores / processors of the same computer. This
#' requires the base package \code{parallel}.
#'
#' The spatial module implemented in outbreaker is currently under development.
#' Please contact the author before using it.
#'
#' For more resources including tutorials, forums, etc., see:
#' \url{http://sites.google.com/site/therepiproject/r-pac/outbreaker}
#'
#' @export
#'
#' @aliases outbreaker outbreaker.parallel
#'
#' @rdname outbreaker
#'
#' @param dna the DNA sequences in \code{DNAbin} format (see
#' \code{\link[ape]{read.dna}} in the ape package); this can be imported from a
#' fasta file (extension .fa, .fas, or .fasta) using \code{adegenet}'s function
#' \link[adegenet]{fasta2DNAbin}; alternatively, a matrix of single characters strings,
#' in which case only the mutation model 1 is available.
#'
#' @param dates a vector indicating the collection dates, provided either as
#' integer numbers or in a usual date format such as \code{Date} or
#' \code{POSIXct} format. By convention, zero will indicate the oldest date.
#'
#' @param idx.dna an optional integer vector indicating to which case each dna
#' sequence in \code{dna} corresponds. Not required if each case has a
#' sequence, and the order of the sequences matches that of the cases.
#'
#' @param mut.model an integer indicating the mutational model to be used; 1:
#' one single mutation rate; 2: two rates, transitions (mu1) / transversions
#' (mu2); if 'dna' is a sequence of character strings (not a DNAbin object), only
#' the model 1 is available.
#'
#' @param spa.model an integer indicating the spatial model to be used. 0: no
#' spatial model (default). 1: exponential kernel (under development).
#'
#' @param w.dens a vector of numeric values indicating the generation time
#' distribution, reflecting the infectious potential of a case t=0, 1, 2, ...
#' time steps after infection. By convention, w.dens[1]=0, meaning that an
#' newly infected patient cannot be instantaneously infectious. If not
#' standardized, this distribution is rescaled to sum to 1.
#'
#' @param f.dens similar to \code{w.dens}, except that this is the distribution
#' of the colonization time, i.e. time interval during which the pathogen can
#' be sampled from the patient.
#'
#' @param dist.mat a matrix of pairwise spatial distances between the cases.
#'
#' @param init.tree the tree used to initialize the MCMC. Can be either a
#' character string indicating how this tree should be computed, or a vector of
#' integers corresponding to the tree itself, where the i-th value corresponds
#' to the index of the ancestor of 'i' (i.e., \code{init.tree[i]} is the
#' ancestor of case \code{i}). Accepted character strings are "seqTrack" (uses
#' seqTrack output as initialize tree), "random" (ancestor randomly selected
#' from preceding cases), and "star" (all cases coalesce to the first case).
#' Note that for SeqTrack, all cases should have been sequenced.
#'
#' @param init.kappa as \code{init.tree}, but values indicate the number of
#' generations between each case and its most recent sampled ancestor.
#'
#' @param n.iter an integer indicating the number of iterations in the MCMC,
#' including the burnin period; defaults to \code{100,000}.
#'
#' @param sample.every an integer indicating the frequency at which to sample
#' from the MCMC, defaulting to 500 (i.e., output to file every 500
#' iterations).
#'
#' @param tune.every an integer indicating the frequency at which proposal
#' distributions are tuned, defaulting to 500 (i.e., tune proposal distribution
#' every 500 iterations).
#'
#' @param burnin an integer indicating the number of iterations for the burnin
#' period, after which the chains are supposed to have mixed; estimated values
#' of parameter are only relevant after the burnin period. Used only when
#' imported cases are automatically detected.
#'
#' @param import.method a character string indicating which method to use for
#' detecting imported cases; available choices are 'gen' (based on genetic
#' likelihood), 'full' (based on full likelihood), and 'none' (no imported case
#' detection).
#'
#' @param find.import.n an integer indicating how many chains should be used to
#' determine imported cases; note that this corresponds to chains that are
#' output after the burnin, so that a total of (burnin +
#' output.every*find.import.n) chains will be used in the prior run to
#' determine imported cases. Defaults to \code{50}.
#'
#' @param pi.prior1,pi.prior2 two numeric values being the parameters of the
#' Beta distribution used as a prior for \eqn{\pi}. This prior is Beta(10,1) by
#' default, indicating that a majority of cases are likely to have been
#' observed. Use Beta(1,1) for a flat prior.
#'
#' @param init.mu1,init.mu2 initial values for the mutation rates (mu1:
#' transitions; mu2: transversions).
#'
#' @param init.spa1 initial values of the spatial parameter.
#'
#' @param spa1.prior parameters of the prior distribution for the spatial
#' parameters. In the spatial model 1, \code{spa1.prior} is the mean of an
#' exponential distribution.
#'
#' @param move.mut,move.pi,move.spa logicals indicating whether the named items
#' should be estimated ('moved' in the MCMC), or not, all defaulting to TRUE.
#' \code{move.mut} handles both mutation rates.
#'
#' @param move.ances,move.kappa,move.Tinf vectors of logicals of length 'n'
#' indicating for which cases different components should be moved during the
#' MCMC.
#'
#' @param outlier.threshold a numeric value indicating the threshold for
#' detecting low likelihood values corresponding to imported cases. Outliers
#' have a likelihood \code{outlier.threshold} smaller than the average.
#'
#' @param max.kappa an integer indicating the maximum number of generations
#' between a case and its most recent sampled ancestor; defaults to 10.
#'
#' @param quiet a logical indicating whether messages should be displayed on
#' the screen.
#'
#' @param res.file.name a character string indicating the name of the file used
#' to store MCMC outputs.
#'
#' @param tune.file.name a character string indicating the name of the file
#' used to store MCMC tuning outputs.
#'
#' @param seed an integer used to set the random seed of the C procedures.
#'
#' @param n.runs an integer indicating the number of independent chains to run,
#' either in parallel (if \code{parallel} is used), or serially (otherwise).
#'
#' @param parallel a logical indicating whether the package \code{parallel}
#' should be used to run parallelized computations; by default, it is used if
#' available.
#'
#' @param n.cores an integer indicating the number of cores to be used for
#' parallelized computations; if NULL (default value), then up to 6 cores are
#' used, depending on availability.
#'
#' @return Both procedures return a list with the following components:
#' \itemize{ \item chains: a data.frame containing MCMC outputs (which are also
#' stored in the file indicated in \code{res.file.name}).
#'
#' \item collec.dates: (data) the collection dates.
#'
#' \item w: (data) the generation time distribution (argument \code{w.dens})
#'
#' \item f: (data) the distribution of the time to collection (argument
#' \code{f.dens})
#'
#' \item D: a matrix of genetic distances (in number of mutations) between all
#' pairs of sequences.
#'
#' \item idx.dna: (data) the index of the case each dna sequence corresponds to
#'
#' \item tune.end: an integer indicating at which iteration the proposal
#' auto-tuning procedures all stopped.
#'
#' \item find.import: a logical indicating if imported cases were to be
#' automatically detected.
#'
#' \item burnin: an integer indicating the pre-defined burnin, used when
#' detecting imported cases.
#'
#' \item find.import.at: an integer indicating at which iteration of the
#' preliminary MCMC imported cases were detected.
#'
#' \item n.runs: the number of independent runs used.
#'
#' \item call: the matched call. }
#' @author Thibaut Jombart (\email{t.jombart@@imperial.ac.uk})
#' @seealso \itemize{ \item \link{plotChains} to visualize MCMC chains.
#'
#' \item \link{transGraph} and \link{get.tTree} to represent transmission
#' trees.
#'
#' \item \link{get.R} and \link{get.Rt} to get reproduction numbers
#' distributions.
#'
#' \item \link{get.incid} to get estimates of incidence.
#'
#' \item \link{get.mu} to get the mutation rate distribution.
#'
#' \item \link{simOutbreak} to simulate outbreaks.
#'
#' \item \link{selectChains} to select chains from parallel runs which
#' converged towards different posterior modes.
#'
#' \item \link{fakeOutbreak}, a toy dataset used to illustrate the method.
#'
#' \item For more resources including tutorials, forums, etc., see:
#' \url{http://sites.google.com/site/therepiproject/r-pac/outbreaker}
#'
#' }
#' @references Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C and Ferguson
#' N (accepted). Bayesian reconstruction of disease outbreaks by combining
#' epidemiologic and genomic data. PLoS Computational Biology.
#' @examples
#'
#'
#' ## EXAMPLE USING TOYOUTBREAK ##
#' ## LOAD DATA, SET RANDOM SEED
#' data(fakeOutbreak)
#' attach(fakeOutbreak)
#'
#' ## VISUALIZE DYNAMICS
#' matplot(dat$dynam, type="o", pch=20, lty=1,
#' main="Outbreak dynamics", xlim=c(0,28))
#' legend("topright", legend=c("S","I","R"), lty=1, col=1:3)
#'
#' ## VISUALIZE TRANSMISSION TREE
#' plot(dat, annot="dist", main="Data - transmission tree")
#' mtext(side=3, "arrow annotations are numbers of mutations")
#'
#'
#' \dontrun{
#' ## RUN OUTBREAKER - PARALLEL VERSION
#' ## (takes < 1 min))
#' set.seed(1)
#' res <- outbreaker.parallel(n.runs=4, dna=dat$dna,
#' dates=collecDates,w.dens=w, n.iter=5e4)
#' }
#'
#'
#' ## ASSESS CONVERGENCE OF CHAINS
#' plotChains(res)
#' plotChains(res, burnin=2e4)
#'
#' ## REPRESENT POSTERIOR ANCESTRIES
#' transGraph(res, annot="", main="Posterior ancestries", thres=.01)
#'
#' ## GET CONSENSUS ANCESTRIES
#' tre <- get.tTree(res)
#' plot(tre, annot="", main="Consensus ancestries")
#'
#' ## SHOW DISCREPANCIES
#' col <- rep("lightgrey", 30)
#' col[which(dat$ances != tre$ances)] <- "pink"
#' plot(tre, annot="", vertex.color=col, main="Consensus ancestries")
#' mtext(side=3, text="cases with erroneous ancestries in pink")
#'
#' ## GET EFFECTIVE REPRODUCTION OVER TIME
#' get.Rt(res)
#'
#' ## GET INDIVIDUAL EFFECTIVE REPRODUCTION
#' head(get.R(res))
#' boxplot(get.R(res), col="grey", xlab="Case",
#' ylab="Effective reproduction number")
#'
#' ## GET MUTATION RATE PER TIME UNIT
#' ## per genome
#' head(get.mu(res))
#'
#' ## per nucleotide
#' mu <- get.mu(res, genome.size=1e4)
#' head(mu)
#'
#' summary(mu)
#' hist(mu, border="lightgrey", col="grey", xlab="Mutation per day and nucleotide",
#' main="Posterior distribution of mutation rate")
#'
#' detach(fakeOutbreak)
#'
#'
#'
outbreaker <- function(dna=NULL, dates, idx.dna=NULL,
mut.model=1, spa.model=0,
w.dens, f.dens=w.dens,
dist.mat=NULL,
init.tree=c("seqTrack","random","star"),
init.kappa=NULL, init.mu1=NULL, init.mu2=init.mu1, init.spa1=NULL,
n.iter=1e5, sample.every=500, tune.every=500,
burnin=2e4, import.method=c("genetic","full","none"),
find.import.n=50,
pi.prior1=10, pi.prior2=1, spa1.prior=1,
move.mut=TRUE, move.ances=TRUE, move.kappa=TRUE,
move.Tinf=TRUE, move.pi=TRUE, move.spa=TRUE,
outlier.threshold = 5, max.kappa=10,
quiet=TRUE, res.file.name="chains.txt",
tune.file.name="tuning.txt", seed=NULL){
## CHECKS ##
## if(!require(ape)) stop("the ape package is required but not installed")
## RE-ORDERING OF IMPORT METHOD TO MATCH C SIDE:
## none:0L
## genetic: 1L
## full: 2L
import.method <- match.arg(import.method)
import.method <- as.integer(match(import.method, c("none", "genetic","full")))-1L
## HANDLE MISSING DNA ##
useDna <- !is.null(dna)
if(is.null(dna)){
dna <- as.DNAbin(matrix('a',ncol=10,nrow=length(dates)))
move.mut <- FALSE
mut.model <- 0L
if(is.character(init.tree) && match.arg(init.tree)=="seqTrack") init.tree <- "star"
init.mu1 <- init.mu2 <-0
init.gamma <- 1
}
## HANDLE DNA / CHARACTER DATA
if(inherits(dna, "DNAbin") && !is.matrix(dna)) dna <- as.matrix(dna)
if(!inherits(dna, "DNAbin") && !(is.matrix(dna) && mode(dna)=="character")) stop("dna must be a DNAbin object or a character matrix.")
if(!inherits(dna, "DNAbin") && mut.model > 1L){
warning("Character sequences provided - forcing mutation model to 1 (Hamming)")
mut.model <- 1L
}
if(is.character(dates)) stop("dates are characters; they must be integers or dates with POSIXct format (see ?as.POSIXct)")
if(is.character(init.tree)) {
init.tree <- match.arg(init.tree)
} else {
if(length(init.tree) != length(dates)) stop("inconvenient length for init.tree")
init.tree[is.na(init.tree)|init.tree<1] <- 0
if(max(init.tree)>length(dates)) stop("inconvenient values in init.tree (some indices > n)")
ances <- as.integer(init.tree-1) # translate indices on C scale (0:(n-1))
}
w.dens[1] <- 0 # force w_0 = 0
w.dens[w.dens<0] <- 0
if(sum(w.dens) <= 1e-14) stop("w.dens is zero everywhere")
if(!is.null(init.mu1) && init.mu1<0) stop("init.mu1 < 0")
if(!is.null(init.mu2) && init.mu2<0) stop("init.mu2 < 0")
## PROCESS INPUTS ##
## dna ##
n.seq <- as.integer(nrow(dna))
n.ind <- as.integer(length(dates))
n.nucl <- as.integer(ncol(dna))
dnaraw <- unlist(as.list(as.character(dna)),use.names=FALSE)
## if(n.ind != length(dates)) stop(paste("dna and dates have different number of individuals -",n.ind,"versus",length(dates)))
## handle dates ##
if(is.numeric(dates)){
if(sum(abs(dates-round(dates))>1e-15)) warning("dates have been rounded to nearest integers")
dates <- as.integer(round(dates))
}
if(inherits(dates, "POSIXct")){
dates <- difftime(dates, min(dates), units="days")
}
dates <- as.integer(dates)
## complete w.dens ##
max.range <- diff(range(dates))
## add an exponential tail summing to 1e-4 to 'w'
## to cover the span of the outbreak
## (avoids starting with -Inf temporal loglike)
if(length(w.dens)<max.range) {
length.to.add <- (max.range-length(w.dens)) + 10 # +10 to be on the safe side
val.to.add <- dexp(1:length.to.add, 1)
val.to.add <- 1e-4*(val.to.add/sum(val.to.add))
w.dens <- c(w.dens, val.to.add)
w.dens <- w.dens/sum(w.dens)
}
## w.trunc and f.trunc ##
w.trunc <- length(w.dens)
f.trunc <- length(f.dens)
## handle idx.dna ##
## need to go from: id of case for each sequence (idx.dna)
## to: position of the sequence in DNA matrix for each case
## -1 is used for missing sequences
if(is.null(idx.dna)) {
idx.dna <- 1:nrow(dna)
}
if(any(!idx.dna %in% 1:n.ind)) stop("DNA sequences provided for unknown cases (some idx.dna not in 1:n.ind)")
if(length(idx.dna)!=nrow(dna)) stop("length of idx.dna does not match the number of DNA sequences")
idx.dna.for.cases <- match(1:n.ind, idx.dna)
idx.dna.for.cases[is.na(idx.dna.for.cases)] <- 0
idx.dna.for.cases <- as.integer(idx.dna.for.cases-1) # for C
## check mutational model ##
## check model
mut.model <- as.integer(mut.model)
if(!mut.model %in% c(0L,1L,2L)) stop("unknown mutational model requested; accepted values are: 0, 1, 2")
## model 0: no evolution model
if(mut.model==0L){
init.gamma <- 1
init.mu1 <- 1
move.mut <- FALSE
}
## determine gamma
if(!is.null(init.mu1) && !is.null(init.mu2)){
init.gamma <- init.mu2/init.mu1
if(is.na(init.gamma) || is.infinite(init.gamma)) init.gamma <- 1 # in case rates are both 0
} else{
init.gamma <- 1
}
## force gamma to 1 for model 1
if(mut.model==1L){
init.gamma <- 1
}
## check generation time function ##
w.dens <- as.double(w.dens)
w.dens <- w.dens/sum(w.dens)
if(any(is.na(w.dens))) stop("NAs in w.dens after normalization")
w.trunc <- as.integer(w.trunc)
## check collection time function ##
f.dens <- as.double(f.dens)
f.dens <- f.dens/sum(f.dens)
if(any(is.na(f.dens))) stop("NAs in f.dens after normalization")
f.trunc <- as.integer(f.trunc)
## check spatial distances ##
if(!is.null(dist.mat)){
if(!inherits(dist.mat,"matrix")) dist.mat <- as.matrix(dist.mat)
if(nrow(dist.mat) != ncol(dist.mat)) stop("matrix of distances (dist.mat) is not square")
if(nrow(dist.mat) != length(dates)) stop("wrong dimension for the matrix of distances")
if(any(is.na(dist.mat))) stop("NAs in the distance matrix")
} else {
if(spa.model>0) stop("spatial model requested but dist.mat not provided")
dist.mat <- matrix(0, ncol=length(dates), nrow=length(dates))
spa.model <- 0L
}
## check spatial model ##
spa.model <- as.integer(spa.model)
if(!spa.model %in% c(0L, 1L, 2L)) stop("unknown spatial model requested; accepted values are: 0, 1, 2")
## model 0: no spatial info
if(spa.model == 0L) {
dist.mat <- matrix(0, ncol=length(dates), nrow=length(dates))
init.spa1 <- init.spa2 <- 0
}
## model 1: normal dispersal
if(spa.model > 0L) {
if(is.null(init.spa1)) init.spa1 <- 1
## if(is.null(init.spa2)) init.spa2 <- 0
init.spa2 <- 0
spa1.prior <- max(0.0, spa1.prior)
}
## model 2: stratified dispersal
if(spa.model == 2L){
stop("Only available spatial models are 0 (none) and 1 (exponential diffusion)")
## if(is.null(locations)) stop("Spatial model 2 needs locations for stratified dispersal")
## if(length(locations)!=length(dates)) stop("wrong length for argument 'locations'")
## if(is.character(locations)) locations <- factor(locations)
## locations <- as.integer(locations)
}
## init.kappa ##
## if NULL, will be ML assigned (code is kappa_i<0)
if(is.null(init.kappa)) init.kappa <- rep(0L,n.ind)
init.kappa <- as.integer(rep(init.kappa, length=n.ind)) #recycle
## find initial tree ##
## get temporal ordering constraint:
## canBeAnces[i,j] is 'i' can be ancestor of 'j'
canBeAnces <- outer(dates,dates,FUN="<") # strict < is needed as we impose w(0)=0
diag(canBeAnces) <- FALSE
if(is.character(init.tree)){
## check no missing sequences for seqTrack
if(init.tree=="seqTrack" && !all(1:n.ind %in% idx.dna)) {
warning("Can't use seqTrack initialization with missing DNA sequences - using a star-like tree")
init.tree <- "star"
}
## check 'dna' is DNAbin for seqTrack
if(init.tree=="seqTrack" && !inherits(dna,"DNAbin")) {
warning("Can't use seqTrack initialization with character (non-DNAbin) sequences - using a star-like tree")
init.tree <- "star"
}
## seqTrack init
if(init.tree=="seqTrack"){
D <- as.matrix(dist.dna(dna, model="raw"))
D[!canBeAnces] <- 1e15
ances <- apply(D,2,which.min)-1 # -1 for compatibility with C
ances[dates==min(dates)] <- -1 # unknown ancestor
ances <- as.integer(ances)
}
## random init
if(init.tree=="random"){
ances <- apply(canBeAnces, 2, function(e) ifelse(length(which(e))>0, sample(which(e),1), NA) )
ances <- ances-1
ances[is.na(ances)] <- -1L
ances <- as.integer(ances)
}
## star-shape init
if(init.tree=="star"){
ances <- rep(which.min(dates), length(dates))
ances[dates==min(dates)] <- 0
ances <- as.integer(ances-1) # put on C scale
}
## ## no ancestry init
## if(init.tree=="none"){
## ances <- as.integer(rep(-1,length(dates)))
## }
}
## handle seed ##
if(is.null(seed)){
seed <- as.integer(runif(1,min=0,max=2e9))
}
## handle import.method ##
if(mut.model==0L && import.method==1L) import.method <- 2L
## handle find.import ##
find.import <- import.method > 0L
if(find.import){
find.import.n <- max(find.import.n,30) # import at least based on 30 values
find.import.at <- as.integer(round(burnin + find.import.n*sample.every))
if(find.import.at>=n.iter) stop(paste("n.iter (", n.iter, ") is less than find.import.at (", find.import.at,")", sep=""))
} else {
find.import.at <- as.integer(0)
}
## coerce type for remaining arguments ##
n.iter <- as.integer(n.iter)
sample.every <- as.integer(sample.every)
tune.every <- as.integer(tune.every)
pi.prior1 <- as.double(pi.prior1)
pi.prior2 <- as.double(pi.prior2)
## phi.param1 <- as.double(phi.param1)
## phi.param2 <- as.double(phi.param2)
phi.param1 <- phi.param2 <- as.double(1)
if(is.null(init.mu1)) {
init.mu1 <- 0.5/ncol(dna)
}
init.mu1 <- as.double(init.mu1)
init.gamma <- as.double(init.gamma)
init.spa1 <- as.double(init.spa1)
## init.spa2 <- as.double(init.spa2)
init.spa2 <- as.double(1)
spa1.prior <- as.double(spa1.prior)
## spa2.prior <- as.double(spa2.prior)
spa2.prior <- as.double(1)
move.mut <- as.integer(move.mut)
move.ances <- as.integer(rep(move.ances, length=n.ind))
move.kappa <- as.integer(rep(move.kappa, length=n.ind))
move.Tinf <- as.integer(move.Tinf)
move.pi <- as.integer(move.pi)
## move.phi <- as.integer(move.phi)
move.phi <- 0L
move.spa <- as.integer(move.spa)
quiet <- as.integer(quiet)
res.file.name <- as.character(res.file.name)[1]
tune.file.name <- as.character(tune.file.name)[1]
burnin <- as.integer(burnin)
outlier.threshold <- as.double(outlier.threshold)
max.kappa <- as.integer(max.kappa)
##locations <- as.integer(locations)
locations <- rep(0L, length(dates))
## create empty output vector for genetic distances ##
dna.dist <- integer(n.ind*(n.ind-1)/2)
stopTuneAt <- integer(1)
temp <- .C("R_outbreaker",
dnaraw, dates, n.ind, n.seq, n.nucl, idx.dna.for.cases, mut.model,
w.dens, w.trunc, f.dens, f.trunc,
dist.mat, locations, spa.model,
ances, init.kappa, n.iter, sample.every, tune.every,
pi.prior1, pi.prior2, phi.param1, phi.param2, init.mu1, init.gamma,
init.spa1, init.spa2, spa1.prior, spa2.prior,
move.mut, move.ances, move.kappa, move.Tinf,
move.pi, move.phi, move.spa,
import.method, find.import.at, burnin, outlier.threshold,
max.kappa, quiet,
dna.dist, stopTuneAt, res.file.name, tune.file.name, seed,
PACKAGE="outbreaker")
D <- temp[[43]]
D[D<0] <- NA
stopTuneAt <- temp[[44]]
cat("\nComputations finished.\n\n")
## make D a 'dist' object ##
attr(D,"Size") <- n.ind
attr(D,"Diag") <- FALSE
attr(D,"Upper") <- FALSE
class(D) <- "dist"
## BUILD OUTPUT ##
## read table
chains <- read.table(res.file.name, header=TRUE, stringsAsFactors=FALSE,
colClasses=c("integer", rep("numeric",7+n.ind*2)))
chains$run <- rep(1, nrow(chains))
call <- match.call()
res <- list(chains=chains, collec.dates=dates, w=w.dens[1:w.trunc], f=f.dens[1:f.trunc], D=D, idx.dna=idx.dna, tune.end=stopTuneAt,
burnin=burnin, import.method=import.method, find.import.at=find.import.at, n.runs=1, call=call)
return(res)
} # end outbreaker
#' @rdname outbreaker
#' @export
outbreaker.parallel <- function(n.runs, parallel=TRUE, n.cores=NULL,
dna=NULL, dates, idx.dna=NULL, mut.model=1, spa.model=0,
w.dens, f.dens=w.dens,
dist.mat=NULL,
init.tree=c("seqTrack","random","star"),
init.kappa=NULL,
init.mu1=NULL, init.mu2=init.mu1, init.spa1=NULL,
n.iter=1e5, sample.every=500, tune.every=500,
burnin=2e4, import.method=c("genetic","full","none"),
find.import.n=50,
pi.prior1=10, pi.prior2=1, spa1.prior=1,
move.mut=TRUE, move.ances=TRUE, move.kappa=TRUE,
move.Tinf=TRUE, move.pi=TRUE, move.spa=TRUE,
outlier.threshold = 5, max.kappa=10,
quiet=TRUE, res.file.name="chains.txt", tune.file.name="tuning.txt", seed=NULL){
## SOME CHECKS ##
if(parallel && is.null(n.cores)){
n.cores <- detectCores()
n.cores <- min(n.cores, 6)
}
## GET FILE NAMES ##
res.file.names <- paste("run", 1:n.runs, "-", res.file.name, sep="")
tune.file.names <- paste("run", 1:n.runs, "-", tune.file.name, sep="")
## HANDLE SEED ##
if(is.null(seed)){
seed <- as.integer(runif(n.runs,min=0,max=2e9))
} else {
seed <- rep(seed, length=n.runs)
}
## COMPUTATIONS ##
if(parallel){
## create cluster ##
clust <- makeCluster(n.cores)
## load outbreaker for each child ##
clusterEvalQ(clust, library(outbreaker))
## transfer data onto each child ##
listArgs <- c("dna", "dates", "idx.dna", "mut.model", "spa.model", "w.dens", "f.dens", "dist.mat", "init.tree", "init.kappa", "n.iter",
"sample.every", "tune.every", "burnin", "import.method", "find.import.n", "pi.prior1", "pi.prior2", "init.mu1", "init.mu2",
"init.spa1", "move.mut", "spa1.prior", "move.mut", "move.ances", "move.kappa", "move.Tinf", "move.pi", "move.spa",
"outlier.threshold", "max.kappa", "res.file.names", "tune.file.names", "seed")
clusterExport(clust, listArgs, envir=environment())
## set calls to outbreaker on each child ##
res <- parLapply(clust, 1:n.runs, function(i) outbreaker(dna=dna, dates=dates, idx.dna=idx.dna,
mut.model=mut.model, spa.model=spa.model,
w.dens=w.dens,
f.dens=f.dens,
dist.mat=dist.mat, ## locations=locations,
init.tree=init.tree, init.kappa=init.kappa,
n.iter=n.iter, sample.every=sample.every,
tune.every=tune.every, burnin=burnin,
import.method=import.method,
find.import.n=find.import.n,
pi.prior1=pi.prior1, pi.prior2=pi.prior2,
spa1.prior=spa1.prior,
init.mu1=init.mu1, init.mu2=init.mu2, init.spa1=init.spa1,
move.mut=move.mut, move.ances=move.ances, move.kappa=move.kappa,
move.Tinf=move.Tinf, move.pi=move.pi, move.spa=move.spa,
outlier.threshold = outlier.threshold, max.kappa=max.kappa,
quiet=TRUE, res.file.name=res.file.names[i],
tune.file.name=tune.file.names[i], seed=seed[i]))
## close parallel processes ##
stopCluster(clust)
## Version with mclapply - doesn't work on windows ##
## res <- mclapply(1:n.runs, function(i) outbreaker(dna=dna, dates=dates, idx.dna=idx.dna, w.dens=w.dens, w.trunc=w.trunc,
## init.tree=init.tree, init.kappa=init.kappa,
## n.iter=n.iter, sample.every=sample.every,
## tune.every=tune.every, burnin=burnin,
## find.import=find.import, find.import.n=find.import.n,
## pi.prior1=pi.prior1, pi.prior2=pi.prior2,
## init.mu1=init.mu1, init.mu2=init.mu2,
## move.mut=move.mut, move.ances=move.ances, move.kappa=move.kappa,
## move.Tinf=move.Tinf, move.pi=move.pi,
## quiet=TRUE, res.file.name=res.file.names[i],
## tune.file.name=tune.file.names[i], seed=seed[i]),
## mc.cores=n.cores, mc.silent=FALSE, mc.cleanup=TRUE, mc.preschedule=TRUE, mc.set.seed=TRUE)
} else {
res <- lapply(1:n.runs, function(i) outbreaker(dna=dna, dates=dates, idx.dna=idx.dna,
mut.model=mut.model, spa.model=spa.model,
w.dens=w.dens,
f.dens=f.dens,
dist.mat=dist.mat,
init.tree=init.tree, init.kappa=init.kappa,
n.iter=n.iter, sample.every=sample.every,
tune.every=tune.every, burnin=burnin,
import.method=import.method,
find.import.n=find.import.n,
pi.prior1=pi.prior1, pi.prior2=pi.prior2,
spa1.prior=spa1.prior,
init.mu1=init.mu1, init.mu2=init.mu2, init.spa1=init.spa1,
move.mut=move.mut, move.ances=move.ances, move.kappa=move.kappa,
move.Tinf=move.Tinf, move.pi=move.pi, move.spa=move.spa,
outlier.threshold = outlier.threshold, max.kappa=max.kappa,
quiet=TRUE, res.file.name=res.file.names[i],
tune.file.name=tune.file.names[i], seed=seed[i]))
}
## MERGE RESULTS ##
res.old <- res
res <- res[[1]]
res$tune.end <- max(sapply(res.old, function(e) e$tune.end))
res$chains <- Reduce(rbind, lapply(res.old, function(e) e$chains))
res$chains$run <- factor(rep(1:n.runs, each=nrow(res.old[[1]]$chains)))
res$n.runs <- n.runs
res$call <- match.call()
## RETURN RESULTS ##
return(res)
} # end outbreaker.parallel
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.