Nothing
#' Construct a Fully Formatted NEXUS Script for Performing Tip-Dating Analyses With MrBayes
#'
#' This function is meant to expedite the creation of NEXUS files formatted
#' for performing tip-dating analyses in the popular phylogenetics software \emph{MrBayes},
#' particularly clock-less tip-dating analyses executed with 'empty' morphological matrices
#' (i.e. where all taxa are coded for a single missing character), although a pre-existing
#' morphological matrix can also be input by the user (see argument \code{origNexusFile}).
#' Under some options, this pre-existing matrix may be edited by this function.
#' The resulting full NEXUS script is output as a set of character strings either
#' printed to the R console, or output to file which is then overwritten.
#'
#' @details
#' Users must supply a data set of tip ages (in various formats),
#' which are used to construct age calibrations commands on the tip taxa
#' (via paleotree function \code{\link{createMrBayesTipCalibrations}}).
#' The user must also supply some topological constraint:
#' either a set of taxa designated as the outgroup, which
#' is then converted into a command constraining
#' the monophyly on the ingroup taxa, which is presumed to be
#' all taxa \emph{not} listed in the outgroup.
#' Alternatively, a user may supply a tree which is then
#' converted into a series of hard topological constraints
#' (via function \code{\link{createMrBayesConstraints}}.
#' Both types of topological constraints cannot be applied.
#'
#' Many of the options available with \code{\link{createMrBayesTipCalibrations}}
#' are available with this function, allowing users to choose between fixed
#' calibrations or uniform priors that approximate stratigraphic uncertainty.
#' In addition, the user may also supply a path to a text file
#' presumed to be a NEXUS file containing character
#' data formatted for use with \emph{MrBayes}.
#'
#' The taxa listed in \code{tipTimes} must match the taxa in
#' \code{treeConstraints}, if such is supplied. If supplied, the taxa in \code{outgroupTaxa}
#' must be contained within this same set of taxa. These all must have matches
#' in the set of taxa in \code{origNexusFile}, if provided and
#' if \code{parseOriginalNexus} is \code{TRUE}.
#'
#' Note that because the same set of taxa must be contained in all inputs,
#' relationships are constrained as 'hard' constraints, rather than 'partial' constraints,
#' which allows some taxa to float across a partially fixed topology.
#' See the documentation for \code{\link{createMrBayesConstraints}},
#' for more details.
#' @inheritParams createMrBayesTipCalibrations
#' @param outgroupTaxa A vector of type 'character', containing taxon names designating the outgroup.
#' All taxa not listed in the outgroup will be constrained to be a monophyletic ingroup, for sake of rooting
#' the resulting dated tree.
#' Either \code{treeConstraints} or \code{outgroupTaxa} must be defined, but \emph{not both}.
#' If the outgroup-ingroup split is not present on the supplied
#' \code{treeConstraints}, add that split to \code{treeConstraints} manually.
#' @param whichAppearance Which appearance date of the taxa should be used:
#' their \code{'first'} or their \code{'last'} appearance date? The default
#' option is to use the 'first' appearance date. Note that use of the last
#' appearance date means that tips will be constrained to occur before their
#' last occurrence, and thus could occur long after their first occurrence (!).
#' In addition, \code{createMrBayesTipDatingNexus} allows for two
#' options for this argument that are in addition to those offered by
#' \code{\link{createMrBayesTipCalibrations}}. Both of these options will duplicate
#' the taxa in the inputs multiple times, modifying their OTU labels, thus allowing
#' multiple occurrences of long-lived morphotaxa to be listed as multiple OTUs
#' arrayed across their stratigraphic duration. If
#' \code{whichAppearance = "firstLast"}, taxa will be duplicated so each taxon is
#' listed as occurring twice: once at their first appearance, and a second time at
#' their last appearance. Note that if a taxon first and last appears in the same interval,
#' and \code{ageCalibrationType = "uniformRange"}, then
#' the resulting posterior trees may place the OTU assigned to the last occurrence before the
#' first occurrence in temporal order (but the assignment, in that case, was entirely
#' arbitrary). When \code{whichAppearance = "rangeThrough"}, each taxon will be
#' duplicated into as many OTUs as each
#' interval that a taxon ranges through (in a \code{timeList} format, see other
#' \code{paleotree} functions), with the corresponding age uncertainties for those intervals.
#' If the input \code{tipTimes} is not a list of
#' \code{length = 2}, however, the function will
#' return an error under this option.
#' @param orderedChars Should be a vector of numbers, indicating which characters should have their
#' character-type in MrBayes changed to 'ordered'.
#' If \code{NULL}, the default, then all characters will be treated as essentially unordered.
#' No character ID should be listed that is higher than the number of characters in the matrix provided in
#' \code{origNexusFile}. If \code{origNexusFile} is not provided, while \code{orderedChars}
#' is defined, then an error will be returned.
#' @param origNexusFile Filename (possibly with path) as a character
#' string leading to a NEXUS text file, presumably containing a matrix
#' of character date formated for \emph{MrBayes}. If supplied
#' (it does not need to be supplied), the listed file is read as a text file, and
#' concatenated with the \emph{MrBayes} script produced by this function, so as to
#' reproduce the original NEXUS matrix for executing in MrBayes.
#' Note that the taxa in this NEXUS file are \emph{NOT} checked against the user
#' input \code{tipTimes} and \code{treeConstraints}, so it is up to the user to
#' ensure the taxa are the same across the three data sources.
#' @param parseOriginalNexus If \code{TRUE} (the default), the original NEXUS file is parsed and
#' the taxon names listed within in the matrix are compared against the other inputs
#' for matching (completely, across all inputs that include taxon names).
#' Thus, it is up to the user to ensure the same
#' taxa are found in all inputs. However, some NEXUS files may not parse correctly
#' (particularly if character data for taxa stretches across more than a single line in the matrix).
#' This may necessitate setting this argument to \code{FALSE}, which will instead do a straight scan
#' of the NEXUS matrix without parsing it, and without checking the taxon names against other outputs.
#' Some options for \code{whichAppearance} will not be available, however.
#' @param newFile Filename (possibly with path) as a character string
#' leading to a file which will be overwritten with the output tip age calibrations.
#' If \code{NULL}, tip calibration commands are output to the console.
#' @param createEmptyMorphMat If \code{origNexusFile} is not specified (implying there is no
#' pre-existing morphological character matrix for this dataset), then an 'empty' NEXUS-formatted matrix will be
#' appended to the set of \emph{MrBayes} commands if this command is \code{TRUE} (the default). This
#' 'empty' matrix will have each taxon in \code{tipTimes} coded for a single missing character
#' (i.e., '?'). This allows tip-dating analyses with hard topological constraints, and ages
#' determined entirely by the fossilized birth-death prior, with no impact from a
#' presupposed morphological clock (thus a 'clock-less analysis').
#' @param runName The name of the run, used for naming the log files and MCMC output files.
#' If not set, the name will be taken from the name given for outputting
#' the NEXUS script (\code{newFile}). If \code{newFile} is not given, and
#' \code{runName} is not set by the user, the default run name will be "new_run_paleotree".
#' @param doNotRun If \code{TRUE}, the commands that cause a script to automatically begin running in
#' \emph{MrBayes} will be left out. Useful for troubleshooting initial runs of scripts for non-fatal errors and
#' warnings (such as ignored constraints). Default for this argument is \code{FALSE}.
#' @param autoCloseMrB If \code{TRUE}, the MrBayes script created by this function will
#' 'autoclose', so that when an MCMC run finishes the specified number of generations,
#' it does not interactively check whether to continue the MCMC. This is often necessary
#' for batch analyses.
#' @param treeConstraints An object of class \code{phylo},
#' from which (if \code{treeConstraints} is supplied) the set topological constraints are derived, as
#' as described for argument \code{tree} for function \code{createMrBayesConstraints}.
#' Either \code{treeConstraints} or \code{outgroupTaxa} must be defined, but \emph{not both}.
#' If the outgroup-ingroup split is not present on the supplied \code{treeConstraints}, add that split to \code{treeConstraints} manually.
#' @param morphModel This argument can be used to switch between two end-member models of
#' morphological evolution in MrBayes, here named 'strong' and 'relaxed', for the 'strong assumptions'
#' and 'relaxed assumptions' models described by Bapst et al. (2018, Syst. Biol.).
#' The default is a model which makes very 'strong' assumptions about the process of morphological evolution,
#' while the 'relaxed' alternative allows for considerably more heterogeneity in the rate
#' of morphological evolution across characters, and in the forward and reverse transition
#' rates between states. Also see argument \code{morphFiltered}.
#' @param morphFiltered This argument controls what type of filtering the input
#' morphological data is assumed to have been collected under. The likelihood of
#' the character data will be modified to take into account the apparent filtering
#' (Lewis, 2001; Allman et al., 2010). The default value, \code{"parsInf"}, forces
#' characters to be treated as if they were collected as part of a parsimony-based
#' study, with constant characters and autapomorphies (characters that only differ
#' in state in a single taxon unit) ignored or otherwise filtered out, and any such
#' characters in the presented matrix will be ignored. \code{morphFiltered = "variable"}
#' assumes that while constant characters are still filtered out (e.g. it is
#' difficult or impossible to count the number of morphological characters that
#' show no variation across a group), the autapomorphies were intentionally collected
#' and included in the presented matrix. Thus, constant characters in the included
#' matrix will be ignored, but autapomorphies will be considered.
#' @param cleanNames If \code{TRUE} (the default), then special characters
#' (currently, this only contains the forward-slashes: '/') are removed from
#' taxon names before construction of the NEXUS file.
#' @param printExecute If \code{TRUE} (the default) and if output is directed to a \code{newFile}
#' (i.e. a \code{newFile} is specified), a line for pasting into MrBayes for executing the newly created file
#' will be messaged to the terminal.
#' @param ngen Number of generations to set the MCMCMC to run for.
#' Default (\code{ngen = 100000000}) is very high.
#' @note
#' This function allows a user to take an undated phylogenetic tree in R, and a set of age estimates
#' for the taxa on that tree, and produce a posterior sample of dated trees using the MCMCMC in \emph{MrBayes},
#' while treating an 'empty' morphological matrix as an uninformative set of missing characters.
#' This 'clock-less tip-dating' approach is essentially an alternative to the \emph{cal3} method in paleotree,
#' sharing the same fundamental theoretical model (a version of the fossilized birth-death model), but
#' with a better algorithm that considers the whole tree simultaneously, rather than evaluating each node
#' individually, from the root up to the tips (as \emph{cal3} does it, and which may cause artifacts).
#' That said, \emph{cal3} still has a few advantages: tip-dating as of April 2017 still only treats OTUs as
#' point observations, contained in a single time-point, while cal3 can consider taxa as having durations with
#' first and last occurrences. This means it may be more straightforward to assess the extent of budding cladogenesis
#' patterns of ancestor-descendant relationships in \emph{cal3}, than in tip-dating.
#' @return
#' If argument \code{newFile} is \code{NULL}, then the text of the
#' generated NEXUS script is output to the console as a series of character strings.
#' @seealso
#' This function wraps various aspects of the functions \code{\link{createMrBayesConstraints}}
#' and the function \code{\link{createMrBayesTipCalibrations}}. In many ways, this functionality is a
#' replacement for the probabilistic dating method represented by the \code{\link{cal3}} dating functions.
#'
#' For putting the posterior estimated trees on an absolute time scale, see
#' functions \code{\link{obtainDatedPosteriorTreesMrB}}. Use the argument \code{getFixedTimes = TRUE}
#' if you used a taxon with a fixed age, and function \code{\link{setRootAges}} to set the root age.
#'
#' @author
#' David W. Bapst. This code was produced as part of a project
#' funded by National Science Foundation grant EAR-1147537 to S. J. Carlson.
#'
#' The basic \emph{MrBayes} commands utilized in the output script are a collection
#' of best practices taken from studying NEXUS files supplied by April Wright,
#' William Gearty, Graham Slater, Davey Wright, and
#' guided by the recommendations of Matzke and Wright, 2016 in Biology Letters.
#' @references
#' The basic fundamentals of tip-dating, and tip-dating with the fossilized
#' birth-death model are introduced in these two papers:
#'
#' Ronquist, F., S. Klopfstein, L. Vilhelmsen, S. Schulmeister, D. L. Murray,
#' and A. P. Rasnitsyn. 2012. A Total-Evidence Approach to Dating with Fossils,
#' Applied to the Early Radiation of the Hymenoptera. \emph{Systematic Biology} 61(6):973-999.
#'
#' Zhang, C., T. Stadler, S. Klopfstein, T. A. Heath, and F. Ronquist. 2016.
#' Total-Evidence Dating under the Fossilized Birth-Death Process.
#' \emph{Systematic Biology} 65(2):228-249.
#'
#' For recommended best practices in tip-dating analyses, please see:
#'
#' Matzke, N. J., and A. Wright. 2016. Inferring node dates from tip dates
#' in fossil Canidae: the importance of tree priors. \emph{Biology Letters} 12(8).
#'
#' The rationale behind the two alternative morphological models are described in more detail here:
#'
#' Bapst, D. W., H. A. Schreiber, and S. J. Carlson. 2018.
#' Combined Analysis of Extant Rhynchonellida (Brachiopoda) using Morphological
#' and Molecular Data. \emph{Systematic Biology} 67(1):32-48.
#
#' @examples
#'
#' # load retiolitid dataset
#' data(retiolitinae)
#'
#' # let's try making a NEXUS file!
#'
#' # Use a uniform prior, with a 10 million year offset for
#' # the expected tree age from the earliest first appearance
#'
#' # Also set average tree age to be 10 Ma earlier than first FAD
#'
#' outgroupRetio <- "Rotaretiolites"
#' # this taxon will now be sister to all other included taxa
#'
#' # the following will create a NEXUS file
#' # with an 'empty' morph matrix
#' # where the only topological constraint is on ingroup monophyly
#' # Probably shouldn't do this: leaves too much to the FBD prior
#'
#' # with doNotRun set to TRUE for troubleshooting
#'
#' createMrBayesTipDatingNexus(
#' tipTimes = retioRanges,
#' outgroupTaxa = outgroupRetio,
#' treeConstraints = NULL,
#' ageCalibrationType = "uniformRange",
#' whichAppearance = "first",
#' treeAgeOffset = 10,
#' newFile = NULL,
#' origNexusFile = NULL,
#' createEmptyMorphMat = TRUE,
#' runName = "retio_dating",
#' doNotRun = TRUE
#' )
#'
#' # let's try it with a tree for topological constraints
#' # this requires setting outgroupTaxa to NULL
#' # let's also set doNotRun to FALSE
#'
#' createMrBayesTipDatingNexus(
#' tipTimes = retioRanges,
#' outgroupTaxa = NULL,
#' treeConstraints = retioTree,
#' ageCalibrationType = "uniformRange",
#' whichAppearance = "first",
#' treeAgeOffset = 10,
#' newFile = NULL,
#' origNexusFile = NULL,
#' createEmptyMorphMat = TRUE,
#' runName = "retio_dating",
#' doNotRun = FALSE
#' )
#'
#' # the above is essentially cal3 with a better algorithm,
#' # and no need for a priori rate estimates
#' # just need a tree and age estimates for the tips!
#'
#' ####################################################
#' # some more variations for testing purposes
#'
#' # no morph matrix supplied or generated
#' # you'll need to manually append to an existing NEXUS file
#'
#' createMrBayesTipDatingNexus(
#' tipTimes = retioRanges,
#' outgroupTaxa = NULL,
#' treeConstraints = retioTree,
#' ageCalibrationType = "uniformRange",
#' whichAppearance = "first",
#' treeAgeOffset = 10,
#' newFile = NULL,
#' origNexusFile = NULL,
#' createEmptyMorphMat = FALSE,
#' runName = "retio_dating",
#' doNotRun = TRUE
#' )
#'
#' \dontrun{
#'
#' # let's actually try writing an example with topological constraints
#' # to file and see what happens
#'
#' # here's my super secret MrBayes directory
#' file <- "D:\\dave\\workspace\\mrbayes\\exampleRetio.nex"
#'
#' createMrBayesTipDatingNexus(
#' tipTimes = retioRanges,
#' outgroupTaxa = NULL,
#' treeConstraints = retioTree,
#' ageCalibrationType = "uniformRange",
#' whichAppearance = "first",
#' treeAgeOffset = 10,
#' newFile = file,
#' origNexusFile = NULL,
#' createEmptyMorphMat = TRUE,
#' runName = "retio_dating",
#' doNotRun = FALSE
#' )
#'
#' }
#'
## TO DO !!
# function for counting number of sampled ancestors, paraphyletic ancestors (budding)
#' @aliases tipdating
#' @name createMrBayesTipDatingNexus
#' @rdname createMrBayesTipDatingNexus
#' @export
createMrBayesTipDatingNexus <- function(
tipTimes,outgroupTaxa = NULL,
treeConstraints = NULL,
ageCalibrationType,
whichAppearance = "first",
treeAgeOffset,
minTreeAge = NULL,
collapseUniform = TRUE,
anchorTaxon = TRUE,
newFile = NULL,
origNexusFile = NULL,
parseOriginalNexus = TRUE,
createEmptyMorphMat = TRUE,
orderedChars=NULL,
morphModel = "strong",
morphFiltered = "parsInf",
runName = NULL,
ngen = "100000000",
doNotRun = FALSE,
autoCloseMrB = FALSE,
cleanNames = TRUE,
printExecute = TRUE
){
################################################################################################
#
# # a wooper of a function ... here's some ASCII from artist 'Psyduck'
#
# = @@@@@@
# = @@.......@@@
# = @.. ........@@ = @
# = @... ..........@ = @ @
# @ = @ @.... ............@ = @@ @
# @ = @ = .....................@ = @ @ == @
# @@ = @ @...@.........@.......@ = @ == @@
# @ == == @@.......................@ == = @@ =
# @@@ == @. = .............. = ....@ == = @@ @@
# @@ @@@ = .@..............@....@ = @@ = @ @
# @ @@ = ..@ = ..........@ = ......@@ @ = @
# @ @.... = @@@@@@@@ = .........@ @ = @
# @@ @.......................@ @ =
# @ @.....................@
# = .....................@
# @...................@
# @.................@
# @@.............@@
# @@ = ....... = @@
# @@.......@ @@@
# @. == == = ...@ @@ @
# @... == = .....@ @ @
# @...........@ @. @
# @.. == ... == ....@ @@. . @
# @... == == = .....@@.... . . @
# @.............@...... . .@
# @. == ... == ....@.........@
# @. == == = ....@ = @@.....@@
# = @@ = @......@@ @@@@@
# = ... == @@@@@@ == @@
# @....@ = ....@
# @@@@ @....@
# @@@@
#
#################################################################################################
# make sure tipTimes is not a data.frame
if(is.data.frame(tipTimes)){
tipTimes <- as.matrix(tipTimes)
}
if(is.list(tipTimes)){
if(length(tipTimes) == 2){
tipTimes[[1]] <- as.matrix(tipTimes[[1]])
tipTimes[[2]] <- as.matrix(tipTimes[[2]])
}else{
stop("why is tipTimes a list of not length 2?")
}
}
##################################################################################################
# -Need to check that ingroup constraint isn't on treeConstraints, and if so, delete it
# actually maybe just make it so no ingroup constraint is defined if treeConstraint is defined
# - presumably its already part of provided tree!!
# no group on the tree can replicate the ingroup constraint - repeated topology constraints not allowed
# REALLY REALLY IMPORTANT - no ingroup should be referenced if a tree is given
# outgroupTaxa and treeConstraints - one and only one must be defined
if(is.null(outgroupTaxa) & is.null(treeConstraints)){
stop("Either outgroupTaxa or treeConstraints must be provided for a tip-dating analysis")
}
if(!is.null(outgroupTaxa) & !is.null(treeConstraints)){
stop(
"Only one of outgroupTaxa or treeConstraints can be provided.
If the ingroup monophyly is not enforced on the provided treeConstraints,
please add this split to treeConstraints"
)
}
####################################################################################
# make morphNexus as NULL so can test later that its been replaced
morphNexus <- NULL
##################################################################################
# provide new whichAppearance options: 'firstLast', 'rangeThrough'
# rangeThrough will require checking tipTimes for sequential intervals
if(any(whichAppearance == c("firstLast", "rangeThrough"))){
multOTU <- TRUE
if(is.null(origNexusFile)){
# stop('"A NEXUS file must be supplied if whichAppearance is "firstLast" or "rangeThrough"')
}else{
if(parseOriginalNexus){
nexusData <- parseNexusFile(
origNexusFile = origNexusFile,
asIs = FALSE
)
remakeDataBlockFun <- nexusData$remakeDataBlockFun
morphNexus <- nexusData$morphNexusAsIs
}else{
stop("Cannot use an option for multiple OTUs if you don't parse your NEXUS file")
}
}
}else{
# if not multOTU
multOTU <- FALSE
if(!is.null(origNexusFile)){
if(parseOriginalNexus){
nexusData <- parseNexusFile(
origNexusFile = origNexusFile,
asIs = FALSE
)
morphNexus <- nexusData$morphNexusAsIs
}else{
nexusData <- parseNexusFile(
origNexusFile = origNexusFile,
asIs = TRUE
)
morphNexus <- nexusData
}
}
}
if(is.null(origNexusFile) & !is.null(orderedChars)){
stop("Morphological data in an original matrix must be provided if orderedChars is used")
}
if(is.null(origNexusFile)){
nexusData <- NULL
}
# CHECK rangethrough option
if(whichAppearance == "rangeThrough"){
# If the input tipTimes is not a list of length = 2, however, the function will
# return an error under this option.
# also require checking for sequential intervals
if(!isTimeListSequential(tipTimes)){
stop("tipTimes must be a sequential timeList format")
}
}
####################################################################################
# CHECK TAXON NAMES
# taxa in tipTimes is king
# all taxa in input treeConstraints must be in tipTimes (and vice versa)
# origNexusFile *is* checked
if(is.list(tipTimes)){
taxaTipTimes <- rownames(tipTimes[[2]])
}else{
taxaTipTimes <- rownames(tipTimes)
}
if(!is.null(treeConstraints)){
if(!is(treeConstraints,"phylo")){
stop("treeConstraints must be a phylogeny object of type 'phylo'")
}
taxaTree <- treeConstraints$tip.label
# check both
missingTip <- taxaTipTimes[sapply(taxaTipTimes,function(x)
all(x != taxaTree))]
missingTree <- taxaTree[sapply(taxaTree,function(x)
all(x != taxaTipTimes))]
# stop if length>0
if(length(c(missingTip, missingTree))>0){
stop(paste0(
"Following taxa in tipTimes not found on treeConstraints: ",
paste0(missingTip, collapse = " "),
"\nFollowing taxa on treeConstraints not found in tipTimes: ",
paste0(missingTree, collapse = " ")))
}
if(length(taxaTree) != length(taxaTipTimes)){
stop("Somehow have taxa missing from either tipTimes or treeConstraints")
}
if(!identical(sort(taxaTipTimes),sort(taxaTree))){
stop("Nope, taxa in tipTimes and on treeConstraints STILL not identical!!")
}
}
#
# test if all outgroupTaxa are in the tip age taxon names
if(!is.null(outgroupTaxa)){
missingOutgroup <- outgroupTaxa[
sapply(outgroupTaxa,function(x)
all(x != taxaTipTimes)
)
]
if(length(missingOutgroup)>0){
stop(paste0("Following outgroup taxa have no match in tipTimes: ",
paste0(missingOutgroup, collapse = " ")
))
}
}
#
# test if consistent with taxa from origNexusFile, if parsed
if(parseOriginalNexus & !is.null(origNexusFile)){
nexusTaxa <- nexusData$taxonNames
missingTip <- taxaTipTimes[sapply(taxaTipTimes,function(x)
all(x != nexusTaxa))]
missingNexus <- nexusTaxa[sapply(nexusTaxa,function(x)
all(x != taxaTipTimes))]
if(length(c(missingTip, missingNexus))>0){
stop(paste0("Following taxa in tipTimes not found in NEXUS file: ",
paste0(missingTip, collapse = " "),
"\nFollowing taxa in NEXUS file not found in tipTimes: ",
paste0(missingNexus, collapse = " ")
))
}
if(length(nexusTaxa) != length(taxaTipTimes)){
stop("Somehow have taxa missing from either tipTimes or the original NEXUS file")
}
if(!identical(sort(taxaTipTimes),sort(nexusTaxa))){
stop("Nope, taxa in tipTimes and in original NEXUS file STILL not identical!!")
}
}
#################################################################
# check other arguments
if(length(morphModel) != 1){
stop("morphModel must be of length 1")
}
if(all(morphModel != c("relaxed","strong"))){
stop("morphModel must be one 'relaxed' or 'strong'")
}
if(morphModel == "relaxed" & is.null(origNexusFile)){
warning(paste0(
"Why are you relaxing the morphological model without supplying an original matrix with origNexusFile?",
"\n I hope you know what you are doing...\n"
))
}
#
if(!is.character(ngen)){
stop("ngen must be type character; it turns out R is terrible at converting large numbers to text")
}
#
# note: origNexusFile might be a connection - cannot test for length 1
#
if(length(createEmptyMorphMat) != 1 | !is.logical(createEmptyMorphMat)){
stop("createEmpthyMorphMat must be of length 1, and either TRUE or FALSE")
}
##################################################
# check and get orderedChars line
if(!is.null(orderedChars)){
if(is.numeric(orderedChars)){
if(any(orderedChars<1)){
"orderedChars should only have positive numbers greater than 0"
}
orderedChars<-paste0(orderedChars,collapse=" ")
if(!is.character(orderedChars) | length(orderedChars)!=1){
stop("orderedChars cannot be coerced to a string of length = 1")
}
}else{
if(!is.character(orderedChars) | length(orderedChars)!=1){
stop("orderedChars is not a numeric vector or a string of length = 1")
}
}
}
###################################################
# cleaning taxon names
cleanTaxonNames <- taxaTipTimes
if(length(unique(cleanTaxonNames)) != length(cleanTaxonNames)){
stop("Some taxon names were identical duplicates")
}
if(cleanNames){
# create 'clean' version of taxon names - remove all '/'
cleanTaxonNames <- gsub("/","",cleanTaxonNames)
# check that unique
if(length(unique(cleanTaxonNames)) != length(cleanTaxonNames)){
stop("Some taxon names were identical duplicates when special character (/) was removed")
}
# removes every character except letters and numbers - only use if dire (i.e. backslashes)
# gsub("[^A-Za-z0-9]", "", a)
#
if(is.list(tipTimes)){
rownames(tipTimes[[2]]) <- gsub("/","",
rownames(tipTimes[[2]]))
}else{
rownames(tipTimes) <- gsub("/","",
rownames(tipTimes))
}
#
if(!is.null(outgroupTaxa)){
outgroupTaxa <- gsub("/","",
outgroupTaxa)
}
#
if(!is.null(treeConstraints)){
treeConstraints$tip.label <- gsub("/","",
treeConstraints$tip.label)
}
#
if(parseOriginalNexus & !is.null(origNexusFile)){
nexusTaxa <- gsub("/","",nexusTaxa)
}
#
}
###################################################################
# get final taxon name list
if(multOTU){
if(whichAppearance == "rangeThrough"){
if(!is.list(tipTimes)){
stop("tipTimes must be a timeList object if 'whichAppearance = rangeThrough'")
}
# for each taxon in tipTimes, figure out intervals they range through
# and then multiply this taxon in the tip data,
# the tree/root constraints and NEXUS data block
#
newOTU <- matrix(,1,4)
for(i in 1:nrow(tipTimes[[2]])){
# count number of range-through intervals, get dates and new names
origName <- rownames(tipTimes[[2]])[i]
# raw interval IDs assuming sequential timeList
rawIntervals <- tipTimes[[2]][i,1]:tipTimes[[2]][i,2]
intervalMatrix <- tipTimes[[1]][rawIntervals,,drop = FALSE]
# get new names, using interval names
newNames <- paste0(origName,"_",rownames(intervalMatrix))
newOTU <- rbind(newOTU,cbind(newNames,origName,intervalMatrix))
}
newOTU <- newOTU[-1,]
}
if(whichAppearance == "firstLast"){
# okay for almost all inputs
#
newOTU <- matrix(,1,4)
if(!is.list(tipTimes)){
if(ncol(tipTimes)!=4){
stop(paste0(
"tipTimes if not a timeList object must have four columns,\n",
" indicating uncertainty bounds on FADs and LADS separately"
))
}
for(i in 1:nrow(tipTimes)){
# count number of range-through intervals, get dates and new names
origName <- rownames(tipTimes)[i]
# raw interval IDs for FAD and LAD
intervalMatrix <- rbind(
tipTimes[i,1:2],tipTimes[i,3:4]
)
# get new names, using interval names
newNames <- paste0(origName,c("_Fint","_Lint"))
newOTU <- rbind(newOTU,cbind(newNames,origName,intervalMatrix))
}
}else{
for(i in 1:nrow(tipTimes[[2]])){
# count number of range-through intervals, get dates and new names
origName <- rownames(tipTimes[[2]])[i]
# raw interval IDs for FAD and LAD
rawIntervals <- tipTimes[[2]][i,]
intervalMatrix <- tipTimes[[1]][rawIntervals,,drop = FALSE]
# get new names, using interval names
newNames <- paste0(origName,c("_Fint","_Lint"))
newOTU <- rbind(
newOTU,
cbind(newNames, origName, intervalMatrix)
)
}
}
newOTU <- newOTU[-1,]
}
# create new tipTimes that is two date uncertainties
tipTimes <- newOTU[,3:4]
rownames(tipTimes) <- newOTU[,1]
# create new tree constraints, if such exists
# replace original tip with multiple taxa, collapse so not monophyletic
if(!is.null(treeConstraints)){
treeTaxaExpand <- newOTU[,2]
names(treeTaxaExpand) <- newOTU[,1]
treeConstraints <- expandTaxonTree(taxonTree = treeConstraints,
taxaData = treeTaxaExpand ,collapse = newOTU[,2])
}
# fix outgroupTaxa, if exists
if(!is.null(outgroupTaxa)){
newOutgroupOTU <- sapply(newOTU[,2],function(x) any(x == outgroupTaxa))
outgroupTaxa <- newOTU[newOutgroupOTU,1]
}
# fix NEXUS matrix
if(parseOriginalNexus & !is.null(origNexusFile)){
# given data on new taxa (with old taxa), rebuild NEXUS block
# input: a matrix with column 1 = new taxon names
# column 2 = old taxon names
# need to tell it that the taxon names have change (possibly) due to cleaning
morphNexus <- remakeDataBlockFun(newTaxaTable = newOTU[,1:2],
taxonNames = cleanTaxonNames)
}
taxonnames <- newOTU[,1]
# change whichAppearance
whichAppearance <- "first"
}else{
taxonnames <- cleanTaxonNames
}
#####################################################################
# create an empty morph matrix if necessary
if(is.null(origNexusFile)){
if(createEmptyMorphMat){
morphNexus <- makeEmptyMorphNexusMrB(taxonNames = taxonnames)
}else{
morphNexus <- " "
}
}
########################################
# check that morphNexus has been replaced
if(is.null(morphNexus)){
stop(paste0(
"A morphological nexus was neither provided nor created.\n",
" Please contact dwbapst@gmail.com about this bug."
))
}
#}else{
# morphNexus <- nexusFileText
# }
########################################################
# get runName if not supplied
if(is.null(runName)){
if(is.null(newFile)){
#stop("runName must be supplied if name of new file is not designated")
runName <- "new_run_paleotree"
}
# get run name - everything after the last / or \\
runName <- rev(strsplit(newFile,split = "\\\\")[[1]])[1]
runName <- rev(strsplit(runName,split = "/")[[1]])[1]
}
# use run name as log file name
logfileline <- paste0(
'log start filename = "',
runName,
'.out" replace;'
)
# use run name for MCMC output files
outputNameLine <- paste0('Filename = "',runName,'"')
########################################################
#
if(is.null(treeConstraints)){
# get the ingroup constraint, as there is no tree
ingroupConstraint <- makeIngroupConstraintMrB(
outgroupTaxa = outgroupTaxa,
allTaxa = taxonnames)
topologicalConstraints <- " "
}else{
# get topological constraints, if given in input, do not set ingroup
ingroupConstraint <- " "
topologicalConstraints <- createMrBayesConstraints(
tree = treeConstraints,
partial = FALSE,
file = NULL,
includeIngroupConstraint = FALSE
)
}
#
# get age calibration block
ageCalibrations <- createMrBayesTipCalibrations(
tipTimes = tipTimes,
ageCalibrationType = ageCalibrationType,
whichAppearance = whichAppearance,
treeAgeOffset = treeAgeOffset,
minTreeAge = minTreeAge,
collapseUniform = collapseUniform,
anchorTaxon = anchorTaxon,
file = NULL
)
#
# make the final MrBayes Block
MrBayesBlock <- makeMrBayesBlock(
logBlock = logfileline,
ingroupBlock = ingroupConstraint,
ageBlock = ageCalibrations,
constraintBlock = topologicalConstraints,
orderedChars=orderedChars,
autoCloseMrB = autoCloseMrB,
morphModel = morphModel,
morphFiltered = morphFiltered,
ngen = ngen,
doNotRun = doNotRun,
outputNameLine = outputNameLine
)
###############################################
# combine morph matrix block with MrBayes command block
finalText <- c(morphNexus, MrBayesBlock)
if(!is.null(newFile)){
write(x = finalText, file = newFile)
if(printExecute){
# have function print command for pasting into MrBayes to execute:
# e.g. ' Execute "C://fossil data/myNexus.nex" '
message("Now go to MrBayes and paste in this line: ")
message(paste0('Execute "',normalizePath(newFile),'";'))
}
}else{
return(finalText)
}
}
#################################################################################
# Supplemental functions
###################################################################################
makeIngroupConstraintMrB <- function(outgroupTaxa,allTaxa){
# use the list of outgroup taxa to define the in-group taxa
# write a MrBayes line that defines all ingroup taxa
# as being monophyletic
################################################
# checks
if(length(outgroupTaxa)<1){
stop("outgroupTaxa must be of length 1 or greater")
}
if(!is.character(outgroupTaxa) | !is.vector(outgroupTaxa)){
stop("outgroupTaxa must be a vector of type character")
}
if(!is.character(allTaxa) | !is.vector(allTaxa)){
stop("taxon labels cannot be turned into a vector of strings")
}
# are there any outgroup not in all?
notPresent <- sapply(outgroupTaxa,function(x) all(x != allTaxa))
if(any(notPresent)){
stop(paste("Following outgroup taxa not present in input tipTimes",
paste(outgroupTaxa[notPresent],collapse = " ")))
}
#############################################
# get the ingroup taxa
inGroup <- allTaxa[sapply(allTaxa,function(x) all(x != outgroupTaxa))]
# check that there are SOME ingroup
if(length(inGroup)<2){
stop("Less than two taxa are in the ingroup??")
}
# make ingroup block
ingroupBlock <- paste0("constraint ingroup = ",
paste(inGroup,collapse = " "),";")
ingroupBlock <- c(ingroupBlock,
"prset topologypr = constraints(ingroup);")
return(ingroupBlock)
}
makeEmptyMorphNexusMrB <- function(taxonNames){
#given a vector of taxonNames, make NEXUS character block
# formatted for MrBayes
# each character is a single ?
# check
if(!is.character(taxonNames) | !is.vector(taxonNames)){
stop("taxon names not formatted correctly for making empty NEXUS matrix")
}
################
# get standard header and
nTaxa <- length(taxonNames)
if(nTaxa<1){stop("vector of taxon names is zero-length")}
# build header
headMat <- c("#NEXUS","BEGIN DATA;",
paste0("DIMENSIONS NTAX = ",nTaxa," NCHAR = 1;"),
"FORMAT DATATYPE = STANDARD GAP = - MISSING = ?;",
"MATRIX")
endMat <- c("",";","END;")
###############################
# build body
bodyMat <- sapply(taxonNames,function(x)
paste0(x," ?")
)
#########################
finalMatBlock <- c(headMat,bodyMat,endMat)
names(finalMatBlock) <- NULL
return(finalMatBlock)
}
##########################################################################################################################################
makeMrBayesBlock <- function(logBlock, ingroupBlock,
ageBlock, orderedChars, autoCloseMrB,
constraintBlock, morphModel = "strong",
morphFiltered="parsInf",
ngen = 100000000, doNotRun = FALSE,
outputNameLine = outputNameLine){
#########################################################################################
###
### # what follows will look very messy due to its untabbed nature
### # I'm so sorry - dwb
###
##################################################
block1 <- "
begin mrbayes;
[This block is a combination of best practices taken from NEXUS files from
April Wright, William Gearty, Graham Slater, Davey Wright,
and guided by the recommendations of Matzke and Wright, 2016, Biol. Lett.]
[autoclose, I like it to ask (the default), but you might want it - if so, set to 'yes']"
if(autoCloseMrB){
block1 <- c(block1,"set autoclose = yes;\n")
}else{
block1 <- c(block1,"[set autoclose = yes;]\n")
}
##########################################################################
block2 <- "
[DATA]
[edits to taxon or character selection go here]
[CONSTRAIN MONOPHYLY OF INGROUP]
[constrain all members of the ingroup to be monophyletic]
"
#######################################################################
block3 <- "
[TOPOLOGICAL CONSTRAINTS FOR ADDITIONAL NODES]
[[EXAMPLE]
constraint node1 = t1 t2 t3;
constraint node2 = t4 t5;
prset topologypr = constraints(ingroup,node1,node2); [need to include ingroup]]
"
############################################################################
if(!is.null(orderedChars)){
# example:
# ctype ordered: 1 15 20;
orderedChars<-c(
" ",
"[ordered characters block]",
paste0("ctype ordered: ",orderedChars,";"),
""
)
}else{
orderedChars<-""
}
##############################################################################
if(morphFiltered == "parsInf"){
filteredType <- "informative"
}
if(morphFiltered == "variable"){
filteredType <- "variable"
}
if(is.null(filteredType)){
stop("morphFiltered must be one of 'parsInf' or 'variable'")
}
morphModelBlock_Strong <- paste0("
[CHARACTER MODELS]
[morphology model settings]
[make the number of beta categories for stationary frequencies 4 (the default)]
[default: use pars-informative coding]
[set coding and rates - default below maximizes information content]
lset nbetacat = 5 rates = equal Coding = ",
filteredType,
"; [equal rate variation]
[lset nbetacat = 5 rates = gamma Coding = informative; [gamma distributed rate variation]]
[gamma distributed rates may cause divide by zero problems with non-fixed symdiri]
[symdirhyperpr prior, fixed vs. variable]
prset symdirihyperpr = fixed(infinity);
[prset symdirihyperpr = uniform(1,10); [this range seems to avoid divide by zero error]]
")
morphModelBlock_Relaxed <- paste0("
[CHARACTER MODELS]
[morphology model settings]
[make the number of beta categories for stationary frequencies 4 (the default)]
[default: use pars-informative coding]
[set coding and rates - default below maximizes information content]
[lset nbetacat = 5 rates = equal Coding = ",
filteredType,
"; [equal rate variation]]
lset nbetacat = 5 rates = gamma Coding = informative; [gamma distributed rate variation]
[gamma distributed rates may cause divide by zero problems with non-fixed symdiri]
[symdirhyperpr prior, fixed vs. variable]
[prset symdirihyperpr = fixed(infinity);]
prset symdirihyperpr = uniform(1,10); [this range seems to avoid divide by zero error]
")
#############################################################################
block4 <- "
[TIP AND TREE AGE CALIBRATIONS]
[[EXAMPLE WITH BOUNDS: min age, max age]
calibrate t1 = uniform(105, 113);
calibrate t2 = uniform(97, 115);
[EXAMPLE WITH FIXED AGE]
calibrate t3 = fixed (100);]
[prior on tree age]
[with offset exponential in MrB, first par is min age, second is expected mean age]
[mean date must be greater than min date - duh]
[EXAMPLE]
[prset treeagepr = offsetexp(115,120); ]
"
##########################################################################
block5a <- "
[FOSSILIZED BIRTH DEATH MODEL & ASSOCIATED PARAMETERS]
prset brlenspr = clock:fossilization;
prset fossilizationpr = beta(1,1); [flat, sampling is psi/(mu+psi), 0-1 ]
prset speciationpr = uniform(0,10);
prset extinctionpr = beta(1,1); [flat, extinction is relative to speciation between 0-1]
prset samplestrat = random; [default: bdss prior ]
[prset samplestrat = fossiltip; [this would mean no sampled ancestors]]
prset sampleprob = 1; [rho, sampling in the present, default fixed to 1]
[Unclear what to set this to if all taxa are extinct...]
[CLOCK PRIORS]
[clock rate: truncated normal, mean = 0.08, but very flat; see Supp Table 2 of Matzke & Wright 2016]
prset clockratepr = normal(0.0025,0.1);
prset clockvarpr = igr;
prset igrvarpr = uniform(0.0001, 200); [vague prior that is actually vague; Matzke & Wright 2016]
prset nodeagepr = calibrated; [set node age priors]
[SETTINGS FOR PARTITIONED ANALYSES]
[Necessary if a combined analysis, I guess]
[RUN SETTINGS]
[FYI burnin settings for one command sets burnin settings for all]
mcmcp ngen = "
block5b <- " relburnin = yes burninfrac = 0.5 printfreq = 1000 samplefreq = 1000 nchains = 4 nruns = 2 savebrlens = yes;
set seed = 4;
"
###################################################################
runBlock <- "
[RUN]
mcmc;
sump;
[Given this is a dating analysis, should really use MCCT,not summary trees]
[I guess best summary tree though is allcompat?]
[sumt contype = allcompat; ] [allcompat has issues with sampled ancestors]
log stop;
"
######################################################
if(doNotRun){
runBlock <- " "
}
if(morphModel == "strong"){
morphModelBlock <- morphModelBlock_Strong
}
if(morphModel == "relaxed"){
morphModelBlock <- morphModelBlock_Relaxed
}
# insert ngen
block5 <- paste0(block5a,ngen," ",outputNameLine,block5b)
####################
finalBlock <- c(
block1,
logBlock,
block2,
ingroupBlock,
block3,
constraintBlock,
orderedChars,
morphModelBlock,
block4,
ageBlock,
block5,
runBlock,
" ",
"end;")
return(finalBlock)
}
###################################################################################################
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.