MaximizeParsimony | R Documentation |
Search for most parsimonious trees using the parsimony ratchet and TBR rearrangements, treating inapplicable data as such using the algorithm of \insertCiteBrazeau2019;textualTreeSearch.
Tree search will be conducted from a specified or automatically-generated starting tree in order to find a tree with an optimal parsimony score, under implied or equal weights, treating inapplicable characters as such in order to avoid the artefacts of the standard Fitch algorithm \insertCite@see @Maddison1993; @Brazeau2019TreeSearch. Tree length is calculated using the MorphyLib C library \insertCiteBrazeau2017TreeSearch.
MaximizeParsimony(
dataset,
tree,
ratchIter = 7L,
tbrIter = 2L,
startIter = 2L,
finalIter = 1L,
maxHits = NTip(dataset) * 1.8,
maxTime = 60,
quickHits = 1/3,
concavity = Inf,
ratchEW = TRUE,
tolerance = sqrt(.Machine[["double.eps"]]),
constraint,
verbosity = 3L
)
Resample(
dataset,
tree,
method = "jack",
proportion = 2/3,
ratchIter = 1L,
tbrIter = 8L,
finalIter = 3L,
maxHits = 12L,
concavity = Inf,
tolerance = sqrt(.Machine[["double.eps"]]),
constraint,
verbosity = 2L,
...
)
EasyTrees()
EasyTreesy()
dataset |
A phylogenetic data matrix of phangorn class
|
tree |
(optional) A bifurcating tree of class |
ratchIter |
Numeric specifying number of iterations of the parsimony ratchet \insertCiteNixon1999TreeSearch to conduct. |
tbrIter |
Numeric specifying the maximum number of TBR break points on a given tree to evaluate before terminating the search. One "iteration" comprises selecting a branch to break, and evaluating each possible reconnection point in turn until a new tree improves the score. If a better score is found, then the counter is reset to zero, and tree search continues from the improved tree. |
startIter |
Numeric: an initial round of tree search with
|
finalIter |
Numeric: a final round of tree search will evaluate
|
maxHits |
Numeric specifying the maximum times that an optimal parsimony score may be hit before concluding a ratchet iteration or final search concluded. |
maxTime |
Numeric: after |
quickHits |
Numeric: iterations on subsampled datasets
will retain |
concavity |
Determines the degree to which extra steps beyond the first
are penalized. Specify a numeric value to use implied weighting
\insertCiteGoloboff1993TreeSearch; |
ratchEW |
Logical specifying whether to use equal weighting during ratchet iterations, improving search speed whilst still facilitating escape from local optima. |
tolerance |
Numeric specifying degree of suboptimality to tolerate
before rejecting a tree. The default, |
constraint |
Either an object of class |
verbosity |
Integer specifying level of messaging; higher values give
more detailed commentary on search progress. Set to |
method |
Unambiguous abbreviation of |
proportion |
Numeric between 0 and 1 specifying what proportion of characters to retain under jackknife resampling. |
... |
Additional parameters to |
Tree search commences with ratchIter
iterations of the parsimony ratchet
\insertCiteNixon1999TreeSearch, which bootstraps the input dataset
in order to escape local optima.
A final round of tree bisection and reconnection (TBR)
is conducted to broaden the sampling of trees.
This function can be called using the R command line / terminal, or through
the "shiny" graphical user interface app (type EasyTrees()
to launch).
The optimal strategy for tree search depends in part on how close to optimal the starting tree is, the size of the search space (which increases super-exponentially with the number of leaves), and the complexity of the search space (e.g. the existence of multiple local optima).
One possible approach is to employ four phases:
Rapid search for local optimum: tree score is typically easy to improve
early in a search, because the initial tree is often far from optimal.
When many moves are likely to be accepted, running several rounds of search
with a low value of maxHits
and a high value of tbrIter
allows many
trees to be evaluated quickly, hopefully moving quickly to a more promising
region of tree space.
Identification of local optimum:
Once close to a local optimum, a more extensive search
with a higher value of maxHits
allows a region to be explored in more
detail. Setting a high value of tbrIter
will search a local
neighbourhood more completely
Search for nearby peaks:
Ratchet iterations allow escape from local optima.
Setting ratchIter
to a high value searches the wider neighbourhood more
extensively for other nearby peaks; ratchEW = TRUE
accelerates these
exploratory searches.
Extensive search of final optimum. As with step 2, it may be valuable to
fully explore the optimum that is found after ratchet searches to be sure
that the locally optimal score has been obtained. Setting a high value of
finalIter
performs a thorough search that can give confidence that further
searches would not find better (local) trees.
A search is unlikely to have found a global optimum if:
Tree score continues to improve on the final iteration. If a local optimum
has not yet been reached, it is unlikely that a global optimum has
been reached.
Try increasing maxHits
.
Successive ratchet iterations continue to improve tree scores.
If a recent ratchet iteration improved the score, rather than finding
a different region of tree space with the same optimal score, it is likely
that still better global optima remain to be found. Try increasing
ratchIter
(more iterations give more chance for improvement) and
maxHits
(to get closer to the local optimum after each ratchet iteration).
Optimal areas of tree space are only visited by a single ratchet iteration.
(See vignette: Exploring tree space.)
If some areas of tree space are only found by one ratchet iteration, there
may well be other, better areas that have not yet been visited.
Try increasing ratchIter
.
When continuing a tree search, it is usually best to start from an optimal tree found during the previous iteration – there is no need to start from scratch.
A more time consuming way of checking that a global optimum has been reached is to repeat a search with the same parameters multiple times, starting from a different, entirely random tree each time. If all searches obtain the same optimal tree score despite their different starting points, this score is likely to correspond to the global optimum.
For detailed documentation of the "TreeSearch" package, including full instructions for loading phylogenetic data into R and initiating and configuring tree search, see the package documentation.
MaximizeParsimony()
returns a list of trees with class
multiPhylo
. This lists all trees found during each search step that
are within tolerance
of the optimal score, listed in the sequence that
they were first visited, and named according to the step in which they were
first found; it may contain more than maxHits
elements.
Note that the default search parameters may need to be increased in order for
these trees to be the globally optimal trees; examine the messages printed
during tree search to evaluate whether the optimal score has stabilized.
The return value has the attribute firstHit
, a named integer vector listing
the number of optimal trees visited for the first time in each stage of
the tree search. Stages are named:
seed
: starting trees;
start
: Initial TBR search;
ratchN
: Ratchet iteration N
;
final
: Final TBR search.
The first tree hit for the first time in ratchet iteration three is named
ratch3_1
.
Resample()
returns a multiPhylo
object containing a list of
trees obtained by tree search using a resampled version of dataset
.
Note that bootstrap support is a measure of the amount of data supporting a split, rather than the amount of confidence that should be afforded the grouping. "Bootstrap support of 100% is not enough, the tree must also be correct" \insertCitePhillips2004TreeSearch. See discussion in \insertCiteEgan2006;textualTreeSearch; \insertCiteWagele2009;textualTreeSearch; \insertCiteSimmons2011TreeSearch; \insertCiteKumar2012;textualTreeSearch.
For a discussion of suitable search parameters in resampling estimates, see \insertCiteMuller2005;textualTreeSearch. The user should decide whether to start each resampling from the optimal tree (which may be quicker, but result in overestimated support values as searches get stuck in local optima close to the optimal tree) or a random tree (which may take longer as more rearrangements are necessary to find an optimal tree on each iteration).
For other ways to estimate clade concordance, see SiteConcordance()
.
Martin R. Smith (martin.smith@durham.ac.uk)
Tree search via graphical user interface: EasyTrees()
Other split support functions:
JackLabels()
,
Jackknife()
,
SiteConcordance
## Only run examples in interactive R sessions
if (interactive()) {
# launch "shiny" point-and-click interface
EasyTrees()
# Here too, use the "continue search" function to ensure that tree score
# has stabilized and a global optimum has been found
}
# Load data for analysis in R
library("TreeTools")
data("congreveLamsdellMatrices", package = "TreeSearch")
dataset <- congreveLamsdellMatrices[[42]]
# A very quick run for demonstration purposes
trees <- MaximizeParsimony(dataset, ratchIter = 0, startIter = 0,
tbrIter = 1, maxHits = 4, maxTime = 1/100,
concavity = 10, verbosity = 4)
names(trees)
# In actual use, be sure to check that the score has converged on a global
# optimum, conducting additional iterations and runs as necessary.
if (interactive()) {
# Jackknife resampling
nReplicates <- 10
jackTrees <- replicate(nReplicates,
#c() ensures that each replicate returns a list of trees
c(Resample(dataset, trees, ratchIter = 0, tbrIter = 2, startIter = 1,
maxHits = 5, maxTime = 1 / 10,
concavity = 10, verbosity = 0))
)
# In a serious analysis, more replicates would be conducted, and each
# search would undergo more iterations.
# Now we must decide what to do with the multiple optimal trees from
# each replicate.
# Treat each tree equally
JackLabels(ape::consensus(trees), unlist(jackTrees, recursive = FALSE))
# Take the strict consensus of all trees for each replicate
JackLabels(ape::consensus(trees), lapply(jackTrees, ape::consensus))
# Take a single tree from each replicate (the first; order's irrelevant)
JackLabels(ape::consensus(trees), lapply(jackTrees, `[[`, 1))
}
# Tree search with a constraint
constraint <- MatrixToPhyDat(c(a = 1, b = 1, c = 0, d = 0, e = 0, f = 0))
characters <- MatrixToPhyDat(matrix(
c(0, 1, 1, 1, 0, 0,
1, 1, 1, 0, 0, 0), ncol = 2,
dimnames = list(letters[1:6], NULL)))
MaximizeParsimony(characters, constraint = constraint, verbosity = 0)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.