RogueTaxa | R Documentation |
RogueTaxa() finds wildcard leaves whose removal increases the resolution or branch support values of a consensus tree, using the relative bipartition, shared phylogenetic, or mutual clustering concepts of information.
RogueTaxa(
trees,
info = c("spic", "scic", "fspic", "fscic", "rbic"),
return = c("taxa", "tree"),
bestTree = NULL,
computeSupport = TRUE,
dropsetSize = 1,
neverDrop = character(0),
labelPenalty = 0,
mreOptimization = FALSE,
threshold = 50,
verbose = FALSE
)
QuickRogue(
trees,
info = "phylogenetic",
p = 0.5,
log = TRUE,
average = "median",
deviation = "mad",
neverDrop,
fullSeq = FALSE,
parallel = FALSE
)
C_RogueNaRok(
bootTrees = "",
runId = "tmp",
treeFile = "",
computeSupport = TRUE,
dropsetSize = 1,
excludeFile = "",
workDir = "",
labelPenalty = 0,
mreOptimization = FALSE,
threshold = 50
)
trees |
List of trees to analyse. |
info |
Concept of information to employ; see details. |
return |
If |
computeSupport |
Logical: If |
dropsetSize |
Integer specifying maximum size of dropset per iteration.
If |
neverDrop |
Tip labels that should not be dropped from the consensus. |
labelPenalty |
A weight factor to penalize for dropset size when
|
threshold , mreOptimization |
A threshold or mode for the consensus tree
that is optimized. Specify a value between 50 (majority rule consensus,
the default) and 100 (strict consensus), or set |
verbose |
Logical specifying whether to display output from RogueNaRok.
If |
p |
Proportion of trees that must contain a split before it is included in the consensus under consideration. 0.5, the default, corresponds to a majority rule tree; 1.0 will maximize the information content of the strict consensus. |
log |
Logical specifying whether to log-transform distances when calculating leaf stability. |
average |
Character specifying whether to use |
deviation |
Character specifying whether to use |
fullSeq |
Logical specifying whether to list all taxa ( |
parallel |
Logical specifying whether parallel execution should take place in C++. |
bootTrees |
Path to a file containing a collection of bootstrap trees. |
runId |
An identifier for this run, appended to output files. |
treeFile , bestTree |
If a single best-known tree (such as an ML or MP tree)
is provided, RogueNaRok optimizes the bootstrap support in this
best-known tree (still drawn from the bootstrap trees);
the |
excludeFile |
Taxa in this file (one taxon per line) will not be considered for pruning. |
workDir |
Path to a working directory where output files are created. |
"Rogue" or (loosely) "wildcard" taxa \insertCiteNixon1992Rogue are leaves whose position in a tree is poorly constrained, typically because much of the phylogenetic data associated with the taxon is either missing or in conflict with other data \insertCiteKearney2002Rogue.
These functions use heuristic methods to identify rogue taxa whose removal improves the information content of a consensus tree, by the definitions of information discussed below.
RogueTaxa() returns a data.frame. Each row after the first, which describes the starting tree, describes a dropset operation.
Columns describe:
num: Sequential index of the drop operation
taxNum: Numeric identifier of the dropped leaves
taxon: Text identifier of dropped leaves
rawImprovement: Improvement in score obtained by this operation
IC: Information content of tree after dropping all leaves so far, by the measure indicated by info.
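As a rough illustration of how the columns listed above might be used, the sketch below builds a hand-written data.frame with the same column names and extracts the dropped taxa and the overall gain in information content. The values are invented for illustration and are not output from a real run; the use of NA in the first (starting tree) row is an assumption.

```r
# Hypothetical result mimicking the documented columns.
# Values are illustrative only; NA in the first row is an assumption.
result <- data.frame(
  num = 0:2,
  taxNum = c(NA, "5", "9"),
  taxon = c(NA, "x", "y"),
  rawImprovement = c(NA, 0.3, 0.1),
  IC = c(1.2, 1.5, 1.6),
  stringsAsFactors = FALSE
)

# The first row describes the starting tree, so skip it to list the rogues
rogues <- result$taxon[-1]

# Total gain in information content relative to the starting tree
gain <- result$IC[nrow(result)] - result$IC[1]
```

Here `rogues` holds the text identifiers of the dropped leaves in the order they were removed, and `gain` summarizes how much the final reduced consensus improves on the starting tree under the chosen measure.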
C_RogueNaRok() returns 0 if successful; -1 on error.
QuickRogue(): Shortcut to "fast" heuristic, with option to return evaluation of all taxa using fullSeq = TRUE.
The splitwise phylogenetic information content measure produces the best results \insertCiteSmithConsRogue. It uses the splitwise information content as a shortcut, which involves double counting of some information (which may or may not be desirable). The same holds for the mutual clustering information measure; this measure is less obviously suited to the detection of rogues. This measure interprets split frequency as a proxy for the probability that a split is true, which is a valid interpretation of a Bayesian posterior sample \insertCiteHolder2008Rogue, a reasonable but imperfect interpretation of a bootstrap sample \insertCiteBerry1996Rogue, and a bad interpretation of a sample of most parsimonious trees.
The "relative bipartition information criterion" (RBIC) is the sum of all support values divided by the maximum possible support in a fully bifurcating tree with the initial set of taxa. The relative bipartition information content approach employs the 'RogueNaRok' implementation \insertCiteAberer2013Rogue, which can handle large trees relatively quickly. The RBIC is not strictly a measure of information and can produce undesirable results \insertCiteWilkinson2017Rogue.
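The RBIC definition above can be sketched in a few lines of base R. This is a minimal illustration, not the 'RogueNaRok' implementation: the helper name `RelativeBipartitionIC` is hypothetical, support values are assumed to be proportions between 0 and 1, and the maximum possible support is assumed to be one unit for each of the n - 3 non-trivial splits of an unrooted bifurcating tree on n leaves.

```r
# Hypothetical helper: sum of support values divided by the maximum
# possible support in a fully bifurcating tree on the initial taxa.
# Assumes supports are proportions in [0, 1] and that an unrooted
# bifurcating tree on nTip leaves has nTip - 3 non-trivial splits.
RelativeBipartitionIC <- function(support, nTip) {
  stopifnot(nTip > 3, all(support >= 0 & support <= 1))
  sum(support) / (nTip - 3)
}

# Illustrative supports before and after pruning a rogue; the
# denominator still uses the initial number of taxa (8 here).
before <- RelativeBipartitionIC(c(0.6, 0.6), nTip = 8)
after <- RelativeBipartitionIC(c(1, 1, 0.8), nTip = 8)
```

Because the denominator is fixed by the initial taxon set, dropping a leaf only raises the score if the splits that remain gain enough support to outweigh the split that is lost.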
C_RogueNaRok() interfaces directly with the 'RogueNaRok' C implementation, with no input checking; be aware that invalid input will cause undefined behaviour and is likely to crash R.
Martin R. Smith (martin.smith@durham.ac.uk), linking to RogueNaRok C library by Andre Aberer (<andre.aberer at googlemail.com>)
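library("TreeTools", warn.conflicts = FALSE) # Also attaches 'ape', which provides read.tree()
library("Rogue") # Provides RogueTaxa(), QuickRogue() and C_RogueNaRok()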
trees <- list(read.tree(text = ("(a, (b, (c, (d, (e, (X1, X2))))));")),
read.tree(text = ("((a, (X1, X2)), (b, (c, (d, e))));")))
RogueTaxa(trees, dropsetSize = 2)
trees <- list(
read.tree(text = "((a, y), (b, (c, (z, ((d, e), (f, (g, x)))))));"),
read.tree(text = "(a, (b, (c, (z, (((d, y), e), (f, (g, x)))))));"),
read.tree(text = "(a, (b, ((c, z), ((d, (e, y)), ((f, x), g)))));"),
read.tree(text = "(a, (b, ((c, z), ((d, (e, x)), (f, (g, y))))));"),
read.tree(text = "(a, ((b, x), ((c, z), ((d, e), (f, (g, y))))));")
)
cons <- consensus(trees, p = 0.5)
plot(cons)
LabelSplits(cons, SplitFrequency(cons, trees) / length(trees))
reduced <- RogueTaxa(trees, info = "phylogenetic", return = "tree")
plot(reduced)
LabelSplits(reduced, SplitFrequency(reduced, trees) / length(trees))
QuickRogue(trees, fullSeq = TRUE)
bootTrees <- system.file("example/150.bs", package = "Rogue")
tmpDir <- tempdir()
XX <- capture.output( # Don't print verbose run details to console
C_RogueNaRok(bootTrees, workDir = tmpDir)
)
# Results have been written to our temporary directory
oldwd <- setwd(tmpDir)
head(read.table("RogueNaRok_droppedRogues.tmp", header = TRUE))
# Delete temporary files
file.remove("RogueNaRok_droppedRogues.tmp")
file.remove("RogueNaRok_info.tmp")
setwd(oldwd)