boot.phylo: Tree Bipartition and Bootstrapping Phylogenies

boot.phylo    R Documentation

Description

These functions analyse bipartitions found in a series of trees.

prop.part counts the number of bipartitions found in a series of trees given as .... If a single tree is passed, the returned object is a list of vectors with the tips descending from each node (i.e., clade compositions indexed by node number).

prop.clades counts the number of times the bipartitions present in phy are present in a series of trees given as ... or in the list previously computed and given with part.

boot.phylo performs a bootstrap analysis.

Usage

boot.phylo(phy, x, FUN, B = 100, block = 1,
           trees = FALSE, quiet = FALSE,
           rooted = is.rooted(phy), jumble = TRUE,
            mc.cores = 1)
prop.part(..., check.labels = TRUE)
prop.clades(phy, ..., part = NULL, rooted = FALSE)
## S3 method for class 'prop.part'
print(x, ...)
## S3 method for class 'prop.part'
summary(object, ...)
## S3 method for class 'prop.part'
plot(x, barcol = "blue", leftmar = 4, col = "red", ...)

Arguments

phy

an object of class "phylo".

x

in the case of boot.phylo: a taxa (rows) by characters (columns) matrix; in the case of print and plot: an object of class "prop.part".

FUN

the function used to estimate phy (see details).

B

the number of bootstrap replicates.

block

the number of columns in x that will be resampled together (see details).

trees

a logical specifying whether to return the bootstrapped trees (FALSE by default).

quiet

a logical: if FALSE (the default), a progress bar is displayed.

rooted

a logical specifying whether the trees should be treated as rooted or not.

jumble

a logical value. By default (TRUE), the rows of x are randomized to avoid artificially inflated bootstrap values associated with very short branches.

mc.cores

the number of cores (CPUs) to be used (passed to parallel).

...

either (i) a single object of class "phylo", (ii) a series of such objects separated by commas, or (iii) a list containing such objects. In the case of plot: further arguments passed to the plot (see details).

check.labels

a logical specifying whether to check the labels of each tree. If FALSE, it is assumed that all trees have the same tip labels, and that they are in the same order (see details).

part

a list of partitions as returned by prop.part; if this is used then ... is ignored.

object

an object of class "prop.part".

barcol

the colour used for the bars displaying the number of partitions in the upper panel.

leftmar

the size of the margin on the left to display the tip labels.

col

the colour used to visualise the bipartitions.

Details

The argument FUN in boot.phylo must be the function used to estimate the tree from the original data matrix. Thus, if the tree was estimated with neighbor-joining (see nj), one may want something like FUN = function(xx) nj(dist.dna(xx)).

block in boot.phylo specifies the number of columns to be resampled together. For instance, to resample at the codon level, block = 3 must be used.
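
The effect of block can be pictured with a few lines of base R. This is only a sketch of blockwise resampling, not the code used inside boot.phylo; the helper block_indices is hypothetical and assumes that the number of columns is a multiple of block.

```r
# Hypothetical helper (not part of ape): draw bootstrap column indices
# so that consecutive columns are resampled in blocks of size 'block'.
block_indices <- function(ncols, block = 1) {
  stopifnot(ncols %% block == 0)        # assumption: complete blocks only
  starts <- seq(1, ncols, by = block)   # first column of each block
  picked <- sample(starts, length(starts), replace = TRUE)
  # expand each sampled start into the full run of columns of its block
  as.vector(sapply(picked, function(s) s:(s + block - 1)))
}

set.seed(1)
idx <- block_indices(12, block = 3)
matrix(idx, nrow = 3)  # each column of this matrix is one resampled codon
```

Such an index vector could then be used to subset the columns of the data matrix, e.g. x[, idx], before passing the result to FUN.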

Using check.labels = FALSE in prop.part decreases computing times. This requires that (i) all trees have the same tip labels, and (ii) these labels are ordered similarly in all trees (in other words, the elements tip.label are identical in all trees).

The plot function represents a contingency table of the different partitions (on the x-axis) in the lower panel, and their observed numbers in the upper panel. Any further arguments (...) are used to change the aspects of the points in the lower panel: these may be pch, col, bg, cex, etc. This function works only if there is an attribute labels in the object.

The print method displays the partitions and their numbers. The summary method extracts the numbers only.

Value

prop.part returns an object of class "prop.part" which is a list with an attribute "number". The elements of this list are the observed clades, and the attribute gives their respective numbers. With the default check.labels = TRUE, an attribute "labels" is added, and the vectors of the returned object contain the indices of these labels instead of the labels themselves.
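
As an illustration of this structure, the following sketch inspects a "prop.part" object built from random trees. It assumes that ape is installed; the number of trees, the number of tips, and the seed are arbitrary choices.

```r
library(ape)

set.seed(42)
TR <- rmtree(20, 6)      # 20 random trees with 6 tips each
pp <- prop.part(TR)      # default check.labels = TRUE

attr(pp, "number")       # how many trees contain each clade
labs <- attr(pp, "labels")
labs                     # tip labels that the indices refer to

## translate the indices of the second clade back into tip labels:
labs[pp[[2]]]

summary(pp)              # the numbers only
```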

prop.clades and boot.phylo return a numeric vector whose ith element is the number associated with the ith node of phy. If trees = TRUE, boot.phylo returns a list whose first element (named "BP") is the vector just described, and whose second element ("trees") is a list with the bootstrapped trees.
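
The two elements returned with trees = TRUE might be used as in the sketch below, which assumes that ape is installed. Converting the counts to percentages is an illustrative choice, not something boot.phylo does itself, and B = 50 is used only to keep the run short.

```r
library(ape)

data(woodmouse)
f <- function(x) nj(dist.dna(x))
tr <- f(woodmouse)

B <- 50
res <- boot.phylo(tr, woodmouse, f, B = B, trees = TRUE, quiet = TRUE)

names(res)                           # "BP" and "trees"
support <- round(100 * res$BP / B)   # counts expressed as percentages
length(res$trees)                    # one tree per bootstrap replicate
```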

summary returns a numeric vector.

Note

prop.clades internally calls prop.part with the option check.labels = TRUE, which may be very slow. If the trees passed as ... fulfill conditions (i) and (ii) above, then it might be faster to first call, e.g., pp <- prop.part(...), then use the option part: prop.clades(phy, part = pp).
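
A minimal sketch of this two-step use, assuming that ape is installed, is given here. The random trees stand in for a real collection of trees, and the default check.labels = TRUE is kept because random trees are not guaranteed to store their tip labels in the same order.

```r
library(ape)

set.seed(1)
TR <- rmtree(50, 8)      # 50 random rooted trees with tips t1, ..., t8
phy <- TR[[1]]

pp <- prop.part(TR)      # compute the partitions once ...
a <- prop.clades(phy, part = pp, rooted = TRUE)   # ... and reuse them
b <- prop.clades(phy, TR, rooted = TRUE)          # partitions recomputed

all.equal(a, b)
```

Computing pp once is mainly useful when several trees have to be compared against the same collection.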

Since ape 3.5, prop.clades should return sensible results for all values of rooted: if FALSE, the numbers of bipartitions (or splits); if TRUE, the number of clades (of hopefully rooted trees).

Author(s)

Emmanuel Paradis

References

Efron, B., Halloran, E. and Holmes, S. (1996) Bootstrap confidence levels for phylogenetic trees. Proceedings of the National Academy of Sciences USA, 93, 13429–13434.

Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

See Also

as.bitsplits, dist.topo, consensus, nodelabels

Examples

data(woodmouse)
f <- function(x) nj(dist.dna(x))
tr <- f(woodmouse)
### Are bootstrap values stable?
for (i in 1:5)
  print(boot.phylo(tr, woodmouse, f, quiet = TRUE))
### How many partitions in 100 random trees of 10 labels?...
TR <- rmtree(100, 10)
pp10 <- prop.part(TR)
length(pp10)
### ... and in 100 random trees of 20 labels?
TR <- rmtree(100, 20)
pp20 <- prop.part(TR)
length(pp20)
plot(pp10, pch = "x", col = 2)
plot(pp20, pch = "x", col = 2)

set.seed(2)
tr <- rtree(10) # rooted
## the following used to return a wrong result with ape <= 3.4:
prop.clades(tr, tr)
prop.clades(tr, tr, rooted = TRUE)
tr <- rtree(10, rooted = FALSE)
prop.clades(tr, tr) # correct

### an illustration of the use of prop.clades with bootstrap trees:

fun <- function(x) as.phylo(hclust(dist.dna(x), "average")) # upgma() in phangorn
tree <- fun(woodmouse)
## get 100 bootstrap trees:
bstrees <- boot.phylo(tree, woodmouse, fun, trees = TRUE)$trees
## get proportions of each clade:
clad <- prop.clades(tree, bstrees, rooted = TRUE)
## get proportions of each bipartition:
boot <- prop.clades(tree, bstrees)
layout(1)
par(mar = rep(2, 4))
plot(tree, main = "Bipartition vs. Clade Support Values")
drawSupportOnEdges(boot)
nodelabels(clad)
legend("bottomleft", legend = c("Bipartitions", "Clades"), pch = 22,
       pt.bg = c("green", "lightblue"), pt.cex = 2.5)

## Not run: 
## an example of double bootstrap:
tr <- f(woodmouse) # re-estimate the NJ tree: 'tr' was overwritten by rtree() above
nrep1 <- 100
nrep2 <- 100
p <- ncol(woodmouse)
DB <- 0

for (b in 1:nrep1) {
    X <- woodmouse[, sample(p, p, TRUE)]
    DB <- DB + boot.phylo(tr, X, f, nrep2, quiet = TRUE)
}
DB
## to compare with:
boot.phylo(tr, woodmouse, f, 1e4)

## End(Not run)

ape documentation built on May 29, 2024, 10:50 a.m.