Constructing pan-genome trees

Description

Creates a pan-genome tree based on a pan-matrix and a distance function.

Usage

panTree(pan.matrix, dist.FUN = distManhattan, nboot = 0,
  linkage = "average", ...)

Arguments

pan.matrix

A Panmat object, see panMatrix.

dist.FUN

A valid distance function, see below.

nboot

Number of bootstrap samples.

linkage

The linkage function, see below.

...

Additional parameters passed on to the specified distance function, see Details below.

Details

A pan-genome tree is a graphical display of the genomes in a pan-genome study, based on some pan-matrix (Snipen & Ussery, 2010). panTree is a constructor that computes a Pantree object; use plot.Pantree to actually plot the tree.

The parameter dist.FUN must be a function that takes as input a numerical matrix (Panmat object) and returns a dist object. See distManhattan or distJaccard for examples of such functions. Any additional arguments (...) are passed on to this function.
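As a minimal sketch of a function with this shape, the code below defines a hypothetical distPresence (not part of micropan) that reduces the counts to presence/absence before computing Manhattan distances. It uses only base R, and the toy matrix stands in for a real Panmat object.

```r
# Hypothetical distance function (an illustration, not part of micropan).
# It takes a numeric matrix with genomes as rows and returns a dist object.
distPresence <- function(pan.matrix, ...) {
  # Reduce gene cluster counts to presence (1) / absence (0)
  presence <- (as.matrix(pan.matrix) > 0) * 1
  # On 0/1 data the Manhattan distance counts the clusters found in
  # exactly one of the two genomes
  dist(presence, method = "manhattan")
}

# Made-up toy pan-matrix: 3 genomes (rows) x 4 gene clusters (columns)
toy <- matrix(c(1, 0, 2, 1,
                1, 1, 0, 1,
                0, 1, 0, 3),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("G1", "G2", "G3"), paste0("cluster", 1:4)))

d <- distPresence(toy)
print(d)
```

Under these assumptions, a function like this could be supplied as panTree(pan.matrix, dist.FUN = distPresence).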

To annotate the tree with bootstrap values, set nboot to the desired number of bootstrap samples (e.g. nboot=100). With the default nboot = 0, no bootstrapping is done.
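The idea behind such bootstrap values can be sketched in base R. This is an illustration under stated assumptions, not micropan's own code: it assumes the gene clusters (columns) are resampled with replacement, and the clades helper and the toy matrix are hypothetical.

```r
# Hypothetical helper: the set of leaves below each merge of an hclust tree,
# encoded as one string per clade (e.g. "G1+G2")
clades <- function(tree) {
  n <- length(tree$order)
  members <- vector("list", n - 1)
  for (i in seq_len(n - 1)) {
    parts <- lapply(tree$merge[i, ], function(j) {
      # Negative entries are single leaves, positive entries earlier merges
      if (j < 0) tree$labels[-j] else members[[j]]
    })
    members[[i]] <- sort(unlist(parts))
  }
  vapply(members, paste, character(1), collapse = "+")
}

# Made-up toy pan-matrix: 4 genomes (rows) x 4 gene clusters (columns)
toy <- matrix(c(1, 0, 2, 1,
                1, 1, 2, 1,
                0, 1, 0, 3,
                0, 1, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("G", 1:4), paste0("cluster", 1:4)))

set.seed(1)
nboot <- 50
ref <- clades(hclust(dist(toy, method = "manhattan"), method = "average"))
counts <- setNames(integer(length(ref)), ref)
for (b in seq_len(nboot)) {
  # Assumption: resample gene clusters (columns) with replacement
  cols <- sample(ncol(toy), replace = TRUE)
  boot.tree <- hclust(dist(toy[, cols, drop = FALSE], method = "manhattan"),
                      method = "average")
  # Count the reference clades that reappear in the bootstrap tree
  counts <- counts + (ref %in% clades(boot.tree))
}
print(counts)
```

A clade recovered in most bootstrap trees is well supported by the data. The clade containing all genomes is trivially recovered every time.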

The tree is built by hclust (hierarchical clustering). By default it uses average linkage, following Snipen & Ussery (2010). Use the linkage parameter to select another method; see hclust for the available options.
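To illustrate how the choice of linkage changes a tree, the sketch below runs hclust directly on a made-up toy matrix with two different methods. It assumes that linkage is handed to hclust as its method argument; the toy data are hypothetical.

```r
# Made-up toy pan-matrix: 4 genomes (rows) x 4 gene clusters (columns)
toy <- matrix(c(1, 0, 2, 1,
                1, 1, 2, 1,
                0, 1, 0, 3,
                0, 1, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("G", 1:4), paste0("cluster", 1:4)))

# Manhattan distances between the genomes
d <- dist(toy, method = "manhattan")

# Same distances, two linkage methods
tree.average  <- hclust(d, method = "average")
tree.complete <- hclust(d, method = "complete")

# The merge heights show where the trees differ
print(tree.average$height)   # 1 1 5
print(tree.complete$height)  # 1 1 6
```

The two pairs of close genomes merge at the same height in both trees. Only the final merge differs: average linkage uses the mean of the between-group distances, complete linkage the largest.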

Value

This function returns a Pantree object, which is a small (S3) extension to a list with 4 components. These components are named Htree, Nboot, Nbranch and Dist.FUN.

Htree is an hclust object. This is the actual tree. Nboot is the number of bootstrap samples. Nbranch is a vector listing the number of times each split/clade in the tree was observed in the bootstrap procedure. Dist.FUN is the name of the distance function used to construct the tree.

Author(s)

Lars Snipen and Kristian Hovde Liland.

References

Snipen, L., Ussery, D.W. (2010). Standard operating procedure for computing pangenome trees. Standards in Genomic Sciences, 2:135-141.

See Also

panMatrix, distManhattan, distJaccard, plot.Pantree.

Examples

# Loading a Panmat object, constructing a tree and plotting it 
data(list="Mpneumoniae.blast.panmat",package="micropan")
my.tree <- panTree(Mpneumoniae.blast.panmat)
plot(my.tree)

# Computing some weights to be used in the distManhattan
# function below...
w <- geneWeights(Mpneumoniae.blast.panmat,type="shell")
# Creating another tree with scaled and weighted distances and bootstrap values
# (nboot=100 requests the bootstrap values; scale and weights go to distManhattan)
my.tree <- panTree(Mpneumoniae.blast.panmat, nboot=100, scale=0.1, weights=w)

# ...and plotting with alternative labels and colors from Mpneumoniae.table
data(list="Mpneumoniae.table",package="micropan")
labels <- Mpneumoniae.table$Strain
names(labels) <- Mpneumoniae.table$GID.tag
cols <- Mpneumoniae.table$Color
names(cols) <- Mpneumoniae.table$GID.tag
plot(my.tree, leaf.lab=labels, col=cols, cex=0.8, xlab="Shell-weighted Manhattan distances")
