Minimum Spanning Tree

Description

Function spantree finds a minimum spanning tree connecting all points, but disregarding dissimilarities that are at or above the threshold or NA.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
spantree(d, toolong = 0)
## S3 method for class 'spantree'
as.hclust(x, ...)
## S3 method for class 'spantree'
cophenetic(x)
spandepth(x)
## S3 method for class 'spantree'
plot(x, ord, cex = 0.7, type = "p", labels, dlim,
     FUN = sammon,  ...)
## S3 method for class 'spantree'
lines(x, ord, display="sites", ...)

Arguments

d

Dissimilarity data inheriting from class dist or a an object, such as a matrix, that can be converted to a dissimilarity matrix. Functions vegdist and dist are some functions producing suitable dissimilarity data.

toolong

Shortest dissimilarity regarded as NA. The function uses a fuzz factor, so that dissimilarities close to the limit will be made NA, too. If toolong = 0 (or negative), no dissimilarity is regarded as too long.

x

A spantree result object.

ord

An ordination configuration, or an ordination result known by scores.

cex

Character expansion factor.

type

Observations are plotted as points with type="p" or type="b", or as text label with type="t". The tree (lines) will always be plotted.

labels

Text used with type="t" or node names if this is missing.

dlim

A ceiling value used to highest cophenetic dissimilarity.

FUN

Ordination function to find the configuration from cophenetic dissimilarities.

display

Type of scores used for ord.

...

Other parameters passed to functions.

Details

Function spantree finds a minimum spanning tree for dissimilarities (there may be several minimum spanning trees, but the function finds only one). Dissimilarities at or above the threshold toolong and NAs are disregarded, and the spanning tree is found through other dissimilarities. If the data are disconnected, the function will return a disconnected tree (or a forest), and the corresponding link is NA. Connected subtrees can be identified using distconnected.

Minimum spanning tree is closesly related to single linkage clustering, a.k.a. nearest neighbour clustering, and in genetics as neighbour joining tree available in hclust and agnes functions. The most important practical difference is that minimum spanning tree has no concept of cluster membership, but always joins individual points to each other. Function as.hclust can change the spantree result into a corresponding hclust object.

Function cophenetic finds distances between all points along the tree segments. Function spandepth returns the depth of each node. The nodes of a tree are either leaves (with one link) or internal nodes (more than one link). The leaves are recursively removed from the tree, and the depth is the layer at with the leaf was removed. In disconnected spantree object (in a forest) each tree is analysed separately and disconnected nodes not in any tree have depth zero.

Function plot displays the tree over a supplied ordination configuration, and lines adds a spanning tree to an ordination graph. If configuration is not supplied for plot, the function ordinates the cophenetic dissimilarities of the spanning tree and overlays the tree on this result. The default ordination function is sammon (package MASS), because Sammon scaling emphasizes structure in the neighbourhood of nodes and may be able to beautifully represent the tree (you may need to set dlim, and sometimes the results will remain twisted). These ordination methods do not work with disconnected trees, but you must supply the ordination configuration. Function lines will overlay the tree in an existing plot.

Function spantree uses Prim's method implemented as priority-first search for dense graphs (Sedgewick 1990). Function cophenetic uses function stepacross with option path = "extended". The spantree is very fast, but cophenetic is slow in very large data sets.

Value

Function spantree returns an object of class spantree which is a list with two vectors, each of length n-1. The number of links in a tree is one less the number of observations, and the first item is omitted. The items are

kid

The child node of the parent, starting from parent number two. If there is no link from the parent, value will be NA and tree is disconnected at the node.

dist

Corresponding distance. If kid = NA, then dist = 0.

labels

Names of nodes as found from the input dissimilarities.

call

The function call.

Note

In principle, minimum spanning tree is equivalent to single linkage clustering that can be performed using hclust or agnes. However, these functions combine clusters to each other and the information of the actually connected points (the “single link”) cannot be recovered from the result. The graphical output of a single linkage clustering plotted with ordicluster will look very different from an equivalent spanning tree plotted with lines.spantree.

Author(s)

Jari Oksanen

References

Sedgewick, R. (1990). Algorithms in C. Addison Wesley.

See Also

vegdist or dist for getting dissimilarities, and hclust or agnes for single linkage clustering.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
data(dune)
dis <- vegdist(dune)
tr <- spantree(dis)
## Add tree to a metric scaling 
plot(tr, cmdscale(dis), type = "t")
## Find a configuration to display the tree neatly
plot(tr, type = "t")
## Depths of nodes
depths <- spandepth(tr)
plot(tr, type = "t", label = depths)
## Plot as a dendrogram
plot(as.hclust(tr))

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.