Combine a phylogenetic tree with data

Description

phylo4d is a generic constructor that merges a phylogenetic tree with data frames to create a combined object of class phylo4d.

Usage

phylo4d(x, ...)

## S4 method for signature 'phylo4'
phylo4d(x, tip.data = NULL, node.data = NULL,
  all.data = NULL, merge.data = TRUE, metadata = list(), ...)

## S4 method for signature 'matrix'
phylo4d(x, tip.data = NULL, node.data = NULL,
  all.data = NULL, merge.data = TRUE, metadata = list(),
  edge.length = NULL, tip.label = NULL, node.label = NULL,
  edge.label = NULL, order = "unknown", annote = list(), ...)

## S4 method for signature 'phylo'
phylo4d(x, tip.data = NULL, node.data = NULL,
  all.data = NULL, check.node.labels = c("keep", "drop", "asdata"),
  annote = list(), metadata = list(), ...)

## S4 method for signature 'phylo4d'
phylo4d(x, ...)

## S4 method for signature 'nexml'
phylo4d(x)

Arguments

x

an object of class phylo4, phylo, nexml or a matrix of edges (see above)

tip.data

a data frame (or object to be coerced to one) containing only tip data (Optional)

node.data

a data frame (or object to be coerced to one) containing only node data (Optional)

all.data

a data frame (or object to be coerced to one) containing both tip and node data (Optional)

merge.data

if both tip.data and node.data are provided, should columns with common names be merged together (TRUE, the default) or kept separate (FALSE)? See Details.

metadata

any additional metadata to be passed to the new object

edge.length

Edge (branch) length. (Optional)

tip.label

A character vector of species names (names of "tip" nodes). (Optional)

node.label

A character vector of internal node names. (Optional)

edge.label

A character vector of edge (branch) names. (Optional)

order

character: tree ordering (allowable values are listed in phylo4_orderings, currently "unknown", "preorder" (="cladewise" in ape), and "postorder", with "cladewise" and "pruningwise" also allowed for compatibility with ape)

annote

any additional annotation data to be passed to the new object

check.node.labels

if x is of class phylo, use either “keep” (the default) to retain internal node labels, “drop” to drop them, or “asdata” to convert them to numeric tree data. This argument is useful if the phylo object has non-unique node labels or node labels with informative data (e.g., posterior probabilities).

...

further arguments to control the behavior of the constructor in the case of missing/extra data and where to look for labels in the case of non-unique labels that cannot be stored as row names in a data frame (see Details).

Details

You can provide several data frames to define traits associated with tips and/or internal nodes. By default, data row names are used to link data to nodes in the tree: any number-like names (e.g., “10”) are matched against node ID numbers, and any non-number-like names (e.g., “n10”) are matched against node labels. Alternative matching rules can be specified by passing additional arguments (listed below); these include positional matching, matching exclusively on node labels, and matching based on a column of data rather than on row names.

Matching rules will apply the same way to all supplied data frames. This means that you need to be consistent with the row names of your data frames. It is good practice to use tip and node labels (or node numbers if you use duplicated labels) when you combine data with a tree.
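The default rule can be illustrated with a small base-R sketch. match_rows_default() is a hypothetical helper written only for this illustration, not phylobase's internal matching code; it takes "number-like" to mean "all digits" (the wording used for rownamesAsLabels further down), and the internal node labels "n4" and "n5" are made up.

```r
# Hypothetical helper (not part of phylobase): resolve data row names to
# node IDs using the default rule (match by row name, rownamesAsLabels = FALSE).
match_rows_default <- function(row_names, node_ids, node_labels) {
  stopifnot(length(node_ids) == length(node_labels))
  out <- rep(NA_integer_, length(row_names))
  # all-digit row names ("10") are read as node ID numbers
  all_digit <- grepl("^[0-9]+$", row_names)
  ids <- as.integer(row_names[all_digit])
  out[all_digit] <- node_ids[match(ids, node_ids)]
  # any other row names ("n10") are looked up among the node labels
  out[!all_digit] <- node_ids[match(row_names[!all_digit], node_labels)]
  setNames(out, row_names)
}

# Owl tree from the Examples: tips 1-3, internal nodes 4-5 (labels assumed)
node_ids <- 1:5
node_labels <- c("Strix_aluco", "Asio_otus", "Athene_noctua", "n4", "n5")
match_rows_default(c("Athene_noctua", "2", "n4", "10"), node_ids, node_labels)
# rows go to nodes 3, 2, 4, and NA (there is no node 10)
```

Rows that match nothing come back as NA rather than being silently dropped, which is the kind of mismatch the validity checks mentioned in the Note are meant to catch.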

If you provide both tip.data and node.data, the treatment of columns with common names will depend on the merge.data argument. If TRUE, columns with the same name in both data frames will be merged; when merging columns of different data types, coercion to a common type will follow standard R rules. If merge.data is FALSE, columns with common names will be preserved independently, with “.tip” and “.node” appended to the names. This argument has no effect if tip.data and node.data have no column names in common.
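Roughly, the two merge.data behaviours amount to the base-R sketch below. combine_tip_node() is hypothetical, not phylobase's implementation: it assumes both data frames are already keyed by node ID in their row names, and the column order phylobase actually produces may differ. Columns present on only one side are padded with NA so that rbind() can stack the rows, and rbind() applies R's usual coercion when a merged column mixes types.

```r
# Hypothetical sketch of merge.data = TRUE / FALSE (not phylobase code).
combine_tip_node <- function(tip, node, merge = TRUE) {
  common <- intersect(names(tip), names(node))
  if (!merge && length(common) > 0) {
    # keep shared columns apart by appending ".tip" / ".node"
    in_tip <- names(tip) %in% common
    in_node <- names(node) %in% common
    names(tip)[in_tip] <- paste0(names(tip)[in_tip], ".tip")
    names(node)[in_node] <- paste0(names(node)[in_node], ".node")
  }
  # pad columns missing from either side with NA so rbind() can align them
  all_cols <- union(names(tip), names(node))
  for (col in setdiff(all_cols, names(tip))) tip[[col]] <- NA
  for (col in setdiff(all_cols, names(node))) node[[col]] <- NA
  rbind(tip[all_cols], node[all_cols])
}

tip  <- data.frame(a = c(1.5, 2.5, 3.5), row.names = c("1", "2", "3"))
node <- data.frame(a = c(10, 20), b = c("x", "y"), row.names = c("4", "5"))
combine_tip_node(tip, node, merge = TRUE)   # columns: a, b
combine_tip_node(tip, node, merge = FALSE)  # columns: a.tip, a.node, b
```

With merge = TRUE, the shared column a holds tip values in the first three rows and node values in the last two; with merge = FALSE it is split into a.tip and a.node, each NA where it does not apply.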

If you provide all.data along with tip.data or node.data, its column names must be distinct from theirs; otherwise an error will result. Additionally, although supplying duplicated column names within a data frame is not illegal, automatic renaming for uniqueness may lead to surprising results, so this practice should be avoided.
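A minimal sketch of the column-name rule, plus a reminder of how base R itself renames duplicated columns. check_all_data() is a hypothetical helper and its error message is made up; phylobase's own check and message may differ.

```r
# Hypothetical check (not phylobase code): all.data may not share column
# names with tip.data or node.data.
check_all_data <- function(all.data, tip.data = NULL, node.data = NULL) {
  clash <- intersect(names(all.data), c(names(tip.data), names(node.data)))
  if (length(clash) > 0)
    stop("all.data shares column name(s) with tip.data/node.data: ",
         paste(clash, collapse = ", "))
  invisible(TRUE)
}

check_all_data(data.frame(z = 1), tip.data = data.frame(a = 1))  # passes
try(check_all_data(data.frame(a = 1), tip.data = data.frame(a = 2)))  # error

# base R silently renames a duplicated column: the second "a" becomes "a.1"
names(data.frame(a = 1, a = 2))
```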

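The following additional arguments (passed via ...) control matching between the tree and the data:

match.data

logical: should row names be used to match rows of data to tree nodes (TRUE, the default), or should rows be matched by position (FALSE)?

rownamesAsLabels

logical: should row names be matched only against tip and node labels (TRUE), or should any all-digit row names be matched against node ID numbers (FALSE, the default)?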

Rules for matching rows of data to tree nodes are determined jointly by the match.data and rownamesAsLabels arguments. If match.data is TRUE, data frame rows will be matched exclusively against tip and node labels if rownamesAsLabels is also TRUE, whereas any all-digit row names will be matched against tip and node numbers if rownamesAsLabels is FALSE (the default). If match.data is FALSE, rownamesAsLabels has no effect, and row matching is purely positional with respect to the order returned by nodeId(phy, type).
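The three combinations can be summarised in one hypothetical base-R function, resolve_rows(). It only mirrors the rules stated above and is not phylobase's code; the toy tree, whose three tips are labelled with digits that differ from their IDs (the situation set up in the exGeo6 and exGeo7 examples), is invented for illustration.

```r
# Hypothetical sketch (not phylobase code) of how match.data and
# rownamesAsLabels jointly decide which node each data row goes to.
resolve_rows <- function(row_names, node_ids, node_labels,
                         match.data = TRUE, rownamesAsLabels = FALSE) {
  if (!match.data) {
    # positional: row i goes to the i-th node in nodeId() order;
    # rownamesAsLabels is ignored
    return(setNames(node_ids[seq_along(row_names)], row_names))
  }
  # look every row name up among the labels first
  out <- node_ids[match(row_names, node_labels)]
  if (!rownamesAsLabels) {
    # default: all-digit row names are node ID numbers instead
    all_digit <- grepl("^[0-9]+$", row_names)
    out[all_digit] <- node_ids[match(as.integer(row_names[all_digit]), node_ids)]
  }
  setNames(out, row_names)
}

# Three tips with IDs 1-3, labelled "3", "1", "2"
node_ids    <- 1:3
node_labels <- c("3", "1", "2")
rows <- c("2", "3", "1")
resolve_rows(rows, node_ids, node_labels, rownamesAsLabels = TRUE)   # nodes 3, 1, 2
resolve_rows(rows, node_ids, node_labels, rownamesAsLabels = FALSE)  # nodes 2, 3, 1
resolve_rows(rows, node_ids, node_labels, match.data = FALSE)        # nodes 1, 2, 3
```

The same row names land on three different sets of nodes, which is why the row names of all supplied data frames need to follow one consistent convention.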

Value

An object of class phylo4d.

Methods

x = "phylo4"

merges a tree of class phylo4 with a data.frame into a phylo4d object

x = "matrix"

merges a matrix of tree edges, similar to the edge slot of a phylo4 object (or to $edge of a phylo object), with a data.frame into a phylo4d object

x = "phylo"

merges a tree of class phylo with a data.frame into a phylo4d object

Note

Matches between the tree and the data (label matches between data and tree tips, number of rows of data vs. number of nodes/tips, etc.) are checked by the validity checker.
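As a rough illustration of what such checks look for, the base-R sketch below takes the result of a row-to-node matching (NA for rows that found no node) and reports rows with no matching node and nodes with no data. report_mismatch() is hypothetical and does not reproduce phylobase's validity checker or its error messages.

```r
# Hypothetical check (not phylobase's validity method). 'matched_ids' is a
# named vector of node IDs, one per data row, with NA for unmatched rows.
report_mismatch <- function(matched_ids, node_ids) {
  list(
    # data rows that could not be attached to any node
    extra_rows = names(matched_ids)[is.na(matched_ids)],
    # nodes that received no row of data
    missing_nodes = setdiff(node_ids, matched_ids)
  )
}

# Five tips; rows t2-t5 matched tips 2-5, row "t9" matched nothing
matched <- c(t2 = 2L, t3 = 3L, t4 = 4L, t5 = 5L, t9 = NA)
report_mismatch(matched, 1:5)
# extra_rows: "t9"; missing_nodes: 1
```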

Author(s)

Ben Bolker, Thibaut Jombart, Steve Kembel, Francois Michonneau, Jim Regetz

See Also

coerce-methods for translation functions; the phylo4d class; the phylo4 class and the phylo4 constructor.

Examples

treeOwls <- "((Strix_aluco:4.2,Asio_otus:4.2):3.1,Athene_noctua:7.3);"
tree.owls.bis <- ape::read.tree(text=treeOwls)
try(phylo4d(as(tree.owls.bis,"phylo4"),data.frame(wing=1:3)), silent=TRUE)
obj <- phylo4d(as(tree.owls.bis,"phylo4"),data.frame(wing=1:3), match.data=FALSE)
obj
print(obj)

####

data(geospiza_raw)
geoTree <- geospiza_raw$tree
geoData <- geospiza_raw$data

## fix differences in tip names between the tree and the data
geoData <- rbind(geoData, array(, dim = c(1,ncol(geoData)),
                  dimnames = list("olivacea", colnames(geoData))))

### Example using a tree of class 'phylo'
exGeo1 <- phylo4d(geoTree, tip.data = geoData)

### Example using a tree of class 'phylo4'
geoTree <- as(geoTree, "phylo4")

## some random node data
rNodeData <- data.frame(randomTrait = rnorm(nNodes(geoTree)),
                        row.names = nodeId(geoTree, "internal"))

exGeo2 <- phylo4d(geoTree, tip.data = geoData, node.data = rNodeData)

### Example using 'merge.data'
data(geospiza)
trGeo <- extractTree(geospiza)
tDt <- data.frame(a=rnorm(nTips(trGeo)), row.names=nodeId(trGeo, "tip"))
nDt <- data.frame(a=rnorm(nNodes(trGeo)), row.names=nodeId(trGeo, "internal"))

(matchData1 <- phylo4d(trGeo, tip.data=tDt, node.data=nDt, merge.data=FALSE))
(matchData2 <- phylo4d(trGeo, tip.data=tDt, node.data=nDt, merge.data=TRUE))

## Example with 'all.data'
nodeLabels(geoTree) <- as.character(nodeId(geoTree, "internal"))
rAllData <- data.frame(randomTrait = rnorm(nTips(geoTree) + nNodes(geoTree)),
                       row.names = labels(geoTree, 'all'))

exGeo5 <- phylo4d(geoTree, all.data = rAllData)

## Examples using 'rownamesAsLabels' and comparing with match.data=FALSE
tDt <- data.frame(x=letters[1:nTips(trGeo)],
                  row.names=sample(nodeId(trGeo, "tip")))
tipLabels(trGeo) <- as.character(sample(1:nTips(trGeo)))
(exGeo6 <- phylo4d(trGeo, tip.data=tDt, rownamesAsLabels=TRUE))
(exGeo7 <- phylo4d(trGeo, tip.data=tDt, rownamesAsLabels=FALSE))
(exGeo8 <- phylo4d(trGeo, tip.data=tDt, match.data=FALSE))

## generate a tree and some data
set.seed(1)
p3 <- ape::rcoal(5)
dat <- data.frame(a = rnorm(5), b = rnorm(5), row.names = p3$tip.label)
dat.defaultnames <- dat
row.names(dat.defaultnames) <- NULL
dat.superset <- rbind(dat, rnorm(2))
dat.subset <- dat[-1, ]

## create a phylo4 object from a phylo object
p4 <- as(p3, "phylo4")

## create phylo4d objects with tip data
p4d <- phylo4d(p4, dat)
###checkData(p4d)
p4d.sorted <- phylo4d(p4, dat[5:1, ])
try(p4d.nonames <- phylo4d(p4, dat.defaultnames))
p4d.nonames <- phylo4d(p4, dat.defaultnames, match.data=FALSE)

## Not run: 
p4d.subset <- phylo4d(p4, dat.subset)
try(p4d.superset <- phylo4d(p4, dat.superset))
p4d.superset <- phylo4d(p4, dat.superset)

## End(Not run)

## create phylo4d objects with node data
nod.dat <- data.frame(a = rnorm(4), b = rnorm(4))
p4d.nod <- phylo4d(p4, node.data = nod.dat, match.data=FALSE)


## create phylo4d objects with node and tip data
p4d.all1 <- phylo4d(p4, node.data = nod.dat, tip.data = dat, match.data=FALSE)
nodeLabels(p4) <- as.character(nodeId(p4, "internal"))
p4d.all2 <- phylo4d(p4, all.data = rbind(dat, nod.dat), match.data=FALSE)
