In hfang-bristol/dcGOR: Analysis of Ontologies and Protein Domain Annotations

dcAncestralMP

R Documentation

Function to reconstruct ancestral discrete states using maximum parsimony algorithm

Description

dcAncestralMP reconstructs ancestral discrete states using a maximum parsimony algorithm (a modified Fitch algorithm). In a first, tip-to-root pass, the ancestral state of an internal node is set to the state shared by the majority of its children. If two or more states are equally shared by that majority, the node is temporarily marked as an unresolved tie. Ties are then resolved in a second, root-to-tip pass: a tied node always takes the state held by its direct parent. If a tie occurs at the root itself, the root is assigned the last of the tied states (for example, with the two states 'absent' and 'present', this is usually 'present').
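The two passes can be sketched in a few lines of base R. This is a minimal, hypothetical illustration, not dcGOR's source: the function name `majority_parsimony` is invented; the tree is given as a two-column (parent, child) edge matrix laid out like the `edge` component of an ape 'phylo' object (tips numbered 1..Ntip, root Ntip + 1); and when a child is itself tied, we assume each of its tied states counts once toward the parent's majority. The node numbers in the usage example are a hand-written assumption for the six-tip tree from the Examples.

```r
# Hypothetical sketch of majority-rule, two-pass parsimony (not dcGOR's code).
# Assumptions: 'edge' is a (parent, child) matrix laid out like ape's
# phylo$edge, tips are 1..length(tip_states), and the root is ntip + 1.
majority_parsimony <- function(tip_states, edge) {
  tip_states <- as.character(tip_states)
  ntip <- length(tip_states)
  n <- max(edge)                      # total number of nodes (tips + internal)
  root <- ntip + 1
  cand <- vector("list", n)           # candidate state set for every node
  children <- function(v) edge[edge[, 1] == v, 2]

  # Pass 1 (tip to root): keep the state(s) shared by most children.
  # Assumption: each state in a tied child's candidate set counts once.
  up <- function(v) {
    if (v <= ntip) {
      cand[[v]] <<- tip_states[v]
      return(invisible(NULL))
    }
    kids <- children(v)
    for (k in kids) up(k)
    counts <- table(unlist(cand[kids]))          # names come back sorted
    cand[[v]] <<- names(counts)[counts == max(counts)]
  }
  up(root)

  # Pass 2 (root to tip): resolve ties.
  state <- character(n)
  state[seq_len(ntip)] <- tip_states
  root_cand <- cand[[root]]
  state[root] <- root_cand[length(root_cand)]    # root tie: last (sorted) state
  down <- function(v) {
    for (k in children(v)) {
      if (k > ntip) {
        # An untied node keeps its state; a tied node takes its parent's state.
        state[k] <<- if (length(cand[[k]]) == 1) cand[[k]] else state[v]
        down(k)
      }
    }
  }
  down(root)
  state
}

# Usage on the six-tip tree from the Examples (node numbering assumed).
edge <- matrix(c(7, 8,   8, 9,   9, 1,   9, 2,   8, 10,
                 10, 3,  10, 4,  7, 11,  11, 5,  11, 6),
               ncol = 2, byrow = TRUE)
majority_parsimony(c(0, 1, 1, 1, 0, 0), edge)
```

With these inputs, the (t1, t2) clade is tied between 0 and 1 and, under this sketch, inherits state 1 from its parent; the root is also tied and takes the last state, 1.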

Usage

dcAncestralMP(
  data,
  phy,
  output.detail = FALSE,
  parallel = TRUE,
  multicores = NULL,
  verbose = TRUE
)

Arguments

`data`	an input data matrix/frame storing discrete states for tips (in rows) X characters (in columns). If the rows have no names, they are assumed to be in the same order as the tips of the tree. It is safer to provide row names that match the tip labels of the tree; the row names may include labels not found in the tree, but they must include every tip label
`phy`	an object of class 'phylo'
`output.detail`	logical to indicate whether the output is returned as a detailed list. If TRUE, a nested list is returned: a list of characters (corresponding to columns of the input data matrix), in which each element is a list consisting of three components ("states", "transition" and "relative"). If FALSE, a matrix is returned: the columns correspond to the input data columns, and the rows correspond to all nodes, indexed as in the phylo-formatted tree
`parallel`	logical to indicate whether parallel computation with multiple cores is used. It defaults to TRUE, but parallel execution is not guaranteed: the available parallel backends are system-specific (currently Linux and Mac OS only), and the packages "foreach" and "doMC" must be installed. Both can be installed from CRAN via `install.packages(c("foreach","doMC"))`. If they are not installed, this option is disabled
`multicores`	an integer specifying how many cores will be registered as the multicore parallel backend for the 'foreach' package. If NULL, half of the cores available on the user's computer are used. This option only takes effect when parallel computation is enabled
`verbose`	logical to indicate whether messages will be displayed on the screen. By default, it is TRUE
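The input requirements for `data`, `parallel` and `multicores` can be illustrated with two small base R helpers. Both are hypothetical: the names `align_to_tips` and `choose_cores` are invented and are not part of dcGOR. They only sketch the behaviour described above: rows without names are assumed to follow tip order, extra named rows are dropped, and the parallel option falls back to a single core when "foreach" or "doMC" is unavailable. Treating missing tips as an error is our own assumption.

```r
# Hypothetical helpers (not dcGOR's API) sketching the documented input handling.

# Align the rows of 'data' to the tree's tip labels.
align_to_tips <- function(data, tip_labels) {
  data <- as.matrix(data)
  if (is.null(rownames(data))) {
    # No row names: assume the rows are already in tip order.
    if (nrow(data) != length(tip_labels)) stop("row count differs from tip count")
    rownames(data) <- tip_labels
    return(data)
  }
  absent <- setdiff(tip_labels, rownames(data))
  if (length(absent) > 0) {
    stop("no data for tips: ", paste(absent, collapse = ", "))
  }
  # Rows not in the tree are dropped; the remaining rows follow tip order.
  data[tip_labels, , drop = FALSE]
}

# Decide how many cores to use, falling back to serial execution.
choose_cores <- function(parallel = TRUE, multicores = NULL) {
  has_backend <- requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doMC", quietly = TRUE)
  if (!parallel || !has_backend) return(1L)
  if (is.null(multicores)) {
    n <- parallel::detectCores()       # may be NA on some systems
    multicores <- if (is.na(n)) 1L else max(1L, n %/% 2L)
  }
  as.integer(multicores)
}
```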

Value

It depends on 'output.detail'. If FALSE (the default), a matrix is returned, with columns corresponding to the input data columns and rows corresponding to node indices in the phylo-formatted tree. If TRUE, a nested list is returned. The outermost list is over characters (corresponding to columns of the input data matrix), and each element is an inner list consisting of three components ("states", "transition" and "relative"):

states: a named vector storing states (extant and ancestral states)
transition: a posterior transition matrix between states
relative: a matrix of nodes X states, storing relative probability
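To make the idea of a state-to-state transition matrix more concrete, the following sketch shows one way to summarise reconstructed states: count the parent-to-child state pairs along every edge and normalise each row. This is an illustrative assumption, not a description of how dcGOR computes its "transition" component; the helper `edge_transitions`, the hand-written edge matrix for the six-tip tree from the Examples, and the state vector are all hypothetical.

```r
# Hypothetical illustration (not necessarily how dcGOR builds 'transition'):
# tally parent -> child state pairs along the edges and row-normalise them.
edge_transitions <- function(states, edge, levels = sort(unique(states))) {
  from <- factor(states[edge[, 1]], levels = levels)  # parent state per edge
  to <- factor(states[edge[, 2]], levels = levels)    # child state per edge
  counts <- unclass(table(from = from, to = to))
  # Divide each row by its total; rows with no outgoing edges stay at zero.
  counts / pmax(rowSums(counts), 1)
}

# Assumed (parent, child) edges and an illustrative state per node
# (tips 1..6, internal nodes 7..11).
edge <- matrix(c(7, 8,   8, 9,   9, 1,   9, 2,   8, 10,
                 10, 3,  10, 4,  7, 11,  11, 5,  11, 6),
               ncol = 2, byrow = TRUE)
states <- c("0", "1", "1", "1", "0", "0", "1", "1", "1", "1", "0")
tr <- edge_transitions(states, edge)
tr
```

Here, 8 edges leave a node in state 1 and 2 of them lead to state 0, so the 1-to-0 entry is 0.25.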

Note

This maximum parsimony algorithm for ancestral discrete state reconstruction is based on the idea described by Fitch (1971): http://sysbio.oxfordjournals.org/content/20/4/406.short

Examples

# 1) a newick tree that is imported as a phylo-formatted tree
tree <- "(((t1:5,t2:5):2,(t3:4,t4:4):3):2,(t5:4,t6:4):6);"
phy <- ape::read.tree(text=tree)

# 2) an input data matrix storing discrete states for tips (in rows) X four characters (in columns)
data1 <- matrix(c(0,rep(1,3),rep(0,2)), ncol=1)
data2 <- matrix(c(rep(0,4),rep(1,2)), ncol=1)
data <- cbind(data1, data1, data1, data2)
colnames(data) <- c("C1", "C2", "C3", "C4")
## reconstruct ancestral states, without detailed output
res <- dcAncestralMP(data, phy, parallel=FALSE)
res

# 3) an input data matrix storing discrete states for tips (in rows) X only one character
data <- matrix(c(0,rep(1,3),rep(0,2)), ncol=1)
## reconstruct ancestral states, with detailed output
res <- dcAncestralMP(data, phy, parallel=FALSE, output.detail=TRUE)
res
## get the inner-most list
res <- res[[1]]
## visualise the tree with ancestral states and their relative probability
Ntip <- ape::Ntip(phy)
Nnode <- ape::Nnode(phy)
color <- c("white","gray")
## visualise main tree
ape::plot.phylo(phy, type="p", use.edge.length=TRUE, label.offset=1,
show.tip.label=TRUE, show.node.label=FALSE)
## visualise tips (state 1 in gray, state 0 in white)
x <- data[,1]
ape::tiplabels(pch=22, bg=color[as.numeric(x)+1], cex=2, adj=1)
## visualise internal nodes
### thermo bar to illustrate relative probability (state 1 in gray, state 0 in white)
ape::nodelabels(thermo=res$relative[Ntip+1:Nnode,2:1],
piecol=color[2:1], cex=0.75)
### labeling reconstructed ancestral states
ape::nodelabels(text=res$states[Ntip+1:Nnode], node=Ntip+1:Nnode,
frame="none", col="red", bg="transparent", cex=0.75)

hfang-bristol/dcGOR documentation built on July 16, 2022, 6:43 p.m.