Expression divergence between two genomes can be viewed as concerted evolution of the transcriptome from their common ancestor, and it can be measured by variance components among orthologous genes. More specifically, the expression levels of the same tissue in different species can be treated as taxonomic units, which are placed at the tips of a character tree that represents the expression evolution of these species.

Here we give an example of building a character tree from expression data (an expression phylogeny).

The TreeExp package can be loaded in the usual way:

library('TreeExp')

We load the dataset built from expression data of six tissues from nine tetrapod species:

data(tetraexp)

Distance matrix:

First, we generate an expression distance matrix of these nine tetrapod species:

dismat <- expdist(tetraexp.objects, taxa = "all",
                 subtaxa = "Brain",
                 method = "pea")
as.dist(dismat)

The "taxa" and "subtaxa" options of the expdist function select which species and which tissue are included. The default method "pea" calculates pairwise distances as the Pearson distance, which equals 1 minus the Pearson correlation coefficient of the expression levels.

If you already have a data frame of normalized expression values, internal functions are available for creating the expression distance matrix directly.
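To make the Pearson distance concrete, here is a minimal sketch in base R that does not use TreeExp. The toy table `expr`, its taxon names, and the helper `pearson_dist` are hypothetical and only illustrate the formula; this is not the package's own implementation of `dist.pea`.

```r
# Toy expression table (made-up values): rows are genes, columns are taxa.
set.seed(1)
expr <- matrix(rexp(50 * 3), nrow = 50,
               dimnames = list(paste0("gene", 1:50),
                               c("taxonA", "taxonB", "taxonC")))

# Pearson distance: 1 minus the pairwise Pearson correlation between columns.
# (Hypothetical helper, written here only to illustrate the definition.)
pearson_dist <- function(x) {
  1 - cor(x, method = "pearson")
}

d <- pearson_dist(expr)
as.dist(d)
```

The result is a symmetric taxa-by-taxa matrix with zeros on the diagonal, the same shape as the distance matrix used in the rest of this example.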

For instance,

expression_table <- exptabTE(tetraexp.objects, taxa = "all",
                            subtaxa = "Brain")

dismat <- dist.pea(expression_table)
colnames(dismat) <- colnames(expression_table)
rownames(dismat) <- colnames(dismat)

If you have your own expression data frame in the same format as "expression_table" above, it can be used in the same way:

dismat <- dist.pea(your_own_dataframe)
colnames(dismat) <- colnames(your_own_dataframe)
rownames(dismat) <- colnames(dismat)

Expression character tree:

Once the expression distance matrix has been created, you can construct a character tree by Neighbor-Joining. Bootstrap values, based on resampling orthologous genes with replacement, can then be generated with the boot.phylo function:

tr <- NJ(dismat)
tr <- root(tr, "Chicken_Brain", resolve.root = TRUE)

exptable <- exptabTE(tetraexp.objects, taxa = "all",
                     subtaxa = "Brain")

f <- function(xx) {

    # use the same distance metric here as the one specified
    # when the expression distance matrix was created
    mat <- dist.pea(t(xx))

    colnames(mat) <- rownames(xx)
    rownames(mat) <- colnames(mat)

    root(NJ(mat), "Chicken_Brain", resolve.root = TRUE)

}

# boot.phylo resamples the columns of a matrix, so the expression table
# is transposed in order to resample genes (its rows)
bs <- boot.phylo(tr, t(exptable), f, B = 100)

tr$node.label <- bs
plot(tr, show.node.label = TRUE)
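The idea behind resampling genes with replacement can be sketched in base R without building a tree. The example below is a simplified stand-in, not the algorithm used by boot.phylo: instead of counting how often a clade is recovered, it counts how often two taxa remain each other's closest pair under the Pearson distance. The simulated table `expr`, the taxon names, and the helper `ab_closest` are all hypothetical.

```r
# Simulated expression table (made-up values): rows are genes, columns are taxa.
# taxonA and taxonB share a common signal; taxonC is independent.
set.seed(42)
n_genes <- 200
base <- rnorm(n_genes)
expr <- cbind(taxonA = base + rnorm(n_genes, sd = 0.3),
              taxonB = base + rnorm(n_genes, sd = 0.3),
              taxonC = rnorm(n_genes))

# TRUE if taxonA and taxonB have the smallest Pearson distance of all pairs.
ab_closest <- function(x) {
  d <- 1 - cor(x)
  d["taxonA", "taxonB"] == min(d[upper.tri(d)])
}

# Resample genes (rows) with replacement B times and record the proportion
# of replicates in which taxonA and taxonB are still the closest pair.
B <- 100
support <- mean(replicate(B, {
  idx <- sample(nrow(expr), replace = TRUE)
  ab_closest(expr[idx, , drop = FALSE])
}))
support
```

A proportion close to 1 means the grouping is stable under gene resampling, which is the same intuition behind the bootstrap values shown on the tree nodes.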

An expression character tree has now been constructed. The tree shows the similarity of expression patterns among the selected genes of the designated species. It is largely consistent with the species tree, with minor discrepancies.

The phenomenon in which evolutionary history dominates expression patterns can be described as phylogenetic signal. One way to interpret a highly consistent expression character tree is that changes in transcriptome expression levels, which reflect regulatory changes, have accumulated over time. Although expression levels are not as discrete as sequence data, those derived from transcriptome data across species show strong phylogenetic signal.



hr1912/phyExp documentation built on July 13, 2019, 5:18 p.m.