RScelestial Vignette

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(RScelestial)
# We load igraph for drawing trees. If you do not want to draw,
# there is no need to import igraph.
library(igraph)

Installing RScelestial

The RScelestial package could be installed easily as follows

install.packages("RScelestial")

Simulation

Here we show a simulation. We build a data set with following command.

# Following command generates ten samples with 20 loci. 
# Rate of mutations on each edge of the evolutionary tree is 1.5. 
D = synthesis(10, 20, 5, seed = 7)
D

Inferring the phylogenetic tree

seq = as.ten.state.matrix(D$seqeunce)
SP = scelestial(seq, return.graph = TRUE)
SP

You can draw the graph with following command

tree.plot(SP, vertex.size = 30)

Also, we can make a rooted tree with cell "C8" as the root of the tree as follows:

SP = scelestial(seq, root.assign.method = "fix", root = "C8", return.graph = TRUE)
tree.plot(SP, vertex.size = 30)

Setting root.assign.method to "balance" lets the algorithm decide for a root that produces minimum height tree.

SP = scelestial(seq, root.assign.method = "balance", return.graph = TRUE)
tree.plot(SP, vertex.size = 30)

Evaluating results

Following command calculates the distance array between pairs of samples.

D.distance.matrix <- distance.matrix.true.tree(D)
D.distance.matrix
SP.distance.matrix <- distance.matrix.scelestial(SP)
SP.distance.matrix
## Difference between normalized distance matrices
vertices <- rownames(SP.distance.matrix)
sum(abs(D.distance.matrix[vertices,vertices] - SP.distance.matrix))

Running Scelestial on multiple sequence alignment

Given a multiple sequence alignment, Scelestial infers the phylogeny of them. Here we present a simple example. First we load libraries to load a multiple alignment.

library(stringr)
library(seqinr)

In this example, we load a multiple alignment from seqinr package.

data(phylip, package = "seqinr")

Then we clean the data and build a zero-one matrix representing taxa and characters. Note that Scelestial accept matrices with taxa as its columns and characters as its rows.

# Removing non-informative columns and duplicate rows.
mcb <-  toupper(t(sapply(seq(phylip$seq), function(i) unlist(strsplit(phylip$seq[[i]], '')))))
ccb <- as.character(phylip$seq)
occb <- order(ccb)
cbColMask <- sapply(seq(ncol(mcb)), function(j) length(levels(as.factor(mcb[,j]))) == 1)
cbRowMask <- rep(TRUE, length(ccb))
for (i in seq(length(ccb))) {
    if (i == 1 || ccb[occb[i]] != ccb[occb[i-1]]) {
        cbRowMask[occb[i]] <- FALSE
    }
}
mcbRows <- apply(mcb[!cbRowMask, !cbColMask], MARGIN = 1, FUN = function(a) paste0(str_replace(a, "-", "X"), collapse = ""))

Executing Scelestial on the input matrix.

n.seq <- data.frame(nodes = phylip$nam[!cbRowMask], seq = mcbRows)
seq2 <- data.frame(t(as.ten.state.matrix.from.node.seq(n.seq)), stringsAsFactors = TRUE)
# Running Scelestial
SP = scelestial(seq2, return.graph = TRUE)
tree.plot(SP, vertex.size=20, vertex.label.dist=0, asp = 0, vertex.label.cex = 1)


Try the RScelestial package in your browser

Any scripts or data that you put into this service are public.

RScelestial documentation built on Dec. 11, 2021, 10:02 a.m.