\tableofcontents
Implements the phylogenetic peeling algorithm.
The model is governed by a handful of parameters (five in the examples below).
The log-likelihood function is written entirely in C++ (RcppArmadillo).
Currently there are three methods for estimating the model: Newton-Raphson and Artificial Bee Colony optimization (ABC) on the MLE side, plus MCMC (all three allow using priors).
In addition, the package includes several functions for processing and visualizing data.
Let's see an example!
\footnotesize
library(aphylo)

# Loading and taking a look at the data
data("experiment")
head(experiment)

data("tree")
head(tree)
\normalsize
\footnotesize
Make the data suitable for the peeling algorithm.
Also, we borrow methods from the ape package (e.g., the plot method).
# Preparing the data
O <- get_offspring(experiment, "LeafId", tree, "NodeId", "ParentId")
plot(O)
\normalsize
\footnotesize
# Nice personalized plot
plot_LogLike(
  O, nlevels = 60, plotfun = persp, theta = -pi*20,
  shade = .7, border = "darkblue", phi = 30, scale = TRUE,
  par.args = list(mar = c(1, 1, 1, 1), oma = c(0, 0, 4, 0)),
  ticktype = "detailed"
)

# Adding title
mtext("LogLikelihood", side = 3, outer = FALSE, line = 2, cex = 1.25)
\normalsize
aphylo_mle
\footnotesize
# Estimating the parameters
ans_nr  <- aphylo_mle(rep(.5, 5), dat = O)
ans_abc <- aphylo_mle(rep(.5, 5), dat = O, useABC = TRUE,
                      control = list(maxCycle = 50))
oldpar <- par(no.readonly = TRUE)
par(mfrow = c(1, 2))
plot(ans_nr,  main = "MLE using Newton-Raphson")
plot(ans_abc, main = "MLE using ABCoptim")
par(oldpar)
\normalsize
\footnotesize
ans_w_prior <- aphylo_mle(rep(.5, 5), dat = O, useABC = TRUE,
                          priors  = function(pars) dbeta(pars[1:2], 2, 10),
                          control = list(maxCycle = 50))
plot(ans_w_prior, main = "MLE using ABCoptim\nWith Beta(2,10) priors")
\normalsize
Change the name
Stare at the Newton-Raphson implementation (it still returns funny numbers...)
Implement (or grab) a phylogenetic tree simulation routine (ape and igraph have some candidates)
Run more tests
The MCMC function implements the Metropolis-Hastings algorithm.
It uses a symmetric transition kernel: a normal random walk with scale parameter $s$:
$$ \theta' = \theta + s\times Z,\quad Z\sim N(0,1) $$
Furthermore, using reflecting boundaries, it allows specifying lower and upper bounds, $\theta' \in \left[\underline \theta, \overline\theta\right]$.
Just like the mcmc::metrop function, the user needs to pass a function that computes the log unnormalized probability density.
Preliminary tests show that MCMC is 3x faster than mcmc::metrop (!).
Let's see an example!
Here we simulate 1,000 data points $x_i\sim N(2.6, 3)$.
We also define the log unnormalized density function, fun.
Notice that the only argument required by fun is the vector of parameters itself (the data D is passed via R's scoping rules).
\footnotesize
# Loading the package
library(aphylo)

# Simulating data
set.seed(1231)
n    <- 1e3
pars <- c(mean = 2.6, sd = 3)

# Generating data and writing the log-likelihood function
D   <- rnorm(n, pars[1], pars[2])
fun <- function(pars) sum(log(dnorm(D, pars[1], pars[2])))
\normalsize
\footnotesize
system.time(
  ans <<- MCMC(
    fun     = fun,                  # The log unnormalized density
    initial = c(mu = 1, sigma = 1), # Initial state (names are optional)
    nbatch  = 1e5,                  # Number of steps
    burnin  = 1e3,                  # Burn-in
    thin    = 20,                   # Thinning
    scale   = .1,                   # Scale of the transition kernel
    ub      = 10,                   # Upper bound (same for both parameters)
    lb      = c(-10, .5),           # Lower bound
    useCpp  = TRUE                  # Request the C++ version
  )
)
\normalsize
The result, ans, is an object of class mcmc from the coda package.

library(coda)
plot(ans)
autocorr.plot(ans)
Tempering MCMC.
Adaptive MCMC (global and local).
Parallel MCMC (multiple chains)... although this is easy to do with the parallel package.
(another) Just Another Gibbs Sampler? Maybe RcppJAGS? (take a look at rcppbugs)
...
\footnotesize
// Reflection adjustment
for (int k = 0; k < K; k++) {
  while ((ans[k] > ub[k]) | (ans[k] < lb[k])) {
    if (ans[k] > ub[k]) {
      ans[k] = 2.0 * ub[k] - ans[k];
    } else {
      ans[k] = 2.0 * lb[k] - ans[k];
    }
  }
}
\normalsize