This vignette demonstrates the use of the package. An HIV phylogenetic tree is included in the package directory. To visualise the tree:

library(genieR)
library(ape)
data(village)
plot(village,show.tip.label = F,edge.color = sample(colors(), length(village$edge)/2,replace=T))
title("A phylogenetic tree with random colors")

The package can sort out sampling and coalescent times by

x=att(village)
head(x)
tail(x)

The first column of x is the times and the second column of x is the count of current samples, which increases by one representing a sampling events and decreases by one representing a coalescent event. To see the times of coalescents,

head(x$t[diff(x[,2])==-1])

and the times of sampling,

head(x$t[diff(x[,2])==1])

A simpler way of extract sampling and coalescents times is through function

y=heterochronous.gp.stat(village)
head(y$coal.times)
head(y$sample.times)
head(y$sampled.lineages)

Apart from collecting the basic information of a phylogenetic tree, we can fit a exponential growth coalescent model for this tree by Genefit,

fit1=geniefit(village,Model="expo",start=c(10,.1),upper=Inf,lower=0)
fit1

The output includes estimated parameters, corresponding logliklihood and the AIC information of this exponetial growth model. Other coalescent models to choose are const (constant population size),expan (expansion growth), log (logistic growth), step (piecewise constant), pexpan (piecewise expansion growth) and plog (piecewise logistic growth). Genifit allows users to infer parameters through c++ code, for example,

library(dfoptim)
fit2=geniefit(village,Model="log",start=c(10,.1,.1),upper=Inf,lower=0,Rcpp=T)
fit2

The AIC suggests the logistic model is a better fit. Here we consider a benchmarking comparison between using c++ code and R code when considering the same logistic growth model.

library(rbenchmark)

benchmark("Rcode" = {
  geniefit(village,Model="log",start=c(10,.1,.1),upper=Inf,lower=0)

},
"Rcpp" = {
  geniefit(village,Model="log",start=c(10,.1,.1),upper=Inf,lower=0,Rcpp=T)
},
replications = 100,
columns = c("test", "replications", "elapsed",
            "relative", "user.self", "sys.self"))

The c++ code performs better in this case. Meanwhile, the function can also produce a MCMC fit to a coalescent model.

fit3=geniefit(village,Model="log",start=c(10,.1,.1),upper=Inf,lower=0,MCMC=T)
fit3[1:3]

Obviousely, The MCMC fit is better as it provides a better loglikelihood under the logistic model.

This package can also simulates coalescent times for isochronous/heterochronous data.

 trajectory<-function(x)  exp(10*x)
 sample1<-cbind(c(9,1,2,1),c(0,.008,.03,.1))
 example.hetero<-coalgen.hetero(sample1, trajectory)
 example.hetero
 sample<-c(10,0)
 example.iso<-coalgen.iso(sample, trajectory)
 example.iso


xiangfstats/GenieR documentation built on May 4, 2019, 1:06 p.m.