In order to simulate data we must first create a tree. Note that the tree is unrooted.

library(epiAllele)
library(ape)

set.seed(123)
### Create a tree
tr <- unroot(rtree(n = 10, br = runif))
plot(tr)
nodelabels()
tiplabels()

Say we want to simulate data where all edges below node 16 have rate 3 and all above have rate 1. To do so we construct a vector of rates with a one-to-one mapping to tree edges. We can then simulate data with the simData function whcih returns a list containing the tree, data, the stationary character distribution, an edge table, and a table that maps edgeGroups to rates.

rates=rep(0.1,length(tr$edge.length))
rates[tr$edge[,2] %in% c(7:10,17:18)]=0.3
twoRateData=simData(nSites = 100,tr=tr,rate = rates,pi = c(0.3,0.7))

Now we can place the data in an alleleData object and then a rateModel. We can check that the rate assignments are correct by plotting the rate model object.

ad=alleleData(data = twoRateData$data,tree = tr)
twoRateMod=rateModel(ad,lineageTable = twoRateData$edgeTable)
plotTree(twoRateMod)

Now lets fit the model.

fittedTRM=fit(twoRateMod)
plotTree(fittedTRM$model,"value")

If this was real data, we might be interested as to whether we really need two rates of turnover to model the observed site patterns. In that case we would construct and fit a null model by copying our two rate model and tying the rates on the two subtrees as follows:

  singleRateModel=tieRates(obj = twoRateMod,siteLabelA = "All",edgeGroupA = "e1",siteLabelB = "All",edgeGroupB = "e2")
  plotTree(singleRateModel)

As we can see above, all edges of the tree now point to a single rate. Now we fit our single rate model and compare to our two rate model using either a likelihood ratio test (LRT) for nested models, or the bayesian information criterion (BIC) for non-nested (or nested) models.

fittedSRM=fit(singleRateModel)
deltaBIC=bic(fittedSRM$model)-bic(fittedTRM$model)
print(deltaBIC) ## Bigger BIC values support rejecting the null model
lrt(h0=fittedSRM$model,hA=fittedTRM$model) ## Smaller p-values support rejecting the null model

If so desired, we can also extract approximate standard errors for the rate parameters and plot the results. If the bars for the rates do not overlap this can be approximately interperted as a rejection of the null hypothesis the the rates are the same across two different groups of elements. However, since the standard errors are computed uisng the hessian they may be imprecise, so conducting a rigourous LRT as shown above is preferred.

library(ggplot2)
trmSE=rateSE(fittedTRM$model)
ggplot(trmSE,aes(x=siteLabel,fill=edgeGroup,y=value))+
  geom_bar(stat = "identity",position="dodge")+
  geom_errorbar(aes(ymin=value-se, ymax=value+se), width=.2,position=position_dodge(.9))+
  theme_bw()+
  ylab("Rate")


ndukler/epiAllele documentation built on May 5, 2019, 4:50 p.m.