getLikGeneCount: Negative log-likelihood of gene count data

Description

Calculates the overall negative log-likelihood of gene count data on a phylogenetic tree under a birth-and-death process with whole genome duplication (WGD) events.

Usage

getLikGeneCount(para, input, geneCountData, mMax=NULL,
                geomProb=NULL, dirac=NULL, useRootStateMLE=FALSE,
                conditioning=c("oneOrMore", "twoOrMore",
                "oneInBothClades", "none"),
                equalBDrates=FALSE, fixedRetentionRates=TRUE)

Arguments

para

vector of parameters (see Details)

input

object returned by the function processInput

geneCountData

data frame with one column for each species and one row for each family, containing the number of gene copies in each species for each gene family. The column names must match the species names in the tree.

mMax

maximum number of surviving lineages at the root for which the likelihood is evaluated.

geomProb

parameter of the geometric prior on the number of gene lineages at the root; it equals the inverse of the prior mean number of lineages.

dirac

fixed number of genes at the root, used when this number is assumed known (Dirac prior distribution).

useRootStateMLE

if TRUE, the most likely number of genes at the root is determined for each family separately and is used to evaluate the likelihood function.

conditioning

type of conditioning for the likelihood calculation. The default, "oneOrMore", computes probabilities conditional on observing families with at least one gene copy (see Details in MLEGeneCount).

equalBDrates

if TRUE, the duplication and loss rates are constrained to be equal.

fixedRetentionRates

if TRUE, the retention rates stored in input$wgdTab are used. If FALSE, the retention rates are taken from para.
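
The geomProb and mMax arguments together describe the prior on the root state. As a minimal sketch, assuming the prior is a geometric distribution on 1, 2, ... with success probability geomProb (so that its mean is 1/geomProb), truncated at mMax and renormalized, the prior weights can be computed in base R as below. The helper rootPrior and its renormalize argument are hypothetical and not part of WGDgc; how the package itself handles the truncation is not stated on this page.

```r
# Hypothetical helper, not part of WGDgc: prior probabilities for the number
# of gene lineages at the root, assuming a geometric distribution on 1, 2, ...
# with success probability geomProb, truncated at mMax.
rootPrior <- function(geomProb, mMax, renormalize = TRUE) {
  k <- seq_len(mMax)
  # dgeom() counts failures before the first success (support 0, 1, ...),
  # so shift by one: P(N = k) = geomProb * (1 - geomProb)^(k - 1).
  p <- dgeom(k - 1, prob = geomProb)
  # Assumption: the truncated weights are rescaled to sum to 1.
  if (renormalize) p <- p / sum(p)
  names(p) <- k
  p
}

# Values used in the Examples section: prior mean of 1.5 lineages, mMax = 8
prior <- rootPrior(geomProb = 1/1.5, mMax = 8)
round(prior, 4)
```

With geomProb = 1/1.5, most of the prior mass sits on one or two root lineages, so truncating at mMax = 8 discards very little probability.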

Details

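When fixedRetentionRates is FALSE, the vector para has length 1 + (number of WGD/Ts) if the birth and death rates are assumed equal, or 2 + (number of WGD/Ts) otherwise. It starts with log(StartingBDrates[1]) if equalBDrates is TRUE, or with log(StartingBDrates) otherwise, where StartingBDrates denotes the vector of birth (duplication) and death (loss) rates. The remaining components correspond to retention rates. When fixedRetentionRates is TRUE, the retention rates are taken from input$wgdTab, and para need only contain the log rate(s), as in the Examples.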
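
The layout of para can be sketched as follows. The helper buildPara is hypothetical and not part of WGDgc. It assumes that the birth (duplication) rate comes before the death (loss) rate, that retention rates are appended as given (the scale on which they enter para is not stated here), and, based on the Examples section, where a tree with one WGD is evaluated with a length-2 para, that retention rates are omitted when fixedRetentionRates is TRUE.

```r
# Hypothetical helper, not part of WGDgc: assembles the `para` vector.
# Assumptions: bdRates = c(birth, death); retention rates are appended
# unchanged; they are left out when fixedRetentionRates = TRUE.
buildPara <- function(bdRates, retention = numeric(0),
                      equalBDrates = FALSE, fixedRetentionRates = TRUE) {
  # Rates enter on the log scale: one shared rate, or birth and death rates
  logRates <- if (equalBDrates) log(bdRates[1]) else log(bdRates[1:2])
  if (fixedRetentionRates) logRates else c(logRates, retention)
}

# Same starting values as in the Examples section
buildPara(c(0.01, 0.02))
# One WGD with a free retention rate: length 2 + 1 = 3
buildPara(c(0.01, 0.02), retention = 0.9, fixedRetentionRates = FALSE)
```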

Value

negative log-likelihood value

References

Csuros M and Miklos I (2009). Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model. Molecular Biology and Evolution. 26:2087-2095.

Rabier C-E, Ta T and Ané C (2014). Detecting and locating whole genome duplications on a phylogeny: a probabilistic approach. Molecular Biology and Evolution. 31(3):750-762.

See Also

MLEGeneCount, logLik_CsurosMiklos.

Examples

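library(WGDgc)
# Tree in SIMMAP format with one WGD event on the branch ancestral to A and B
tre.string = "(D:{0,18.03},(C:{0,12.06},(B:{0,7.06},
              A:{0,7.06}):{0,2.49:wgd,0:0,2.50}):{0, 5.97});"
tre.phylo4d = read.simmap(text=tre.string)
# Gene counts for 4 families (rows) in species A to D (columns)
dat = data.frame(A=c(2,2,3,1), B=c(3,0,2,1), C=c(1,0,2,2), D=c(2,1,1,1))
a = processInput(tre.phylo4d, startingQ=0.9)
# Birth (duplication) and death (loss) rates, passed on the log scale
getLikGeneCount(log(c(.01,.02)), a, dat, mMax=8, geomProb=1/1.5,
                conditioning="oneOrMore")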

cecileane/WGDgc documentation built on Aug. 6, 2020, 12:09 p.m.