processInput: Preprocessing function

View source: R/processInput.R

Description

Checks the arguments and prepares the data for a subsequent optimization.

Usage

processInput(tr, equalBDrates=FALSE, fixedRetentionRates=TRUE,
             startingBDrates=c(0.01, 0.02), startingQ=NULL)

Arguments

tr

a species tree in SIMMAP format (see Details of function MLEGeneCount).

equalBDrates

if TRUE, the duplication and loss rates are assumed to be equal.

fixedRetentionRates

if TRUE, retention rates are fixed to startingQ during the subsequent optimization. If FALSE, retention rates are treated as parameters and estimated by maximum likelihood.

startingBDrates

Vector of length 2 giving the starting values for the duplication and loss rates. When equalBDrates=TRUE, only the first component is used.

startingQ

Vector of starting values for the retention rates. If NULL (the default), 0.5 is used for every WGD event.

Details

The vector para of starting values for the parameters to be optimized has length 1 + number of WGDs if the birth and death rates are assumed equal, or 2 + number of WGDs otherwise. It starts with log(startingBDrates[1]) if equalBDrates is TRUE, and with log(startingBDrates) otherwise. The remaining components (corresponding to the retention rates) are startingQ if startingQ is provided, and 0.5 otherwise.
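As an illustration of the layout described above, the following base R sketch assembles such a vector by hand. The helper buildPara is hypothetical and is not part of WGDgc; it only mirrors the rule stated in this section and does not reproduce the package's internal code.

```r
# Hypothetical helper (not part of WGDgc): follows the rule described in Details.
buildPara <- function(nWGD, equalBDrates = FALSE,
                      startingBDrates = c(0.01, 0.02), startingQ = NULL) {
  # Log-scale rate(s): one value if the two rates are equal, two otherwise.
  logRates <- if (equalBDrates) log(startingBDrates[1]) else log(startingBDrates)
  # Retention rates: user-supplied values, or 0.5 for every event.
  q <- if (is.null(startingQ)) rep(0.5, nWGD) else startingQ
  c(logRates, q)
}

para2 <- buildPara(nWGD = 1)                       # length 2 + 1
para1 <- buildPara(nWGD = 3, equalBDrates = TRUE)  # length 1 + 3
```

Here para2 holds the two log rates followed by one retention rate, while para1 holds a single log rate followed by three retention rates.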

For WGT events, the 2 extra copies are assumed to be retained independently. With retention rate q, the probability of retaining all 3 gene copies is then q^2, the probability of retaining 2 gene copies is 2*q*(1-q), and the probability of retaining only the original gene is (1-q)^2.
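The three WGT probabilities above are the Binomial(2, q) probabilities for the number of extra copies retained, which can be checked numerically. The function wgtRetention and the element names retain1, retain2 and retain3 are chosen here for illustration only and are not taken from the package.

```r
# Retention probabilities after a WGT, for a retention rate q.
# The number of extra copies retained follows a Binomial(2, q) distribution.
wgtRetention <- function(q) {
  c(retain1 = (1 - q)^2,        # original gene only
    retain2 = 2 * q * (1 - q),  # original gene plus one extra copy
    retain3 = q^2)              # all three copies
}

p <- wgtRetention(0.5)
# Same values as the binomial probabilities of 0, 1 or 2 extra copies.
binom <- dbinom(0:2, size = 2, prob = 0.5)
```

Because the three outcomes are exhaustive, the probabilities sum to 1 for any q in [0,1].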

lower and upper are vectors with one component per parameter, giving the lower and upper bounds of the parameters for a subsequent optimization search. The logs of the duplication and loss rates are unconstrained, while duplicate retention rates are constrained to [0,1].
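A minimal sketch of how such bounds could be passed to a box-constrained optimizer is given below. It assumes one WGD and separate duplication and loss rates, represents "unconstrained" as -Inf and Inf, and uses base R's optim with method "L-BFGS-B". The objective toyObjective is a stand-in invented for this example; it is not the package's likelihood, and the optimizer actually used by the package may differ.

```r
nWGD  <- 1
para  <- c(log(0.01), log(0.02), rep(0.5, nWGD))
lower <- c(-Inf, -Inf, rep(0, nWGD))  # log rates unbounded, retention rates >= 0
upper <- c( Inf,  Inf, rep(1, nWGD))  # log rates unbounded, retention rates <= 1

# Toy objective standing in for a negative log-likelihood (illustrative only).
# Its unconstrained minimum has a third component of 1.5, outside [0,1].
toyObjective <- function(p) sum((p - c(-4, -3, 1.5))^2)

fit <- optim(para, toyObjective, method = "L-BFGS-B",
             lower = lower, upper = upper)
fit$par  # the third component cannot exceed its upper bound of 1
```

The point of the sketch is that the box constraints keep the retention rate inside [0,1] even when the objective would otherwise push it outside that interval.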

Value

phyloMat

data frame to represent the phylogeny. The number of rows is the number of nodes in the species tree. There are 5 columns (Parent, Child, Time, Species, type).

nLeaf

number of present-day species (i.e. number of leaves)

nNode

number of nodes in the species tree

wgdTab

data frame with 5 columns. Each row corresponds to one WGD or WGT event. The first column gives the node just before the WGD/T. The second column 'type' indicates whether the event is a WGD or a WGT. The remaining columns contain the probabilities that only the original gene is retained, that 2 gene copies are retained, and that 3 gene copies are retained.

para

Vector of parameters to be optimized. See Details.

lower

Lower bounds for later optimization. See Details.

upper

Upper bounds for later optimization. See Details.

Examples

library(WGDgc)

# Species tree in SIMMAP format, with one WGD on the branch leading to (A, B)
tre.string = "(D:{0,18.03},(C:{0,12.06},(B:{0,7.06},
              A:{0,7.06}):{0,2.49:wgd,0:0,2.50}):{0, 5.97});"
tre.phylo4d = read.simmap(text=tre.string)
processInput(tre.phylo4d)

cecileane/WGDgc documentation built on Aug. 6, 2020, 12:09 p.m.