paml.baseml: Phylogenetic Analysis by Maximum Likelihood for Nucleotide...

paml.baseml    R Documentation

Phylogenetic Analysis by Maximum Likelihood for Nucleotide Sequences

Description

This function is a modified version of the original standalone baseml code in PAML, developed by Yang (1997) for phylogenetic analysis by maximum likelihood. It provides a way to generate an ancestral tree for given central sequences clustered by phyclust.

Usage

paml.baseml(X, seqname = NULL, opts = NULL, newick.trees = NULL)
paml.baseml.control(...)
paml.baseml.show.default()

Arguments

X

nid matrix with N rows/sequences and L columns/sites.

seqname

sequence names.

opts

options as in the standalone version, provided by paml.baseml.control.

newick.trees

a vector/list containing NEWICK trees for runmode = 2.

...

for other possible opts and values. See PAML manual for details.

show

show opts and values.

Details

The function paml.baseml directly reuses the C code of baseml from PAML, and the function paml.baseml.control generates the controls for paml.baseml, playing the role of the file baseml.ctl in PAML.

The seqname should be consistent with both X and the leaf nodes of newick.trees.

The options in opts follow the original baseml.ctl, except that seqfile, treefile and outputfile are omitted.
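The consistency requirement on seqname can be checked before calling paml.baseml. The following is a minimal base R sketch, not part of phyclust: newick_leaves and check_seqname are hypothetical helper names, and "consistent" is interpreted here as one name per row of X and the same set of names as the tree leaves. It assumes unquoted leaf labels.

```r
# Extract leaf labels from a NEWICK string (base R only).
# A leaf label follows "(" or "," and ends at ":", ",", ")" or ";".
# Assumption: labels are unquoted and contain none of "(),:;".
newick_leaves <- function(newick) {
  m <- gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
  leaves <- trimws(regmatches(newick, m)[[1]])
  leaves[nzchar(leaves)]
}

# Hypothetical helper: stop with an error if seqname does not match
# the rows of X or the leaves of the given tree.
check_seqname <- function(X, seqname, newick = NULL) {
  stopifnot(length(seqname) == nrow(X))
  if (!is.null(newick)) {
    stopifnot(setequal(newick_leaves(newick), seqname))
  }
  invisible(TRUE)
}

X <- matrix(0L, nrow = 4, ncol = 6)   # toy nid matrix: 4 sequences, 6 sites
seqname <- c("s1", "s2", "s3", "s4")
tree <- "((s1:0.1,s2:0.1):0.2,(s3:0.1,s4:0.1):0.2);"
check_seqname(X, seqname, tree)
```

If the check passes silently, the same X, seqname and tree can then be handed to paml.baseml through its seqname and newick.trees arguments.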

paml.baseml.control generates default opts, and paml.baseml.show.default displays annotations for the default opts.

Value

This function returns a list of class baseml, in which each element stores the lines of one baseml output, split at newlines. All the output of baseml of PAML is saved in several files, which are then read in by scan. Further post-processing can be done by parsing the returned list. The details of the output format can be found on the website http://abacus.gene.ucl.ac.uk/software/paml.html and in its manual.

Note that some functionalities of baseml of PAML are changed in paml.baseml due to the complexity of input and output. The changes include disabling the option G and renaming the file 2base.t to pairbase.t.

Typically, the list contains the original output of baseml, including pairbase.t, mlb, rst, rst1, and rub if they are not empty. The best tree (unrooted by default) is stored in best.tree, parsed from mlb based on the highest log likelihood. All output to STDOUT is stored in stdout. No input from STDIN is allowed.

Note that the print function for the class baseml shows the best.tree only. Use str or names to see everything the list contains.

Warning(s)

Carefully read PAML's original documentation before using the paml.baseml function. paml.baseml may not be ported well from baseml of PAML, so please double check against the standalone version.

baseml may not be a well designed program, and may run slowly. If it gets stuck, all temporary files remain in a directory obtained by tempfile("/paml.baseml.").

baseml has its own options and settings, which may differ from those of phyclust and ape. For example, the following is from PAML's documentation: “In PAML, a rooted tree has a bifurcation at the root, while an unrooted tree has a trifurcation or multifurcation at the root.” That is, paml.baseml uses a rooted result for an unrooted tree, as well as for a rooted tree.

baseml also needs a sequence file, which paml.baseml dumps from R (duplicating the data held in memory). This file can be very big if the sequences are long or the number of sequences is large. Also, paml.baseml may take a long time to search for the best tree if the data are large or initial trees are not provided.
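After an interrupted run, leftover directories can be located by name. The sketch below uses base R only. list_baseml_tmp is a hypothetical helper, and it assumes the directories sit directly under the current session's tempdir(), which is where tempfile() places files by default. Directories left by a crashed R session belong to a different tempdir() and would have to be searched for there.

```r
# Hypothetical helper: find directories named like tempfile("/paml.baseml.")
# directly under a given root (default: this session's tempdir()).
list_baseml_tmp <- function(root = tempdir()) {
  dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  dirs[grepl("^paml\\.baseml\\.", basename(dirs))]
}

# Demonstration with a stand-in directory created the same way.
d <- tempfile("/paml.baseml.")
dir.create(d)
leftover <- list_baseml_tmp()
print(basename(leftover))

# Remove the leftovers once they are no longer needed.
unlink(leftover, recursive = TRUE)
```

Inspect the directory contents before deleting them if the partial baseml output may still be useful for diagnosing why the run stalled.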
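The PAML convention quoted above can be checked on a NEWICK string by counting the children of the root. The following is a minimal base R sketch under that convention. root_degree and is_rooted_paml are hypothetical names, not functions of phyclust, ape or PAML, and the code assumes unquoted labels without commas or parentheses.

```r
# Count the children of the root in a NEWICK string.
# Commas at parenthesis depth 1 separate the root's children.
root_degree <- function(newick) {
  chars <- strsplit(newick, "")[[1]]
  depth <- cumsum((chars == "(") - (chars == ")"))
  sum(chars == "," & depth == 1) + 1
}

# PAML convention: bifurcation at the root means rooted,
# trifurcation or multifurcation at the root means unrooted.
is_rooted_paml <- function(newick) root_degree(newick) == 2

rooted   <- "((a:0.1,b:0.1):0.2,(c:0.1,d:0.1):0.2);"
unrooted <- "(a:0.1,b:0.1,(c:0.1,d:0.1):0.2);"
is_rooted_paml(rooted)
is_rooted_paml(unrooted)
```

Applying such a check to the trees passed in newick.trees, and to best.tree in the result, makes it easier to see which form a tree is in before comparing it with trees from phyclust or ape.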

Author(s)

Yang, Z. (1997) and Yang, Z. (2007)

Maintainer: Wei-Chen Chen wccsnow@gmail.com

References

Phylogenetic Clustering Website: https://snoweye.github.io/phyclust/

Yang, Z. (1997) “PAML: a program package for phylogenetic analysis by maximum likelihood”, Computer Applications in BioSciences, 13, 555-556.

Yang, Z. (2007) “PAML 4: a program package for phylogenetic analysis by maximum likelihood”, Molecular Biology and Evolution, 24, 1586-1591. http://abacus.gene.ucl.ac.uk/software/paml.html

See Also

print.baseml, write.paml.

Examples

## Not run: 
library(phyclust, quietly = TRUE)

paml.baseml.show.default()

### Generate data.
set.seed(123)
ret.ms <- ms(nsam = 5, nreps = 1, opts = "-T")
ret.seqgen <- seqgen(opts = "-mHKY -l40 -s0.2", newick.tree = ret.ms[3])
(ret.nucleotide <- read.seqgen(ret.seqgen))
X <- ret.nucleotide$org
seqname <- ret.nucleotide$seqname

### Run baseml.
opts <- paml.baseml.control(model = 4, clock = 1)
(ret.baseml <- paml.baseml(X, seqname = seqname, opts = opts))
(ret.baseml.init <- paml.baseml(X, seqname = seqname, opts = opts,
   newick.trees = ret.ms[3]))
ret.ms[3]

### Unrooted tree.
opts <- paml.baseml.control(model = 4)
(ret.baseml.unrooted <- paml.baseml(X, seqname = seqname, opts = opts))

### More information.
opts <- paml.baseml.control(noisy = 3, verbose = 1, model = 4, clock = 1)
ret.more <- paml.baseml(X, seqname = seqname, opts = opts)
# ret.more$stdout

### Plot trees
par(mfrow = c(2, 2))
plot(read.tree(text = ret.ms[3]), main = "true")
plot(read.tree(text = ret.baseml$best.tree), main = "baseml")
plot(read.tree(text = ret.baseml.init$best.tree), main = "baseml with initial")
plot(unroot(read.tree(text = ret.baseml.unrooted$best.tree)),
     main = "baseml unrooted")

## End(Not run)

phyclust documentation built on Sept. 8, 2023, 6 p.m.