Fit a Phylogenetic model to an alignment...
Description
Fit a Phylogenetic model to an alignment
Usage
1 2 3 4 5 6 7  phyloFit(msa, tree = NULL, subst.mod = "REV", init.mod = NULL,
no.opt = c("backgd"), init.backgd.from.data = ifelse(is.null(init.mod),
TRUE, FALSE), features = NULL, scale.only = FALSE, scale.subtree = NULL,
nrates = NULL, alpha = 1, rate.constants = NULL, selection = NULL,
init.random = FALSE, init.parsimony = FALSE, clock = FALSE,
EM = FALSE, max.EM.its = NULL, precision = "HIGH", ninf.sites = 50,
quiet = FALSE, bound = NULL, log.file = FALSE)

Arguments
msa 
An alignment object. May be altered if passed in as a pointer to C memory (see Note). 
tree 
A character string containing a Newick formatted tree defining the topology. Required if the number of species > 3, unless init.mod is specified. The topology must be rooted, although the root is ignored if the substitution model is reversible. 
subst.mod 
The substitution model to use. Some possible models
include "REV", "JC69", "K80", "F81", "HKY85", "R2", "U2". Run

init.mod 
An object of class 
no.opt 
A character vector indicating which parameters NOT to optimize
(instead hold constant at their initial values). By default, the
equilibrium frequencies (backgd) are not optimized. Other parameters that
may be indicated here are "ratematrix" for the entire rate matrix, "kappa"
for models with transition/transversion ratios, "branches" to hold all
branch lengths constant, "ratevar" for rate variation parameters, "scale"
for the tree scaling factor, and "scale_sub" for the subtree scaling factor.
This argument does NOT apply to parameters of a lineagespecific model
created with 
init.backgd.from.data 
A logical value; can be 
features 
An object of type 
scale.only 
A logical value. If 
scale.subtree 
A character string giving the name of a node in a tree. This option implies scale.only=TRUE. If given, estimate separate scale factors for subtree beneath identified node and the rest of the tree. The branch leading to the subtree is included in the subtree. 
nrates 
An integer. The number of rate categories to use. Specifying
a value greater than one causes the discrete gamma model for rate variation
to be used, unless rate constants are specified. The default value

alpha 
A numeric value > 0, for use with "nrates". Initial value for alpha, the shape parameter of the gamma distribution. 
rate.constants 
A numeric vector. Implies 
selection 
A numeric value. If provided, use selection in the model.
The value given will be the initial value for selection. If 
init.random 
A logical value. If 
init.parsimony 
A logical value. If 
clock 
A logical value. If 
EM 
A logical value. If 
max.EM.its 
An integer value; only applies if 
precision 
A character vector, one of "HIGH", "MED", or "LOW", denoting the level of precision to use in estimating model parameters. Affects convergence criteria for iterative algorithms: higher precision means more iterations and longer execution time. 
ninf.sites 
An integer. Require at least this many "informative" sites in order to estimate a model. An informative site as an alignment column with at least two nongap and nonmissingdata characers. 
quiet 
A logical value. If 
bound 
Defines boundaries for parameters (see Details below). 
log.file 
If TRUE, write log of optimization to the screen. If a character string, write log of optimization to the named file. Otherwise write no optimization log. 
Value
An object of class tm
(tree model), or (if several models are
computed, as is possible with the features or windows options), a list of
objects of class tm
.
Parameter boundaries
Boundaries can be set for some parameters
using the bound argument. The bound argument should be a vector of
character strings, each element defines the boundaries for a single
parameter. The boundaries are best explained by example. A value of
c("scale[0,1]", "scale_sub[1,]", "kappa[,3]")
would imply to keep the
scale between 0 and 1, the subtree scale between 1 and infinity, and kappa
between 0 and 3. The blank entries in the subtree_scale upper bound and
kappa's lower bound indicate not to set this boundary, in which case the
normal default boundary will be used for that parameter. (Most parameters
are defined between 0 and infinity). Most of the parameters listed in the
description of no.opt can also have their boundaries set in this way.
Note
If msa or features object are passed in as pointers to C memory, they
may be altered by this function! Use copy.msa(msa)
or
copy.feat(features)
to avoid this behavior!
Author(s)
Melissa J. Hubisz and Adam Siepel
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  exampleArchive < system.file("extdata", "examples.zip", package="rphast")
files < c("ENr334100k.maf", "ENr334100k.fa", "gencode.ENr334100k.gff", "rev.mod")
unzip(exampleArchive, files)
m < read.msa("ENr334100k.maf")
mod < phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)")
mod
phyloFit(m, init.mod=mod)
likelihood.msa(m, mod)
mod$likelihood
print(mod$likelihood, digits=10)
f < read.feat("gencode.ENr334100k.gff")
mod < phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)",
features=f, quiet=TRUE)
names(mod)
mod$other
mod[["5'flank"]]
phyloFit(m, init.mod=mod$AR, nrates=3, alpha=4.0)
phyloFit(m, init.mod=mod$AR, rate.constants=c(10, 5, 1))
# background frequencies options
# this should use the background frequencies from the initial mod
phyloFit(m, init.mod=mod$AR, quiet=TRUE)$backgd
mod$AR$backgd
# this should use the background frequencies from the data
phyloFit(m, init.mod=mod$AR, init.backgd.from.data=TRUE, quiet=TRUE)$backgd
mod$AR$backgd
# this should optimize the background frequencies
phyloFit(m, init.mod=mod$AR, no.opt=NULL, quiet=TRUE)$backgd
mod$AR$backgd
unlink(files)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker. Vote for new features on Trello.