Description Usage Arguments Value Parameter boundaries Note Author(s) Examples
Fit a Phylogenetic model to an alignment
1 2 3 4 5 6 7 | phyloFit(msa, tree = NULL, subst.mod = "REV", init.mod = NULL,
no.opt = c("backgd"), init.backgd.from.data = ifelse(is.null(init.mod),
TRUE, FALSE), features = NULL, scale.only = FALSE, scale.subtree = NULL,
nrates = NULL, alpha = 1, rate.constants = NULL, selection = NULL,
init.random = FALSE, init.parsimony = FALSE, clock = FALSE,
EM = FALSE, max.EM.its = NULL, precision = "HIGH", ninf.sites = 50,
quiet = FALSE, bound = NULL, log.file = FALSE)
|
msa |
An alignment object. May be altered if passed in as a pointer to C memory (see Note). |
tree |
A character string containing a Newick formatted tree defining the topology. Required if the number of species > 3, unless init.mod is specified. The topology must be rooted, although the root is ignored if the substitution model is reversible. |
subst.mod |
The substitution model to use. Some possible models
include "REV", "JC69", "K80", "F81", "HKY85", "R2", "U2". Run
|
init.mod |
An object of class |
no.opt |
A character vector indicating which parameters NOT to optimize
(instead hold constant at their initial values). By default, the
equilibrium frequencies (backgd) are not optimized. Other parameters that
may be indicated here are "ratematrix" for the entire rate matrix, "kappa"
for models with transition/transversion ratios, "branches" to hold all
branch lengths constant, "ratevar" for rate variation parameters, "scale"
for the tree scaling factor, and "scale_sub" for the subtree scaling factor.
This argument does NOT apply to parameters of a lineage-specific model
created with |
init.backgd.from.data |
A logical value; can be |
features |
An object of type |
scale.only |
A logical value. If |
scale.subtree |
A character string giving the name of a node in a tree. This option implies scale.only=TRUE. If given, estimate separate scale factors for subtree beneath identified node and the rest of the tree. The branch leading to the subtree is included in the subtree. |
nrates |
An integer. The number of rate categories to use. Specifying
a value greater than one causes the discrete gamma model for rate variation
to be used, unless rate constants are specified. The default value
|
alpha |
A numeric value > 0, for use with "nrates". Initial value for alpha, the shape parameter of the gamma distribution. |
rate.constants |
A numeric vector. Implies |
selection |
A numeric value. If provided, use selection in the model.
The value given will be the initial value for selection. If |
init.random |
A logical value. If |
init.parsimony |
A logical value. If |
clock |
A logical value. If |
EM |
A logical value. If |
max.EM.its |
An integer value; only applies if |
precision |
A character vector, one of "HIGH", "MED", or "LOW", denoting the level of precision to use in estimating model parameters. Affects convergence criteria for iterative algorithms: higher precision means more iterations and longer execution time. |
ninf.sites |
An integer. Require at least this many "informative" sites in order to estimate a model. An informative site as an alignment column with at least two non-gap and non-missing-data characers. |
quiet |
A logical value. If |
bound |
Defines boundaries for parameters (see Details below). |
log.file |
If TRUE, write log of optimization to the screen. If a character string, write log of optimization to the named file. Otherwise write no optimization log. |
An object of class tm
(tree model), or (if several models are
computed, as is possible with the features or windows options), a list of
objects of class tm
.
Boundaries can be set for some parameters
using the bound argument. The bound argument should be a vector of
character strings, each element defines the boundaries for a single
parameter. The boundaries are best explained by example. A value of
c("scale[0,1]", "scale_sub[1,]", "kappa[,3]")
would imply to keep the
scale between 0 and 1, the subtree scale between 1 and infinity, and kappa
between 0 and 3. The blank entries in the subtree_scale upper bound and
kappa's lower bound indicate not to set this boundary, in which case the
normal default boundary will be used for that parameter. (Most parameters
are defined between 0 and infinity). Most of the parameters listed in the
description of no.opt can also have their boundaries set in this way.
If msa or features object are passed in as pointers to C memory, they
may be altered by this function! Use copy.msa(msa)
or
copy.feat(features)
to avoid this behavior!
Melissa J. Hubisz and Adam Siepel
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | exampleArchive <- system.file("extdata", "examples.zip", package="rphast")
files <- c("ENr334-test.maf", "ENr334-100k.fa", "ENr334-test.gff", "rev.mod")
unzip(exampleArchive, files)
m <- read.msa("ENr334-test.maf")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)")
mod
phyloFit(m, init.mod=mod)
likelihood.msa(m, mod)
mod$likelihood
print(mod$likelihood, digits=10)
f <- read.feat("ENr334-test.gff")
mod <- phyloFit(m, tree="((hg18, (mm9, rn4)), canFam2)",
features=f, quiet=TRUE)
names(mod)
mod$other
mod[["5'flank"]]
phyloFit(m, init.mod=mod$AR, nrates=3, alpha=4.0)
phyloFit(m, init.mod=mod$AR, rate.constants=c(10, 5, 1))
# background frequencies options
# this should use the background frequencies from the initial mod
phyloFit(m, init.mod=mod$AR, quiet=TRUE)$backgd
mod$AR$backgd
# this should use the background frequencies from the data
phyloFit(m, init.mod=mod$AR, init.backgd.from.data=TRUE, quiet=TRUE)$backgd
mod$AR$backgd
# this should optimize the background frequencies
phyloFit(m, init.mod=mod$AR, no.opt=NULL, quiet=TRUE)$backgd
mod$AR$backgd
unlink(files)
|
Extracting sufficient statistics ...
Compacting sufficient statistics ...
Fitting tree model to alignment using REV ...
numpar = 11
Done. log(likelihood) = -149094.320182 numeval=331
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
TRAINING_LNL: -149094.320182
BACKGROUND: 0.269027 0.248505 0.227938 0.254530
RATE_MAT:
-0.942590 0.176447 0.624375 0.141767
0.191019 -0.996540 0.151541 0.653980
0.736926 0.165214 -1.105011 0.202871
0.149842 0.638500 0.181676 -0.970018
TREE: ((hg18:0.112696,(mm9:0.0657572,rn4:0.0698817):0.292789):0.105601,canFam2:0.105601);
Extracting sufficient statistics ...
Compacting sufficient statistics ...
Fitting tree model to alignment using REV ...
numpar = 11
Done. log(likelihood) = -149094.320181 numeval=29
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
TRAINING_LNL: -149094.320181
BACKGROUND: 0.269027 0.248505 0.227938 0.254530
RATE_MAT:
-0.942590 0.176447 0.624375 0.141768
0.191019 -0.996540 0.151541 0.653980
0.736926 0.165214 -1.105011 0.202871
0.149842 0.638500 0.181676 -0.970018
TREE: ((hg18:0.112696,(mm9:0.0657604,rn4:0.0698818):0.29279):0.105599,canFam2:0.105599);
[1] -149094.3
[1] -149094.3
[1] -149094.3202
Skipping alignment (category 7); insufficient informative sites ...
[1] "background" "AR" "other" "5'flank" "5'UTR"
[6] "CDS" "intron" "3'flank" "phastCons"
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
TRAINING_LNL: -75146.750332
BACKGROUND: 0.254259 0.252749 0.251373 0.241619
RATE_MAT:
-1.001616 0.188908 0.669705 0.143004
0.190036 -0.969975 0.153607 0.626332
0.677394 0.154448 -1.023594 0.191752
0.150484 0.655184 0.199493 -1.005161
TREE: ((hg18:0.118571,(mm9:0.0677486,rn4:0.0729959):0.347226):0.112422,canFam2:0.112422);
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
TRAINING_LNL: -408.370000
BACKGROUND: 0.249330 0.276139 0.321716 0.152815
RATE_MAT:
-1.024082 0.194429 0.763515 0.066138
0.175552 -0.988782 0.074163 0.739067
0.591724 0.063656 -0.717861 0.062481
0.107909 1.335508 0.131538 -1.574955
TREE: ((hg18:1.38418e-16,(mm9:0.148352,rn4:0.142487):0.148352):0.188016,canFam2:0.188016);
Extracting sufficient statistics ...
Compacting sufficient statistics ...
Fitting tree model to alignment using REV (with rate variation) ...
numpar = 14
Done. log(likelihood) = 593439.111981 numeval=1591
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
NRATECATS: 3
RATE_CONSTS: 1.000000 0.000000 0.000000
RATE_WEIGHTS: 0.374809 0.311231 0.313960
TRAINING_LNL: 593439.111981
BACKGROUND: 0.328209 0.212004 0.190436 0.269352
RATE_MAT:
-1.440369 0.176546 0.425040 0.838783
0.273316 -0.353648 0.045212 0.035120
0.732539 0.050333 -0.836580 0.053707
1.022068 0.027643 0.037972 -1.087682
TREE: ((hg18:1.30764e+06,(mm9:3.89557e-13,rn4:635199):214870):813996,canFam2:813996);
Extracting sufficient statistics ...
Compacting sufficient statistics ...
Fitting tree model to alignment using REV (with rate variation) ...
ALPHABET: A C G T
ORDER: 0
SUBST_MOD: REV
NRATECATS: 3
RATE_CONSTS: 10.000000 5.000000 1.000000
RATE_WEIGHTS: -0.000246 0.030657 0.969589
TRAINING_LNL: -149906.317945
BACKGROUND: 0.328209 0.212004 0.190436 0.269352
RATE_MAT:
-0.800149 0.144451 0.523022 0.132676
0.223629 -1.135865 0.156648 0.755589
0.901408 0.174389 -1.306774 0.230977
0.161667 0.594717 0.163304 -0.919688
TREE: ((hg18:0.114939,(mm9:0.0654673,rn4:0.0691082):0.298064):0.10845,canFam2:0.10845);
[1] 0.3282085 0.2120040 0.1904358 0.2693517
[1] 0.3282085 0.2120040 0.1904358 0.2693517
[1] 0.2690269 0.2485048 0.2279384 0.2545299
[1] 0.3282085 0.2120040 0.1904358 0.2693517
[1] 0.2762342 0.2419623 0.2250745 0.2567290
[1] 0.3282085 0.2120040 0.1904358 0.2693517
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.