SelacOptimize: Efficient optimization of the SELAC model
In selac: Selection Models for Amino Acid and/or Codon Evolution


Efficient optimization of model parameters under the SELAC model

SelacOptimize(codon.data.path, n.partitions = NULL, phy,
  data.type = "codon", codon.model = "selac",
  edge.length = "optimize", edge.linked = TRUE,
  optimal.aa = "optimize", nuc.model = "GTR", include.gamma = FALSE,
  gamma.type = "quadrature", ncats = 4, numcode = 1,
  diploid = TRUE, k.levels = 0, aa.properties = NULL,
  verbose = FALSE, n.cores.by.gene = 1, n.cores.by.gene.by.site = 1,
  max.tol = 0.001, max.tol.edges = 0.001, max.evals = 1e+06,
  max.restarts = 3, user.optimal.aa = NULL,
  fasta.rows.to.keep = NULL, recalculate.starting.brlen = TRUE,
  output.by.restart = TRUE, output.restart.filename = "restartResult",
  user.supplied.starting.param.vals = NULL, tol.step = 1,
  optimizer.algorithm = "NLOPT_LN_SBPLX", start.from.mle = FALSE,
  mle.matrix = NULL, partition.order = NULL, max.iterations = 6,
  dt.threads = 1)

`codon.data.path`	Provides the path to the directory containing the gene-specific fasta files of coding data. Each file must have a ".fasta" file extension.
`n.partitions`	The number of partitions to analyze. Partitions are taken in the Unix sort order of the fasta file names in the directory.
`phy`	The phylogenetic tree (a "phylo" object) on which the model parameters are optimized.
`data.type`	The data type being tested. Options are "codon" or "nucleotide".
`codon.model`	The type of codon model to use. There are six options: "none", "GY94", "YN98", "FMutSel0", "FMutSel", or "selac".
`edge.length`	Indicates whether edge lengths should be optimized. The default, "optimize", estimates them; the other option, "fixed", keeps the user-supplied branch lengths.
`edge.linked`	A logical indicating whether edge lengths are linked across genes. By default (TRUE), a single set of edge lengths is optimized for all genes; otherwise edge lengths are optimized separately for each gene.
`optimal.aa`	Indicates how the optimal amino acids are determined. There are five options: "none", "majrule", "averaged", "optimize", or "user".
`nuc.model`	Indicates what type of nucleotide model to use. There are three options: "JC", "GTR", or "UNREST".
`include.gamma`	A logical indicating whether or not to include a discrete gamma model.
`gamma.type`	Indicates how the discrete gamma distribution is approximated. Options are "quadrature", after the Laguerre quadrature approach of Felsenstein (2001); the median approach of Yang (1994); or "lognormal", after a lognormal quadrature approach.
`ncats`	The number of discrete gamma rate categories.
`numcode`	The NCBI genetic code number used for translation. By default the standard genetic code (numcode = 1) is used.
`diploid`	A logical indicating whether the organism is diploid.
`k.levels`	The number of levels in the polynomial. By default a single level is assumed (i.e., linear).
`aa.properties`	User-supplied amino acid distance properties. By default we assume Grantham (1974) properties.
`verbose`	A logical indicating whether each iteration should be printed to the screen.
`n.cores.by.gene`	The number of cores dedicated to parallelizing analyses across genes.
`n.cores.by.gene.by.site`	The number of cores dedicated to parallelizing analyses by site within a gene. Note that n.cores.by.gene * n.cores.by.gene.by.site is the total number of cores dedicated to the analysis.
`max.tol`	Supplies the relative optimization tolerance.
`max.tol.edges`	Supplies the relative optimization tolerance for branch lengths only. By default it equals max.tol.
`max.evals`	Supplies the maximum number of function evaluations allowed during optimization.
`max.restarts`	Supplies the number of random restarts.
`user.optimal.aa`	If optimal.aa is set to "user", this option supplies the user-defined optimal amino acids. Must be a list. To get the proper order of the partitions, see the "GetPartitionOrder" documentation.
`fasta.rows.to.keep`	Indicates which rows of the input fasta files to keep.
`recalculate.starting.brlen`	A logical indicating whether to recalculate the branch lengths of the starting tree (TRUE) or use them as given (FALSE).
`output.by.restart`	Logical indicating whether or not each restart is saved to a file. Default is TRUE.
`output.restart.filename`	Designates the base file name used when saving each random restart.
`user.supplied.starting.param.vals`	Designates user-supplied starting values for C.q.phi.Ne, Grantham alpha, and Grantham beta. Default is NULL.
`tol.step`	If greater than 1, uses a coarser tolerance at earlier iterations of the optimizer.
`optimizer.algorithm`	The optimization algorithm used by nloptr. Default is "NLOPT_LN_SBPLX".
`start.from.mle`	If TRUE, will start optimization from the MLE. Default is FALSE.
`mle.matrix`	The user-supplied matrix of parameter values for when start.from.mle is set to TRUE.
`partition.order`	Allows for a specialized order of the partitions to be gathered from the working directory.
`max.iterations`	Sets the number of cycles to optimize the different parts of the model.
`dt.threads`	Indicates how many available threads data.table is allowed to use. Default is 1.
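The `gamma.type` entry mentions the median approach of Yang (1994). The following is a minimal base-R sketch of that discretization, not selac's internal code; the function name `discrete.gamma.median` is hypothetical. It takes a gamma distribution with mean 1 (shape = rate = alpha), uses the median of each of `ncats` equal-probability categories as that category's rate, and rescales the rates so they average exactly 1.

```r
# Hypothetical helper: median-based discrete gamma rates (Yang 1994).
discrete.gamma.median <- function(shape, ncats = 4) {
  # Probability at the middle of each of the ncats equal-probability bins.
  p <- (2 * seq_len(ncats) - 1) / (2 * ncats)
  # Gamma with mean 1: shape and rate both equal alpha.
  rates <- qgamma(p, shape = shape, rate = shape)
  # Medians do not average to 1, so rescale to keep the mean rate at 1.
  rates / mean(rates)
}

# Four categories for a strongly heterogeneous alpha of 0.5.
r <- discrete.gamma.median(shape = 0.5, ncats = 4)
r
```

Smaller shape values spread the four rates further apart; the rescaling step is what distinguishes the median approach from simply using raw quantiles.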

Here we optimize the gene-specific parameters for each gene separately while holding the shared parameters (Grantham alpha and beta, edge lengths, and nucleotide substitution parameters) constant across genes. We then optimize alpha, beta, the GTR parameters, and the edge lengths while keeping the gene-specific parameters fixed. This approach is potentially more efficient than optimizing all parameters simultaneously, especially when fitting models across hundreds of genes.
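The alternating scheme described above is a form of block-coordinate optimization. As a hedged illustration only, the sketch below applies the same two-step cycle to a hypothetical least-squares toy model (one intercept per gene, one slope shared by all genes) using base R's `optimize()`; it does not use selac's likelihood, and `gene.sse` and `alternating.fit` are hypothetical names.

```r
# Hypothetical toy model: y[g, i] = local[g] + shared * x[i], noise-free.
x <- 0:4
true.local <- c(1, -2, 0.5)  # one gene-specific parameter per gene
true.shared <- 1.5           # one parameter shared by all genes
y <- outer(true.local, x, function(a, xi) a + true.shared * xi)  # genes in rows

# Sum of squared errors for a single gene.
gene.sse <- function(local, shared, y.gene, x) {
  sum((y.gene - local - shared * x)^2)
}

alternating.fit <- function(y, x, max.iterations = 200, tol = 1e-7) {
  n.genes <- nrow(y)
  local <- rep(0, n.genes)
  shared <- 0
  for (iter in seq_len(max.iterations)) {
    previous <- c(local, shared)
    # Step 1: fit each gene separately with the shared parameter held fixed.
    for (g in seq_len(n.genes)) {
      local[g] <- optimize(gene.sse, interval = c(-100, 100), shared = shared,
                           y.gene = y[g, ], x = x, tol = 1e-10)$minimum
    }
    # Step 2: fit the shared parameter with all gene-specific values held fixed.
    total.sse <- function(s) {
      sum(vapply(seq_len(n.genes),
                 function(g) gene.sse(local[g], s, y[g, ], x), numeric(1)))
    }
    shared <- optimize(total.sse, interval = c(-100, 100), tol = 1e-10)$minimum
    # Stop once a full cycle changes no parameter by more than tol.
    if (max(abs(c(local, shared) - previous)) < tol) break
  }
  list(local = local, shared = shared, iterations = iter)
}

fit <- alternating.fit(y, x)
fit$shared  # should be close to 1.5
fit$local   # should be close to c(1, -2, 0.5)
```

Each gene-level step only touches that gene's data, which is what makes the first step easy to spread across cores; the price is that several full cycles may be needed when gene-specific and shared parameters are correlated, which is the role max.iterations plays in SelacOptimize.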

## Not run: 
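# Read the example yeast phylogeny shipped with selac.
phy <- ape::read.tree(file = system.file("extdata", "rokasYeast.tre", package = "selac"))
# Optimize SELAC parameters for the first fasta partition in extdata;
# max.evals = 10 keeps the run short, so results will not be fully converged.
result <- SelacOptimize(codon.data.path = paste0(find.package("selac"), "/extdata/"),
                        n.partitions = 1, phy = phy, max.evals = 10)
print(result)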

## End(Not run)
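Because `n.partitions` selects partitions in the sort order of the fasta file names, it can help to check that order before a run. The sketch below is a self-contained assumption: it builds a hypothetical directory of empty files in `tempdir()` and assumes "Unix order" means a byte-wise (C-locale) sort, as `ls` gives with `LC_ALL=C`. The "GetPartitionOrder" documentation referenced under `user.optimal.aa` remains the authoritative source.

```r
# Hypothetical data directory; only files ending in ".fasta" are partitions.
data.dir <- file.path(tempdir(), "selac_partition_demo")
dir.create(data.dir, showWarnings = FALSE)
invisible(file.create(file.path(data.dir,
  c("gene2.fasta", "gene10.fasta", "gene1.fasta", "notes.txt"))))

fasta.files <- list.files(data.dir, pattern = "\\.fasta$")
# Byte-wise (C-locale) ordering: "gene10" sorts before "gene2".
partition.files <- sort(fasta.files, method = "radix")
partition.files  # "gene1.fasta" "gene10.fasta" "gene2.fasta"

# Under this assumption, n.partitions = 2 would analyze these two files.
head(partition.files, 2)
```

Zero-padding file names (for example "gene02.fasta") is one simple way to make this order match the intended numeric order.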