phylen: Compute Core Genome Alignment And Phylogeny


Description

Identify and align core genes, concatenate the alignments in a single file, and use it to compute a phylogeny.

Usage

phylen(gffs = character(), hmmFile = character(), isCompressed = TRUE,
  eval = 1e-30, level, phyloMode = "ml", nbs = 100L, outDir = "phylen",
  aliPfx = "supergene", treePfx = "phylo", mafftMode = "linsi",
  keepOgs = FALSE, n_threads = 1L, ...)
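
A minimal sketch of a call follows. The directory "prokka_out" and the file name "bactNOG.hmm.tar.gz" are hypothetical placeholders, not names taken from this page, and the call only runs if the package and the input files are actually present. Setting level explicitly is meant to avoid the interactive plot described under the level argument.

```r
# Hypothetical inputs: replace with your own paths.
gffs <- list.files("prokka_out", pattern = "\\.gff$", full.names = TRUE)
hmm  <- "bactNOG.hmm.tar.gz"  # placeholder file name (assumption)

# Arguments mirror the usage above; values are illustrative only.
args <- list(
  gffs         = gffs,
  hmmFile      = hmm,
  isCompressed = TRUE,
  level        = 100,      # explicit level, so no interactive choice is needed
  phyloMode    = "ml",
  nbs          = 100L,
  outDir       = "phylen",
  n_threads    = 4L
)

# Only attempt the run when the package and the inputs exist.
if (requireNamespace("phylen", quietly = TRUE) &&
    length(gffs) > 0 && file.exists(hmm)) {
  tree <- do.call(phylen::phylen, args)
}
```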

Arguments

gffs

A character vector with the gff file paths.

hmmFile

The path to the .hmm.tar.gz file downloaded from the EggNOG website, or obtained with the list_eggnogdb and download_nog_hmm functions of this package. An already prepared hmm text file can also be provided (see isCompressed below). Alternatively, a custom set of hmm files can be passed as a single concatenated file.

isCompressed

logical() If hmmFile points to the "hmm.tar.gz" file downloaded from EggNOG, this should be set to TRUE. If the pipeline has already been run, or a custom set of hmm files is used, hmmFile should point to the ".hmm" file and isCompressed should be set to FALSE. If the pipeline was run before, the function will also check for the index files and will produce them if any of the required ones is missing. See "hmmpress" in the HMMER 3.1b2 manual.

eval

Consider only hits with an e-value lower than eval.

level

numeric The percentage of isolates a gene must be present in to be considered part of the core genome. If nothing is specified, a plot is generated midway through the process showing the number of core genes against the percentage of isolates (from 100 to 85). The process won't continue until the user chooses a level.
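
To illustrate what level controls, here is a small sketch on made-up data. The presence/absence matrix pa and the "at least level percent" comparison are assumptions for illustration; this is not the package's internal code.

```r
# Toy presence/absence matrix (made-up): rows are genes, columns are 10 isolates.
pa <- rbind(
  geneA = rep(1, 10),
  geneB = c(rep(1, 9), 0),
  geneC = c(rep(1, 9), 0),
  geneD = c(rep(1, 8), 0, 0),
  geneE = c(rep(1, 5), rep(0, 5))
)

# Percentage of isolates in which each gene is present.
pct <- 100 * rowSums(pa > 0) / ncol(pa)

# Number of genes that would count as "core" at each level from 100 to 85,
# assuming a gene qualifies when it is in at least `level` percent of isolates.
lev    <- 100:85
n_core <- vapply(lev, function(l) sum(pct >= l), integer(1))

print(data.frame(level = lev, core_genes = n_core))
```

Lowering the level can only keep or increase the number of core genes, which is the trade-off the plot is meant to show.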

phyloMode

One of "nj" (Neighbour-joining) or "ml" (Maximum likelihood).

nbs

Number of bootstrap replicates. If phyloMode is set to "nj", this parameter is ignored. If phyloMode is set to "ml" and nbs is set to 0, no bootstrap is performed. If phyloMode = "ml" and nbs > 0, bootstrap is performed and 2 newick files are generated: one with the ML optimized tree (suffix "_ml.nwk"), and another with the bootstrap trees (suffix "_ml_nbs.nwk").

outDir

Where to put the output files. If outDir does not exist, a directory with the specified name is created.

aliPfx

A character string with the core genome alignment file prefix. (Default: supergene).

treePfx

A character string with the prefix of the newick tree files. (Default: phylo).

mafftMode

Alignment accuracy. One of "mafft", "ginsi", "linsi" or "einsi". The first one is the default MAFFT mode, very fast. The second uses mafft options "--maxiterate 1000 --globalpair". The third uses "--maxiterate 1000 --localpair" (phylen DEFAULT). The fourth uses "--ep 0 --maxiterate 1000 --genafpair". See the MAFFT manual for more details.
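
The mapping between modes and MAFFT options can be written down as a small helper. The function name mafft_flags and the input file name "in.fasta" are hypothetical; the sketch only builds a command string and does not reflect how the package calls MAFFT internally.

```r
# Hypothetical helper: translate a mafftMode value into the MAFFT options
# listed above (the plain "mafft" mode adds no extra options).
mafft_flags <- function(mode = c("linsi", "mafft", "ginsi", "einsi")) {
  mode <- match.arg(mode)
  switch(mode,
    mafft = character(0),
    ginsi = c("--maxiterate", "1000", "--globalpair"),
    linsi = c("--maxiterate", "1000", "--localpair"),
    einsi = c("--ep", "0", "--maxiterate", "1000", "--genafpair")
  )
}

# Build (but do not run) the equivalent command line for the default mode.
cmd <- paste(c("mafft", mafft_flags("linsi"), "in.fasta"), collapse = " ")
print(cmd)
```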

keepOgs

logical() Whether to keep an intermediate directory of fasta files containing the orthologous groups (DEFAULT: FALSE). If TRUE, a directory called "orthogroups" is kept inside the outDir directory.

n_threads

integer The number of CPUs to use.

...

Further arguments to pass to optim.pml.

Details

This function takes gff files as returned by prokka (Seemann T, 2014) and a set of hmm models, searches the models in the genomes, identifies the "core" set of genes, and aligns and concatenates them into a "super gene" alignment. Once this alignment is built, a phylogeny is inferred.
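
The concatenation step can be pictured with a toy example. The sequences, isolate names and orthologous group names below are made up, and the code is only a sketch of the idea of a "super gene", not the package's implementation.

```r
# Made-up per-gene alignments: one named character vector per orthologous
# group, one aligned sequence per isolate (order of isolates may differ).
alns <- list(
  og1 = c(isoA = "ATG-CA", isoB = "ATGGCA", isoC = "ATG-CA"),
  og2 = c(isoC = "TTGA",   isoA = "TTGA",   isoB = "TT-A")
)

isolates <- sort(names(alns[[1]]))

# For each isolate, paste its aligned sequence from every group, in order.
supergene <- vapply(
  isolates,
  function(i) paste0(vapply(alns, function(a) a[[i]], character(1)),
                     collapse = ""),
  character(1)
)

print(supergene)
```

Because every per-gene alignment has the same width for all isolates, the concatenated sequences also share a single length, which is what a tree-building step requires.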

HMMER 3.1b2 is used as the search engine, and the MAFFT aligner is used to align the orthologous groups. Both programs must be installed before running this pipeline.

Note: the mafft aliases ginsi, linsi, and einsi (see above) must also be installed. These shortcuts are installed by default together with mafft if you download and install the software from the MAFFT webpage, but probably not if you use a package manager such as apt or brew.
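
One way to check the external dependencies before a run is to look them up on the PATH with base R. The list below contains only the program names mentioned on this page; other HMMER executables the pipeline may need are not named here, so add them as appropriate.

```r
# Programs named on this page; extend the vector if your setup needs more.
tools <- c("hmmpress", "mafft", "ginsi", "linsi", "einsi")

# Sys.which() returns the full path of each program, or "" when not found.
found         <- Sys.which(tools)
missing_tools <- names(found)[found == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All listed programs were found.")
}
```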

The phangorn package is used to perform the phylogenetic inference.

Value

A core genome alignment file, a phylogenetic tree in newick format (or two, see the nbs parameter), and an object of class "phylo" on the console. Optionally, a directory with the orthologous groups used for the alignment (see the keepOgs parameter).

Author(s)

Ignacio Ferres

References

Paradis E., Claude J. & Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.

Schliep K.P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics, 27(4) 592-593.

Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30(4):772-780.

Jensen LJ, Julien P, Kuhn M, et al. 2008. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Research 36(Database issue):D250-D254.

Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comp. Biol. 7:e1002195.


iferres/phylen documentation built on May 24, 2019, 2:04 a.m.