core_genome: Core-Genome Alignment
In irycisBioinfo/PATO: Pangenome Analysis Toolkit

Description

Finds the core genome and creates a core-genome alignment. Unlike core_plots(), this function finds the hard core-genome: genes present in 100% of the genomes and without repetitions (i.e. without paralogs). The function takes mmseqs() output, so the definition of the orthologous genes of the core-genome (similarity, coverage and/or e-value) depends on the mmseqs() parameters.
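
The hard core-genome criterion can be illustrated with a minimal base R sketch. It assumes a toy table with one row per gene, giving its genome and ortholog cluster. The table, its column names and the helper hard_core_clusters() are hypothetical and are not part of PATO; the package's own implementation may differ.

```r
# Toy membership table: one row per gene, with its genome and ortholog cluster.
# All names here are hypothetical and are not part of the PATO API.
genes <- data.frame(
  genome  = c("g1", "g2", "g3", "g1", "g2", "g3", "g3", "g1", "g2"),
  cluster = c("c1", "c1", "c1", "c2", "c2", "c2", "c2", "c3", "c3"),
  stringsAsFactors = FALSE
)

hard_core_clusters <- function(genes) {
  # Rows = clusters, columns = genomes, cells = gene copies per genome.
  counts <- table(genes$cluster, genes$genome)
  # Keep clusters with exactly one copy in every genome:
  # a zero means the gene is absent, a value above one means a paralog.
  is_core <- apply(counts == 1, 1, all)
  names(is_core)[is_core]
}

hard_core_clusters(genes)  # returns "c1"
```

Here c2 is rejected because genome g3 carries two copies, and c3 is rejected because it is missing from g3.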

Usage

core_genome(data, type, n_cores, method = "fast")

Arguments

`data`	An mmseqs object
`type`	Type of sequence: 'nucl' or 'prot'
`n_cores`	Number of CPU cores to use
`method`	Alignment method: 'fast' (based on blast) or 'accurate' (based on mafft)
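
As a sketch of how these arguments fit together, the hypothetical wrapper below validates them before forwarding the call. The name run_core_genome() and the default n_cores = 1 are assumptions made for illustration; only the argument names and the 'nucl'/'prot' and 'fast'/'accurate' values come from this page.

```r
# Hypothetical wrapper, not part of PATO: checks the arguments documented
# above, then forwards them to core_genome().
run_core_genome <- function(data, type = c("nucl", "prot"), n_cores = 1,
                            method = c("fast", "accurate")) {
  # match.arg() stops with an error on any value outside the listed choices.
  type <- match.arg(type)
  method <- match.arg(method)
  stopifnot(is.numeric(n_cores), length(n_cores) == 1, n_cores >= 1)
  if (!requireNamespace("PATO", quietly = TRUE)) {
    stop("the PATO package is required")
  }
  PATO::core_genome(data, type = type, n_cores = n_cores, method = method)
}
```

With an mmseqs object mm, a call such as run_core_genome(mm, type = "prot", n_cores = 4, method = "accurate") would request the mafft-based alignment.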

Details

The function can perform a pseudo-MSA for each ortholog using the result2msa function of mmseqs2. This approach is much faster than classical MSA tools (Clustal, MAFFT or MUSCLE) but less accurate. Because most phylogenetic inference software only uses variant columns with no insertions or deletions, the final phylogenetic trees differ little.
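
To see why alignment errors around gaps matter little, consider which columns carry signal under that assumption. The base R sketch below keeps only variant, gap-free columns of a toy alignment. The data and the helper informative_columns() are hypothetical, and this is a simplification of what phylogenetic software does, not PATO code.

```r
# Toy alignment: one equal-length string per genome (hypothetical data).
aln <- c(g1 = "ATG-CA", g2 = "ATGTCA", g3 = "ACGTCT")

informative_columns <- function(aln) {
  # One row per sequence, one column per alignment position.
  m <- do.call(rbind, strsplit(aln, ""))
  # A column is dropped if any sequence has a gap there.
  has_gap <- apply(m == "-", 2, any)
  # A column is variant if it holds more than one distinct character.
  is_variant <- apply(m, 2, function(col) length(unique(col)) > 1)
  which(is_variant & !has_gap)
}

informative_columns(aln)  # returns 2 6
```

Column 4 differs between sequences only because of a gap, so it is discarded; any misplacement of that gap by a faster aligner would not reach the tree.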

However, core_genome() also implements an accurate method that uses mafft to build an MSA of each gene cluster.

core_genome() can build a core-genome alignment of thousands of genomes in minutes.