core_genome: Core-Genome Alignment

core_genome R Documentation

Core-Genome Alignment

Description

Finds and creates a core-genome alignment. Unlike core_plots(), this function finds the hard core-genome: genes present in 100% of the genomes and without repetitions (i.e. without paralogs). The function takes an mmseqs() output, so the definition of the orthologous genes of the core-genome (similarity, coverage and/or e-value) depends on the mmseqs() parameters.
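
The "hard" core-genome criterion can be illustrated with a small base-R sketch. The toy table and its column names (cluster, genome, gene) are assumptions made for illustration only. They are not the internal layout of an mmseqs object, and this is not the package's implementation.

```r
# Toy cluster table: one row per gene assigned to a cluster.
# Column names are hypothetical, not the mmseqs object's layout.
clusters <- data.frame(
  cluster = c("c1", "c1", "c1", "c2", "c2", "c2", "c2", "c3", "c3"),
  genome  = c("gA", "gB", "gC", "gA", "gB", "gC", "gC", "gA", "gB"),
  gene    = paste0("gene", 1:9),
  stringsAsFactors = FALSE
)

# Count genes per cluster and genome (rows: clusters, columns: genomes).
counts <- table(clusters$cluster, clusters$genome)

# Hard core: exactly one copy in every genome
# (present in 100% of genomes, no paralogs).
is_hard_core <- apply(counts == 1, 1, all)
hard_core <- names(is_hard_core)[is_hard_core]
print(hard_core)
```

In this toy data, c2 is excluded because genome gC carries two copies (a paralog), and c3 is excluded because it is missing from gC.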

Usage

core_genome(data, type, n_cores, method = "fast")

Arguments

data

An mmseqs object

type

Type of sequence: 'nucl' or 'prot'

n_cores

Number of CPU cores to use

method

Alignment method: "fast" (based on BLAST) or "accurate" (based on MAFFT)

Details

The function can perform a pseudo-MSA for each ortholog using the result2msa function of mmseqs2. This approach is much faster than a classical MSA (Clustal, MAFFT or MUSCLE) but is less accurate. Because most phylogenetic inference software only uses variant columns with no insertions or deletions, there are not many differences in the final phylogenetic trees.
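
To make the point about variant, gap-free columns concrete, here is a minimal base-R sketch on a toy alignment. It only illustrates the idea of keeping columns that differ between genomes and contain no indel. It is not how core_genome() or any particular phylogenetic tool filters sites, and the sequences are invented for the example.

```r
# Toy alignment: equal-length sequences, "-" marks an insertion/deletion.
aln <- c(
  g1 = "ATGC-TA",
  g2 = "ATGCCTA",
  g3 = "ACGCCTT"
)

# Split into a character matrix (rows: genomes, columns: alignment positions).
mat <- do.call(rbind, strsplit(aln, ""))

# Flag columns that contain a gap, and columns with more than one state.
has_gap    <- apply(mat, 2, function(col) any(col == "-"))
is_variant <- apply(mat, 2, function(col) length(unique(col)) > 1)

# Keep columns that vary between genomes and contain no gap.
informative <- which(is_variant & !has_gap)
print(mat[, informative, drop = FALSE])
```

Column 5 varies but contains a gap, so it is dropped. Only columns 2 and 7 remain, which is where alignment errors around indels would have no effect.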

However, core_genome() also implements an accurate method that uses MAFFT to build an MSA of each gene cluster.

core_genome() can build a core-genome alignment of thousands of genomes in minutes.
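
A call using each method might look like the sketch below. It assumes an mmseqs object named my_mmseqs already exists in the session (the name is hypothetical) and that PATO is installed. The guard only keeps the snippet harmless when either is missing. The argument values follow the Usage and Arguments sections above.

```r
# Assumes `my_mmseqs` is an mmseqs object previously returned by mmseqs();
# the object name is hypothetical.
if (requireNamespace("PATO", quietly = TRUE) && exists("my_mmseqs")) {
  # Fast pseudo-MSA (the default method).
  core_fast <- PATO::core_genome(my_mmseqs, type = "prot", n_cores = 4)

  # MAFFT-based alignment of each gene cluster: slower, more accurate.
  core_accurate <- PATO::core_genome(my_mmseqs, type = "prot", n_cores = 4,
                                     method = "accurate")
}
```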

Value

A core_genome object (a data.frame with two columns: fasta header and sequence)
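
Since the result is described as a two-column data.frame (fasta header and sequence), it can be written to a FASTA file with base R. The sketch below uses a stand-in data.frame. The column names, and whether the headers already start with ">", are assumptions, so the columns are accessed by position and the ">" prefix is added only when missing.

```r
# Stand-in for a core_genome result; column names are assumed for illustration.
core <- data.frame(
  header   = c(">genome_A", "genome_B"),
  sequence = c("ATGCCGTA", "ATGCCGTT"),
  stringsAsFactors = FALSE
)

# Make sure every header starts with ">".
headers <- ifelse(startsWith(core[[1]], ">"), core[[1]], paste0(">", core[[1]]))

# Interleave header and sequence lines: header1, seq1, header2, seq2, ...
fasta_lines <- as.vector(rbind(headers, core[[2]]))

out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
```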


irycisBioinfo/PATO documentation built on Oct. 19, 2023, 3:07 p.m.