core_snp_genome: Core SNP Genome
In irycisBioinfo/PATO: Pangenome Analysis Toolkit

Description

This function finds the core SNP genome. Quoting Torsten Seemann: If you call SNPs for multiple isolates from the same reference, you can produce an alignment of "core SNPs" which can be used to build a high-resolution phylogeny (ignoring possible recombination). A "core site" is a genomic position that is present in all the samples. A core site can have the same nucleotide in every sample ("monomorphic") or some samples can be different ("polymorphic" or "variant"). If we ignore the complications of "ins", "del" variant types, and just use variant sites, these are the "core SNP genome".
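
The definitions above can be illustrated with a minimal sketch on a toy alignment. The matrix `aln`, the sample names and the use of "-" for an absent position are all assumptions made for this example; they are not the data structures used by core_snp_genome().

```r
# Toy alignment: rows are samples, columns are reference positions.
# "-" marks a position that is absent (not aligned) in that sample.
aln <- rbind(
  s1 = c("A", "C", "G", "T", "A"),
  s2 = c("A", "T", "G", "-", "A"),
  s3 = c("A", "C", "G", "T", "G")
)

# A core site is a position present in every sample.
is_core <- apply(aln, 2, function(col) all(col != "-"))

# A core site is polymorphic ("variant") when the samples disagree on the nucleotide.
is_variant <- is_core & apply(aln, 2, function(col) length(unique(col)) > 1)

# The core SNP alignment keeps only the polymorphic core columns.
core_snp_aln <- aln[, is_variant, drop = FALSE]

print(which(is_variant))
print(core_snp_aln)
```

Position 4 is dropped because it is missing in one sample, and positions 1 and 3 are dropped because they are monomorphic, so only positions 2 and 5 remain.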

Usage

core_snp_genome(
  file_list,
  n_cores,
  ref,
  type,
  x,
  min_call_length,
  min_call_qual,
  asm
)

Arguments

`file_list`	Data frame with the full paths to the nucleotide sequence files (genes or genomes), or a gff_list object.
`n_cores`	Number of cores to use.
`ref`	Reference genome (if missing, one is selected at random).
`type`	Only for gff_list objects. Specifies whether to use whole-genome sequences ("wgs") or genes ("nucl").
`x`	Minimap2 preset (see Details).
`min_call_length`	Minimum alignment length to call variants and compute coverage (expert parameter).
`min_call_qual`	Minimum mapping quality (expert parameter).
`asm`	Minimap2 preset options (see Details).
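
A call using the arguments listed above might look like the following sketch. The directory, the file names and the column name `File` are placeholders. Passing `ref` as a file path is also an assumption, because this page does not state the expected format. The call runs only if PATO is installed and the placeholder files exist.

```r
# Hypothetical input: a data frame with one column of full paths to FASTA files.
# The directory, file names and column name are placeholders.
files <- data.frame(
  File = file.path("/path/to/genomes",
                   c("isolate1.fna", "isolate2.fna", "isolate3.fna")),
  stringsAsFactors = FALSE
)

# Run only when PATO is installed and the placeholder paths actually exist.
if (requireNamespace("PATO", quietly = TRUE) && all(file.exists(files$File))) {
  core_snp <- PATO::core_snp_genome(
    file_list = files,
    n_cores   = 2,
    ref       = files$File[1],  # assumption: the reference is given as a path
    x         = "asm5"          # one of the presets described in Details
  )
}
```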

Details

This function uses minimap2 to align all the genomes to a reference genome. If no reference genome is specified, core_snp_genome() selects one at random. Once all the genomes are aligned, the function looks for the regions conserved across all of them. It then calls variants using paftools, the tool provided with minimap2.

Finally, the SNPs are filtered by the common regions to produce the final core SNP genome.
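
The final filtering step can be sketched as an interval lookup. The data frames `core_regions` and `snps`, their column names and their values are hypothetical and chosen only for illustration; the function's internal representation may differ.

```r
# Hypothetical conserved (common) regions on the reference, 1-based inclusive.
core_regions <- data.frame(start = c(100, 500), end = c(200, 650))

# Hypothetical SNP calls: reference position, reference base, alternate base.
snps <- data.frame(
  pos = c(50, 150, 300, 510, 650, 700),
  ref = c("A", "C", "G", "T", "A", "C"),
  alt = c("G", "T", "A", "C", "G", "T"),
  stringsAsFactors = FALSE
)

# A SNP is kept when its position falls inside at least one common region.
in_core <- vapply(
  snps$pos,
  function(p) any(p >= core_regions$start & p <= core_regions$end),
  logical(1)
)

kept <- snps[in_core, ]
print(kept)
```

Here the SNPs at positions 150, 510 and 650 are kept, while those at 50, 300 and 700 fall outside the common regions and are discarded.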

Minimap2 has several preset settings for mapping different kinds of sequences:

map-pb/map-ont: PacBio/Nanopore vs reference mapping
ava-pb/ava-ont: PacBio/Nanopore read overlap
asm5/asm10/asm20: asm-to-ref mapping, for ~0.1/1/5% sequence divergence
splice: long-read spliced alignment
sr: genomic short-read mapping

We recommend map-pb (maximum number of SNPs, lower accuracy), asm5 (fewest SNPs, highest accuracy), asm10 (medium number of SNPs, medium accuracy) or asm20 (high number of SNPs, low accuracy).

The rest of the presets are designed for other purposes.
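
To make the role of the preset and of the expert parameters more concrete, here is a sketch that only builds (and does not run) a minimap2 plus paftools command line of the kind described in Details. The helper `build_pipeline()` is hypothetical. The mapping of `min_call_length` to the paftools `-l`/`-L` options and of `min_call_qual` to `-q` is an assumption, and so are the default values; the command that core_snp_genome() actually runs may differ.

```r
# Hypothetical helper: assembles an illustrative shell pipeline as a string.
# Assumed mapping of the documented arguments to command-line flags:
#   preset          -> minimap2 -x
#   n_cores         -> minimap2 -t
#   min_call_length -> paftools.js call -l (coverage) and -L (variant calling)
#   min_call_qual   -> paftools.js call -q
build_pipeline <- function(ref, query, preset = "asm5", n_cores = 1,
                           min_call_length = 10000, min_call_qual = 5) {
  align <- sprintf("minimap2 -c --cs -x %s -t %d %s %s",
                   preset, as.integer(n_cores), shQuote(ref), shQuote(query))
  call_variants <- sprintf("paftools.js call -l %d -L %d -q %d -",
                           as.integer(min_call_length),
                           as.integer(min_call_length),
                           as.integer(min_call_qual))
  # paftools.js call expects alignments sorted by reference name and start.
  paste(align, "sort -k6,6 -k8,8n", call_variants, sep = " | ")
}

cmd <- build_pipeline("ref.fna", "isolate1.fna", preset = "asm10", n_cores = 4)
cat(cmd, "\n")
```

Changing `preset` from "asm5" to "asm20" in such a pipeline makes the aligner tolerate more divergence, which is the trade-off between number of SNPs and accuracy described above.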