core_snp_genome: Core SNP Genome

View source: R/core_snp_genome.R

core_snp_genome    R Documentation

Core SNP Genome

Description

This function finds the core SNP genome. Citing Torsten Seemann: If you call SNPs for multiple isolates from the same reference, you can produce an alignment of "core SNPs" which can be used to build a high-resolution phylogeny (ignoring possible recombination). A "core site" is a genomic position that is present in all the samples. A core site can have the same nucleotide in every sample ("monomorphic") or some samples can be different ("polymorphic" or "variant"). If we ignore the complications of "ins", "del" variant types, and just use variant sites, these are the "core SNP genome".
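The definitions above can be illustrated with a toy alignment. This is a minimal base R sketch of the concept only, not the way core_snp_genome() works internally; the matrix `aln`, the sample names and the use of "-" for an absent position are all assumptions made for the illustration.

```r
# Toy alignment: rows are samples, columns are reference positions.
# "-" marks a position that is absent (not aligned) in that sample.
aln <- rbind(
  s1 = c("A", "C", "G", "T", "A", "-"),
  s2 = c("A", "T", "G", "T", "A", "C"),
  s3 = c("A", "C", "G", "-", "G", "C")
)

# A core site is a position present in every sample.
is_core <- apply(aln, 2, function(col) all(col != "-"))

# A polymorphic site carries more than one distinct nucleotide.
is_polymorphic <- apply(aln, 2, function(col) length(unique(col)) > 1)

# Core SNP positions are both core and polymorphic.
core_snp_pos <- which(is_core & is_polymorphic)

# The core SNP alignment keeps only those columns.
core_snps <- aln[, core_snp_pos, drop = FALSE]
print(core_snp_pos)
print(core_snps)
```

Positions 4 and 6 are excluded because one sample lacks them, and positions 1 and 3 are excluded because they are monomorphic.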

Usage

core_snp_genome(
  file_list,
  n_cores,
  ref,
  type,
  x,
  min_call_length,
  min_call_qual,
  asm
)

Arguments

file_list

Data frame with the full paths to the nucleotide sequence files (genes or genomes), or a gff_list object.

n_cores

Number of cores to use.

ref

Reference genome. If missing, one is selected at random.

type

Only for gff_list objects. You must specify whether to use whole-genome sequences ("wgs") or genes ("nucl").

x

Minimap2 preset (see Details).

min_call_length

Minimum alignment length required to call variants and compute coverage (expert parameter).

min_call_qual

Minimum mapping quality (expert parameter).

asm

Minimap2 preset options (see Details).

Details

This function uses minimap2 to align all the genomes to a reference genome. If no reference genome is specified, core_snp_genome() picks one at random. Once all the genomes are aligned, the function looks for the regions conserved across all of them. It then calls variants using paftools, the tool distributed with minimap2.

Finally, the SNPs are filtered by the common regions to produce the final core SNP genome.
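The last step, keeping only the SNPs that fall inside regions shared by every genome, can be sketched in base R. This is a hedged illustration with made-up data, not the package's implementation: the interval tables in `covered`, the `snps` data frame and the helper `cover_mask()` are hypothetical names introduced here.

```r
ref_length <- 20

# Hypothetical aligned regions on the reference (1-based, inclusive), per sample.
covered <- list(
  s1 = data.frame(start = c(1, 12), end = c(10, 20)),
  s2 = data.frame(start = 3, end = 18),
  s3 = data.frame(start = c(1, 14), end = c(8, 20))
)

# Mark the reference positions covered by one sample.
cover_mask <- function(regions, len) {
  mask <- logical(len)
  for (i in seq_len(nrow(regions))) {
    mask[regions$start[i]:regions$end[i]] <- TRUE
  }
  mask
}

# One column per sample; a position is common if every sample covers it.
masks <- vapply(covered, cover_mask, logical(ref_length), len = ref_length)
common <- rowSums(masks) == length(covered)

# Hypothetical SNP calls against the reference.
snps <- data.frame(
  sample = c("s1", "s2", "s2", "s3", "s3"),
  pos    = c(5, 9, 15, 16, 19),
  ref    = c("A", "C", "G", "T", "A"),
  alt    = c("G", "T", "A", "C", "T")
)

# Keep only the SNPs located in the common (core) regions.
core_snps <- snps[common[snps$pos], ]
print(which(common))
print(core_snps)
```

A per-position mask is used here for readability; for real genomes an interval-based intersection would be far more memory-efficient.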

Minimap2 provides preset settings for mapping different kinds of sequences:

  • map-pb/map-ont: PacBio/Nanopore vs reference mapping

  • ava-pb/ava-ont: PacBio/Nanopore read overlap

  • asm5/asm10/asm20: asm-to-ref mapping, for ~0.1/1/5% sequence divergence

  • splice: long-read spliced alignment

  • sr: genomic short-read mapping

We recommend map-pb (highest number of SNPs, lower accuracy), asm5 (fewest SNPs, highest accuracy), asm10 (intermediate number of SNPs, intermediate accuracy) or asm20 (many SNPs, low accuracy).

The remaining presets are designed for other purposes.
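As a rough guide to choosing among the asm presets, the divergence figures quoted above (~0.1/1/5%) can be turned into a small lookup. The helper `choose_asm_preset()` below is hypothetical and not part of PATO or minimap2, and treating the approximate figures as hard cutoffs is an assumption made only for this sketch.

```r
# Hypothetical helper: map an expected sequence divergence (in percent)
# to one of the asm presets, using ~0.1/1/5% as simple cutoffs.
choose_asm_preset <- function(divergence_pct) {
  stopifnot(
    is.numeric(divergence_pct),
    length(divergence_pct) == 1,
    divergence_pct >= 0
  )
  if (divergence_pct <= 0.1) {
    "asm5"
  } else if (divergence_pct <= 1) {
    "asm10"
  } else if (divergence_pct <= 5) {
    "asm20"
  } else {
    # Beyond the range the asm presets are described for; asm20 is the
    # most permissive of the three, so fall back to it with a warning.
    warning("Divergence above ~5% is outside the range described for the asm presets")
    "asm20"
  }
}

print(choose_asm_preset(0.05))
print(choose_asm_preset(0.5))
print(choose_asm_preset(3))
```

The returned string is the kind of value the preset argument expects; which of the two preset-related arguments receives it should be checked against the package source.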

Value

A core_snp_genome object.

References

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191


irycisBioinfo/PATO documentation built on Oct. 19, 2023, 3:07 p.m.