pipe.VariantCalls | R Documentation |
Pipeline step to detect and quantify SNPs from alignment data, together with its lower level function. Calls the SAMTOOLS utility to extract MPILEUP details from BAM files and passes them to the BCFTOOLS variant caller.
pipe.VariantCalls(sampleIDset, annotationFile = "Annotation.txt", optionsFile = "Options.txt",
speciesID = getCurrentSpecies(), results.path = NULL, seqIDset = NULL,
start = NULL, stop = NULL, prob.variant = 0.5,
snpCallMode = c("consensus", "all", "multiallelic"), min.depth = 1,
mpileupArgs = "", vcfArgs = "", comboSamplesName = "Combined", verbose = TRUE)
BAM.variantCalls(files, seqID, fastaFile, start = NULL, stop = NULL, prob.variant = 0.5,
min.depth = 1, max.depth = 10000, min.gap.fraction = 0.25,
mpileupArgs = "", vcfArgs = "", ploidy = 1, geneMap = getCurrentGeneMap(),
snpCallMode = c("consensus", "all", "multiallelic"), verbose = TRUE)
sampleIDset |
Vector of SampleIDs to call SNPs for. Note that the underlying BCFTOOLS variant calling methods behave quite differently when given a single sample versus multiple BAM files at one time. The most consistent and reliable results are obtained by calling the function on a single sample at a time, and then merging all SNP calls after the fact. |
files |
Character vector of full pathname BAM files. |
annotationFile |
File of sample annotation details, which specifies all needed
sample-specific information about the samples under study.
See |
optionsFile |
File of processing options, which specifies all processing
parameters that are not sample specific. See |
speciesID |
The SpeciesID of the target species to call SNPs for. By default, use the current species. |
results.path |
The top level folder path for writing result files to. By default, read from the Options file entry 'results.path'. |
seqIDset |
Optional character vector of SeqIDs. Default is to call SNPs for all chromosomes, in parallel if possible. |
seqID |
Character string of a single SeqID, that must exist as a named contig in the FASTA file. |
fastaFile |
Character string of the full pathname to one genomic FASTA file. |
start |
|
stop |
Optional numeric limits for the chromosomal region to be inspected. |
prob.variant |
Numeric probability for deciding if a potential SNP site should be returned as real. Passed down as the BCFTOOLS CALL "-p" option. |
snpCallMode |
Controls the behavior of the BCFTOOLS CALL command. As SAMTOOLS evolves their SNP calling algorithms,
we need to maintain some flexibility. In practice, the SNP calling algorithms do a terrible job on
haploid highly variant genomes like plasmodia, so we tend to use the most generic straightforward
algorithm. The |
min.depth |
|
max.depth |
|
min.gap.fraction |
Numeric arguments passed down as the SAMTOOLS MPILEUP "-m" and "-d" and "-F" options, respectively. |
mpileupArgs |
Other optional arguments passed down to SAMTOOLS MPILEUP. |
vcfArgs |
Other optional arguments passed down to BCFTOOLS CALL. |
ploidy |
Designates the organism being SNP called as being either haploid (1) or diploid (2). |
comboSamplesName |
Only used when calling multiple samples at one time. Used as folder and file name prefix. |
As a general rule, clinical samples, which are often mixed infections, cause the SNP calling tools to perform very poorly. To counter that, we often use very lax, permissive settings at this step so that the SNP caller returns as many potential SNP sites as possible, and then use more rigorous post-SNP-calling analysis to whittle that down to true SNPs.
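The permissive-then-filter approach described above might look like the following minimal sketch in base R. The data frame, its column names (POS, REF, ALT, QUAL, DEPTH), the helper `filterCandidateSNPs`, and the thresholds are all hypothetical placeholders for illustration; the columns actually returned by the package may differ.

```r
# Hypothetical table of candidate SNP sites, standing in for the output of a
# permissive variant-calling run. Column names and values are illustrative only.
candidates <- data.frame(
  POS   = c(1021, 2240, 3877, 5102),
  REF   = c("A", "G", "T", "C"),
  ALT   = c("G", "A", "C", "T"),
  QUAL  = c(12.5, 88.0, 3.1, 45.7),
  DEPTH = c(4, 60, 2, 25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only sites that pass stricter, user-chosen
# quality and depth cutoffs applied after the permissive calling step.
filterCandidateSNPs <- function(snps, min.qual = 30, min.depth = 10) {
  keep <- snps$QUAL >= min.qual & snps$DEPTH >= min.depth
  snps[keep, , drop = FALSE]
}

filtered <- filterCandidateSNPs(candidates)
print(filtered)
```

Suitable cutoffs depend on the organism, the sequencing depth, and how mixed the samples are, so the defaults here should be treated as starting points only.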
This functionality can be called either through the high level pipe, which writes a folder of result files, or through the low level wrapper, which operates directly on BAM and FASTA files and returns a single data frame.
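As a rough sketch of the two entry points, the code below assembles argument lists for each function using only argument names from the usage section. The sample IDs, file paths, SeqID, and region limits are hypothetical placeholders, and the real calls are attempted only if the functions and input files are actually available. The high level pipe is invoked once per sample, following the single-sample recommendation given for `sampleIDset`.

```r
# Hypothetical sample IDs; replace with SampleIDs from your annotation file.
sampleIDs <- c("SampleA", "SampleB")

# One argument list per sample for the high level pipe.
pipeArgs <- lapply(sampleIDs, function(sid) {
  list(sampleIDset = sid,
       annotationFile = "Annotation.txt",
       optionsFile = "Options.txt",
       prob.variant = 0.5,
       snpCallMode = "consensus",
       min.depth = 1)
})
names(pipeArgs) <- sampleIDs

# Argument list for the low level wrapper: one BAM file, one contig, one region.
# Paths, SeqID, and coordinates are placeholders.
bamArgs <- list(files = "/path/to/SampleA.bam",
                seqID = "chr1",
                fastaFile = "/path/to/genome.fasta",
                start = 1000, stop = 5000,
                ploidy = 1,
                snpCallMode = "consensus")

# Only attempt the real calls when the package functions are loaded and the
# placeholder input files actually exist.
neededFiles <- c("Annotation.txt", "Options.txt", bamArgs$files, bamArgs$fastaFile)
canRun <- exists("pipe.VariantCalls") && exists("BAM.variantCalls") &&
  all(file.exists(neededFiles))

if (canRun) {
  # High level: writes result files for each sample, one sample at a time.
  for (sid in sampleIDs) do.call(pipe.VariantCalls, pipeArgs[[sid]])
  # Low level: returns a data frame of candidate SNP sites.
  snps <- do.call(BAM.variantCalls, bamArgs)
}
```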
For pipe.VariantCalls, a subfolder of files is written under the VariantCalls subfolder:
VCF.txt |
A file of potential SNP variant allele sites for each chromosome, containing all the columns of details generated by BCFTOOLS CALL. These include all the cryptic scoring and quality metrics and the comma separated list of alternate alleles. |
Summary.VCF.txt |
One final file of SNP sites, after merging all chromosomes and cleaning up much of the BCFTOOLS details. Includes a column "ALT_AA" that tries to suggest if the SNP changes the amino acid sequence of the protein. |
For BAM.variantCalls, a data frame, as in the chromosomal .VCF.txt files above.
It is imperative that the genome FASTA file specified in the genomicFastaFile field of the option table exactly match the genome used to construct the Bowtie2 index that was used in the genomic alignment pipeline step. There is no easy way to verify that on the fly during SNP calling.
Bob Morrison
pipe.VariantSummary for joining all chromosome SNP results into a single file for one sample, with optional cleaning/filtering.
pipe.VariantComparison for finding SNPs that are differentially detected between groups.