exactSNP: Accurately and Efficiently call SNPs

Description

Measure background noise and perform Fisher's exact tests to detect SNPs.

Usage

exactSNP(

    # basic input/output options
    readFile,
    isBAM = FALSE,
    refGenomeFile,
    SNPAnnotationFile = NULL,
    outputFile = paste0(readFile, ".exactSNP.VCF"),

    # fine tuning parameters 
    qvalueCutoff = 12,
    minAllelicFraction = 0,
    minAllelicBases = 1,
    minReads = 1,
    maxReads = 1000000,
    minBaseQuality = 13,
    nTrimmedBases = 3,
    nthreads = 1)

Arguments

readFile

a character string giving the name of a file including read mapping results. This function takes as input a SAM file by default. If a BAM file is provided, the isBAM argument should be set to TRUE.

isBAM

logical indicating if the file provided via readFile is a BAM file. FALSE by default.

refGenomeFile

a character string giving the name of a file that includes reference sequences (FASTA format).

SNPAnnotationFile

a character string giving the name of a VCF-format file that includes annotated SNPs (the file can be uncompressed or gzip-compressed). Such annotation can be downloaded from public databases such as dbSNP. Incorporating known SNPs into SNP calling has been found to be helpful. Note, however, that the annotated SNPs may or may not be called for the sample being analyzed.

outputFile

a character string giving the name of the output file to be generated by this function. The output file includes all the reported SNPs (in VCF format). It includes discovered indels as well.

qvalueCutoff

a numeric value giving the q-value cutoff for SNP calling at a sequencing depth of 50X. 12 by default. The q-value is calculated as -log10(p), where p is the p-value from the Fisher's exact test. Note that this function automatically adjusts the q-value cutoff for each chromosomal location according to its sequencing depth, based on this cutoff.

minAllelicFraction

a numeric value giving the minimum fraction of allelic bases out of all read bases included at a chromosomal location required for SNP calling. Its value must be between 0 and 1. 0 by default.

minAllelicBases

an integer giving the minimum number of allelic (mis-matched) bases a SNP must have at a chromosomal location. 1 by default.

minReads

an integer giving the minimum number of mapped reads a SNP-containing location must have (i.e. the minimum coverage). 1 by default.

maxReads

an integer specifying the maximum depth a SNP-containing location is allowed to have. 1000000 by default. Any location with more mapped reads than this threshold will not be considered for SNP calling. This option is useful for removing PCR artefacts.

minBaseQuality

a numeric value giving the minimum base quality score (Phred score) read bases should satisfy before being used for SNP calling. 13 by default (corresponding to a base-calling p-value of about 0.05). Read bases with quality scores less than 13 will be excluded from analysis.

nTrimmedBases

a numeric value giving the number of bases trimmed off from each end of the read. 3 by default.

nthreads

a numeric value giving the number of threads/CPUs used. 1 by default.
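
The filtering arguments above can be read as simple per-base and per-location checks. The sketch below illustrates them in base R under stated assumptions and is not Rsubread's internal code: the `phred_to_prob` helper, the `loc` data frame and its `depth` and `allelic` columns are hypothetical, and depth is used as a stand-in for the number of read bases at a location.

```r
# A Phred score Q corresponds to an error probability of 10^(-Q/10).
phred_to_prob <- function(q) 10^(-q / 10)

minBaseQuality <- 13
p_error <- phred_to_prob(minBaseQuality)
print(round(p_error, 4))  # 0.0501

# Hypothetical base qualities; bases below the threshold are excluded.
base_quals <- c(2, 12, 13, 30, 40)
kept <- base_quals[base_quals >= minBaseQuality]
print(kept)  # 13 30 40

# Hypothetical per-location summary (illustrative values only).
loc <- data.frame(
  depth   = c(0, 40, 40, 2000000),  # mapped reads covering the location
  allelic = c(0, 0, 12, 900000)     # mismatched (allelic) bases
)

# Defaults taken from the Usage section.
minAllelicFraction <- 0
minAllelicBases    <- 1
minReads           <- 1
maxReads           <- 1000000

# Fraction of allelic bases; set to 0 where there is no coverage.
fraction <- ifelse(loc$depth > 0, loc$allelic / loc$depth, 0)

# A location is considered only if it satisfies all four thresholds.
loc$considered <- loc$depth >= minReads &
  loc$depth <= maxReads &
  loc$allelic >= minAllelicBases &
  fraction >= minAllelicFraction
print(loc)
```

With the defaults, only the third hypothetical location is kept: the first has no coverage, the second has no allelic bases, and the fourth exceeds `maxReads`.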

Details

This function takes as input a SAM/BAM format file, measures local background noise for each chromosomal location and then performs Fisher's exact tests to find statistically significant SNPs.
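
The test and the q-value, defined as -log10(p), can be illustrated with base R. This is a minimal sketch, assuming a hypothetical 2x2 table of mismatched versus matched bases at a candidate location and in its surrounding background. How exactSNP actually builds its tables and adjusts the cutoff for depth is not described here, so the counts, the table layout and the inclusive comparison are illustrative assumptions only.

```r
# Hypothetical counts (assumption, not exactSNP's internal layout):
# rows = candidate location vs. surrounding background,
# columns = mismatched vs. matched bases.
counts <- matrix(c(20, 30,
                   15, 2985),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("candidate", "background"),
                                 c("mismatch", "match")))

# Fisher's exact test from the base 'stats' package.
p <- fisher.test(counts)$p.value

# q-value as defined for qvalueCutoff: -log10(p).
q <- -log10(p)

# Compare with the default cutoff quoted for 50X depth
# (treating the boundary as inclusive is an assumption).
qvalueCutoff <- 12
passes <- q >= qvalueCutoff
print(q)
print(passes)
```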

This function implements a novel algorithm for discovering SNPs. This algorithm is comparable with or better than existing SNP callers, but it is faster and more efficient. It can be used to call SNPs for individual samples (i.e. no control samples are required). Details of the algorithm are described in a manuscript which is currently under preparation.
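
A call for a single sample might look like the following sketch. The file names `sample.bam` and `genome.fa` are hypothetical placeholders. The call is guarded so that it only runs when Rsubread is installed and both inputs exist. All argument names come from the Usage section.

```r
# Hypothetical input paths; replace with real files.
bam_file <- "sample.bam"
ref_file <- "genome.fa"

# Run only when Rsubread is installed and both inputs exist,
# so the snippet is safe to paste into any session.
can_run <- requireNamespace("Rsubread", quietly = TRUE) &&
  file.exists(bam_file) && file.exists(ref_file)

if (can_run) {
  Rsubread::exactSNP(
    readFile      = bam_file,
    isBAM         = TRUE,   # input is BAM rather than the default SAM
    refGenomeFile = ref_file,
    outputFile    = paste0(bam_file, ".exactSNP.VCF"),
    nthreads      = 1
  )
} else {
  message("Skipping exactSNP: Rsubread or the input files are not available.")
}
```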

Value

No value is produced, but a VCF format file is written to the current working directory. This file contains detailed information for discovered SNPs including chromosomal locations, reference bases, alternative bases, read coverages, allele frequencies and p-values.
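
Because the result is a file rather than an R object, it has to be read back in for further analysis. The sketch below uses only base R and a tiny hypothetical VCF written to a temporary file. The records and the `DP` entry in the INFO column are made up for illustration, and the INFO fields actually written by exactSNP may differ.

```r
# Write a tiny hypothetical VCF so the example is self-contained.
vcf_path <- tempfile(fileext = ".VCF")
writeLines(c(
  "##fileformat=VCFv4.0",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t10177\t.\tA\tC\t.\t.\tDP=20",
  "chr2\t20544\t.\tG\tT\t.\t.\tDP=35"
), vcf_path)

# Lines starting with '#' (meta-information and header) are skipped;
# column names are supplied explicitly from the VCF fixed fields.
vcf <- read.table(
  vcf_path, sep = "\t", comment.char = "#",
  col.names = c("CHROM", "POS", "ID", "REF", "ALT",
                "QUAL", "FILTER", "INFO"),
  colClasses = c("character", "integer", rep("character", 6))
)
print(vcf[, c("CHROM", "POS", "REF", "ALT")])
```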

Author(s)

Yang Liao and Wei Shi


Rsubread documentation built on March 17, 2021, 6:01 p.m.