Description Usage Arguments Details Value Author(s) Examples
View source: R/call.variants.single.R
A simple single sample variant calling function (calling SNVs and deletions)
1 | callVariantsSingle( data, sampledata, samples = sampledata$Sample, errorRate = 0.001, minSupport = 2, minAF = 0.05, minStrandSupport = 1, mergeDels = TRUE, aggregator = mean)
|
data |
A |
sampledata |
A |
samples |
The samples on which variants should be called, by default all samples specified in sampledata are used |
errorRate |
The expected error rate of the sequencing technology that was used, for illumina this should be |
minSupport |
minimal support required for a position to be considered variant |
minAF |
minimal allelic frequency for an allele at a position to be cosidered a variant |
minStrandSupport |
minimal per-strand support for a position to be considered variant |
mergeDels |
Boolean flag to specify that adjacent deletion calls should be merged |
aggregator |
Aggregator function for merging statistics of adjacent deletion calls,
defaults to |
data
is a list of datasets which has to at least contain the
Counts
and Coverages
for variant calling respectively
Deletions
for deletion calling (if Deletions
is not present no deletion calls will be made).
This list will usually be generated by a call to the h5dapply
function in which the tally
file, chromosome, datasets and regions within the datasets would be
specified. See h5dapply
for specifics.
callVariantsSingle
implements a simple single sample variant callign approach for SNVs and deletions (if Deletions
is a dataset present in the data
parameter. The function applies three essential filters to the provided data, requiring:
- minSupport
total support for the variant at the position
- minStrandSupport
support for the variant on each strand
- an allele freqeuncy of at least minAF
(for pure diploid samples this can be set relatively high, e.g. 0.3, for calling potentially homozygous variants a value of 0.8 or higher might be used)
Calls are annotated with the p-Value of a binom.test
of the present support and coverage given the error rate provided in the errorRate
parameter, no filtering is done on this annotation.
Adjacent deletion calls are merged based in the value of the mergeDels
parameter and their statistics are aggregated with the function supplied in the aggregator
parameter.
This function returns a data.frame
containing annotated calls with the following slots:
Chrom |
The chromosome the potential variant / deletion is on |
Start |
The starting position of the variant / deletion |
End |
The end position of the variant / deletions (equal to Start for SNVs and single basepair deletions) |
Sample |
The sample in which the variant was called |
altAllele |
The alternate allele for SNVs (deletions will have a |
refAllele |
The reference allele for SNVs (deletions will have the deleted sequence here as extracted from the |
SupFwd |
Support for the variant in the sample on the forward strand |
SupRev |
Support for the variant in the sample on the reverse strand |
CovFwd |
Coverage of the variant position in the sample on the forward strand |
CovRev |
Coverage of the variant position in the sample on the reverse strand |
AF_Fwd |
Allele frequency of the variant in the sample on the forward strand |
AF_Rev |
Allele frequency of the variant in the sample on the reverse strand |
Support |
Total Support of the variant - i.e. |
Coverage |
Total Coverage of the variant position - i.e. |
AF |
Total allele frequency of the variant, i.e. |
fBackground |
Background frequency of the variant in all samples but the one the variant is called in |
pErrorFwd |
Probablity of the observed support and coverage given the error rate on the forward strand |
pErrorRev |
Probablity of the observed support and coverage given the error rate on the reverse strand |
pError |
Probablity of the observed support and coverage given the error rate on both strands combined |
pError |
Coverage of the variant position in the |
pStrand |
p-Value of a |
Paul Pyl
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 500,
FUN = callVariantsSingle,
sampledata = sampleData,
names = c("Coverages", "Counts", "Reference", "Deletions"),
range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.