These functions implement various attempts at variant calling.
callVariantsPaired( data, sampledata, cl = vcConfParams() )
vcConfParams(
minStrandCov = 5,
maxStrandCov = 200,
minStrandAltSupport = 2,
maxStrandAltSupportControl = 0,
minStrandDelSupport = minStrandAltSupport,
maxStrandDelSupportControl = maxStrandAltSupportControl,
minStrandInsSupport = minStrandAltSupport,
maxStrandInsSupportControl = maxStrandAltSupportControl,
minStrandCovControl = 5,
maxStrandCovControl = 200,
bases = 5:8,
returnDataPoints = TRUE,
annotateWithBackground = TRUE,
mergeCalls = TRUE,
mergeAggregator = mean,
pValueAggregator = max
)
data |
A |
sampledata |
A |
cl |
A list with parameters used by the variant calling
functions. Such a list can be produced, for instance, by a call to
|
minStrandCov |
Minimum coverage per strand in the case sample. |
maxStrandCov |
Maximum coverage per strand in the case sample. |
minStrandCovControl |
Minimum coverage per strand in the control sample. |
maxStrandCovControl |
Maximum coverage per strand in the control sample. |
minStrandAltSupport |
Minimum support for the alternative allele per strand in the case sample. This should be 1 or higher. |
maxStrandAltSupportControl |
Maximum support for the alternative allele per strand in the control sample. This should usually be 0. |
minStrandDelSupport |
Minimum support for the deletion per strand in the case sample. This should be 1 or higher. |
maxStrandDelSupportControl |
Maximum support for the deletion per strand in the control sample. This should usually be 0. |
minStrandInsSupport |
Minimum support for the insertion per strand in the case sample. This should be 1 or higher. |
maxStrandInsSupportControl |
Maximum support for the insertion per strand in the control sample. This should usually be 0. |
bases |
Indices for subsetting in the bases dimension of the Counts array; 5:8 extracts only those calls made in the middle one of the sequencing cycle bins. |
returnDataPoints |
Boolean flag to specify that a data.frame
with the variant calls should be returned; otherwise only positions are returned as a numeric vector.
If |
annotateWithBackground |
Boolean flag to specify that the
background mismatch / deletion frequency estimated from all control
samples in the cohort should be added to the output. A simple binomial
test will be performed as well. Only useful if |
mergeCalls |
Boolean flag to specify that adjacent calls should be
merged where appropriate (used by |
mergeAggregator |
Aggregator function for merging adjacent calls,
defaults to |
pValueAggregator |
Aggregator function for combining the p-values
of adjacent calls when merging, defaults to |
data is a list of datasets that must contain at least
Counts and Coverages for variant calling, or
Deletions for deletion calling. This list is usually
generated by a call to the h5dapply function, in which the tally
file, chromosome, datasets and regions within the datasets are
specified. See ?h5dapply for specifics. For callVariantsPaired
to return the correct locations of the variants, the h5dapplyInfo
slot must be present in data as well. This slot is itself a list (added automatically by
h5dapply and h5readBlock) and contains the slots Group
(location in the HDF5 file) and Blockstart, which are used to set the chromosome
and the genomic positions of variants.
vcConfParams is a helper function that builds a set of variant
calling parameters as a list. This list is passed to the calling
functions, e.g. callVariantsPaired, and influences their behavior.
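To make the role of the parameter list more concrete, the following base R sketch applies a few of the per-strand thresholds to toy data. It is not the h5vc implementation: the list cl only mirrors some vcConfParams() defaults, the matrices and the helper passesFilters are hypothetical, and requiring every criterion to hold on both strands is an assumption about how "per strand" thresholds are combined.

```r
# Hypothetical parameter list mirroring a few vcConfParams() defaults
cl <- list(
  minStrandCov = 5, maxStrandCov = 200,
  minStrandAltSupport = 2, maxStrandAltSupportControl = 0
)

# Made-up per-strand values for three positions (rows) and two strands (columns)
caseCov    <- cbind(Fwd = c(30, 4, 50), Rev = c(28, 40, 45))
caseAlt    <- cbind(Fwd = c(5, 3, 6),   Rev = c(4, 5, 1))
controlAlt <- cbind(Fwd = c(0, 0, 0),   Rev = c(0, 0, 0))

# Hypothetical helper: TRUE for positions that satisfy all thresholds
# on both strands (assumption, see the text above)
passesFilters <- function(caseCov, caseAlt, controlAlt, cl) {
  covOK <- caseCov >= cl$minStrandCov & caseCov <= cl$maxStrandCov
  altOK <- caseAlt >= cl$minStrandAltSupport
  ctlOK <- controlAlt <= cl$maxStrandAltSupportControl
  apply(covOK & altOK & ctlOK, 1, all)
}

keep <- passesFilters(caseCov, caseAlt, controlAlt, cl)
```

In this toy data only the first position is kept: the second fails the minimum coverage on the forward strand and the third fails the minimum alternative allele support on the reverse strand.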
callVariantsPaired implements a simple pairwise variant
calling approach that applies the filters specified in cl. It
may additionally compute an estimate of the background mismatch
rate (the mean mismatch rate of all samples labeled as 'Control' in
the sampledata) and annotate the calls with p-values from the
binom.test of the observed mismatch counts and coverage at each
of the samples labeled as 'Case'.
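The background annotation described above can be sketched in base R for a single position and strand. All numbers are made up, and the one-sided alternative is an assumption, since the text only states that binom.test is applied to the observed mismatch counts and coverage.

```r
# Made-up mismatch counts and coverages of three control samples
controlMismatches <- c(0, 1, 0)
controlCoverage   <- c(40, 50, 60)

# Background rate: mean of the per-sample mismatch frequencies
backgroundFreq <- mean(controlMismatches / controlCoverage)

# Made-up observation in one case sample
caseCount    <- 6
caseCoverage <- 45

# Test whether the case sample shows more mismatches than the background
# rate would explain (alternative = "greater" is an assumption)
pValue <- binom.test(caseCount, caseCoverage, p = backgroundFreq,
                     alternative = "greater")$p.value
```

A small pValue indicates that the mismatch count in the case sample is unlikely under the estimated background rate.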
The result is either a list of positions with SNVs / deletions or a
data.frame containing the calls themselves which might contain
annotations. Adjacent calls might be merged and calls might be
annotated with p-values depending on configuration parameters.
When the configuration parameter returnDataPoints is FALSE, the functions return the positions of potential variants as a list containing one integer vector of positions for each sample; if no positions were found for a sample, the list contains NULL instead. When returnDataPoints is TRUE, the functions return either NULL if no positions were found or a data.frame with the following slots:
Chrom |
The chromosome the potential variant / deletion is on |
Start |
The starting position of the variant / deletion |
End |
The end position of the variant / deletion (equal to Start for SNVs and single basepair deletions) |
Sample |
The |
altAllele |
The alternate allele for SNVs (skipped for deletions, would be |
refAllele |
The reference allele for SNVs (skipped for deletions since the tally file might not contain all the information necessary to extract it) |
caseCountFwd |
Support for the variant in the |
caseCountRev |
Support for the variant in the |
caseCoverageFwd |
Coverage of the variant position in the |
caseCoverageRev |
Coverage of the variant position in the |
controlCountFwd |
Support for the variant in the |
controlCountRev |
Support for the variant in the |
controlCoverageFwd |
Coverage of the variant position in the |
controlCoverageRev |
Coverage of the variant position in the |
If the annotateWithBackground option is set, the following extra columns are returned:
backgroundFrequencyFwd |
The averaged frequency of mismatches / deletions at the position of all samples of type |
backgroundFrequencyRev |
The averaged frequency of mismatches / deletions at the position of all samples of type |
pValueFwd |
The |
pValueRev |
The |
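Because a block without calls yields NULL rather than an empty data.frame, per-block results need to be combined with some care. The sketch below uses made-up data.frames with a subset of the columns listed above; dropping the NULL entries explicitly before rbind is one possible approach, not something the package requires.

```r
# Made-up per-block results: NULL where a block produced no calls
blocks <- list(
  NULL,
  data.frame(Chrom = "16", Start = 100L, End = 100L, stringsAsFactors = FALSE),
  NULL,
  data.frame(Chrom = "16", Start = 250L, End = 252L, stringsAsFactors = FALSE)
)

# Drop empty blocks, then stack the remaining calls by row
nonEmpty <- Filter(Negate(is.null), blocks)
calls <- if (length(nonEmpty) > 0) do.call(rbind, nonEmpty) else NULL
```

Here calls is NULL only if every block was empty, which mirrors the return convention described for a single block.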
The function callDeletionsPaired merges adjacent single-base deletion calls if the option mergeCalls is set to TRUE. In that case the counts and coverages (e.g. caseCountFwd) are aggregated using the function supplied in the mergeAggregator option of the configuration list (defaults to mean), and the p-values pValueFwd and pValueRev (if annotateWithBackground is TRUE) are aggregated using the function supplied in the pValueAggregator option (defaults to max).
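The merging step can be illustrated with a small base R sketch. It is not the package's code: the calls are made up, only two of the annotated columns are shown, and the input is assumed to be sorted by Start and to come from a single sample and chromosome.

```r
# Made-up single-base deletion calls, sorted by position
calls <- data.frame(
  Start        = c(100L, 101L, 102L, 200L),
  End          = c(100L, 101L, 102L, 200L),
  caseCountFwd = c(4, 6, 5, 3),
  pValueFwd    = c(0.01, 0.001, 0.02, 0.03)
)

mergeAggregator  <- mean  # default named in the text above
pValueAggregator <- max   # default named in the text above

# A new run starts whenever the gap to the previous call is not exactly 1
runId <- cumsum(c(TRUE, diff(calls$Start) != 1))

# Collapse each run of adjacent calls into a single row
merged <- do.call(rbind, lapply(split(calls, runId), function(run) {
  data.frame(
    Start        = min(run$Start),
    End          = max(run$End),
    caseCountFwd = mergeAggregator(run$caseCountFwd),
    pValueFwd    = pValueAggregator(run$pValueFwd)
  )
}))
```

The three adjacent calls at 100 to 102 collapse into one deletion spanning that range, while the isolated call at 200 is left unchanged. Taking the maximum p-value is the conservative choice, since a merged call is then never reported as more significant than its weakest component.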
Paul Pyl
library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 500,
FUN = callVariantsPaired,
sampledata = sampleData,
cl = vcConfParams(returnDataPoints=TRUE),
names = c("Coverages", "Counts", "Reference", "Deletions"),
range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant