Description Usage Arguments Details Value Author(s) Examples
These functions implement various attempts at variant calling.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | callVariantsPaired( data, sampledata, cl = vcConfParams() )
vcConfParams(
minStrandCov = 5,
maxStrandCov = 200,
minStrandAltSupport = 2,
maxStrandAltSupportControl = 0,
minStrandDelSupport = minStrandAltSupport,
maxStrandDelSupportControl = maxStrandAltSupportControl,
minStrandInsSupport = minStrandAltSupport,
maxStrandInsSupportControl = maxStrandAltSupportControl,
minStrandCovControl = 5,
maxStrandCovControl = 200,
bases = 5:8,
returnDataPoints = TRUE,
annotateWithBackground = TRUE,
mergeCalls = TRUE,
mergeAggregator = mean,
pValueAggregator = max
)
|
data |
A |
sampledata |
A |
cl |
A list with parameters used by the variant calling
functions. Such a list can be produced, for instance, by a call to
|
minStrandCov |
Minimum coverage per strand in the case sample. |
maxStrandCov |
Maximum coverage per strand in the case sample. |
minStrandCovControl |
Minimum coverage per strand in the control sample. |
maxStrandCovControl |
Maximum coverage per strand in the control sample. |
minStrandAltSupport |
Minimum support for the alternative allele per strand in the case sample. This should be 1 or higher. |
maxStrandAltSupportControl |
Maximum support for the alternative allele per strand in the control sample. This should usually be 0. |
minStrandDelSupport |
Minimum support for the deletion per strand in the case sample. This should be 1 or higher. |
maxStrandDelSupportControl |
Maximum support for the deletion per strand in the control sample. This should usually be 0. |
minStrandInsSupport |
Minimum support for the insertion per strand in the case sample. This should be 1 or higher. |
maxStrandInsSupportControl |
Maximum support for the insertion per strand in the control sample. This should usually be 0. |
bases |
Indices for subsetting in the bases dimension of the Counts array, 5:8 extracts only those calls made in the middle one of the sequencing cycle bins. |
returnDataPoints |
Boolean flag to specify that a data.frame
with the variant calls should be returned, otherwise only position are returned as a numeric vector.
If |
annotateWithBackground |
Boolean flag to specify that the
background mismatch / deletion frequency estimated from all control
samples in the cohort should be added to the output. A simple binomial
test will be performed as well. Only usefull if |
mergeCalls |
Boolean flag to specify that adjacent calls should be
merged where appropriate (used by |
mergeAggregator |
Aggregator function for merging adjacent calls,
defaults to |
pValueAggregator |
Aggregator function for combining the p-values
of adjacent calls when merging, defaults to |
data
is a list of datasets which has to at least contain the
Counts
and Coverages
for variant calling respectively
Deletions
for deletion calling. This list will usually be
generated by a call to the h5dapply
function in which the tally
file, chromosome, datasets and regions within the datasets would be
specified. See ?h5dapply
for specifics. In order for callVariantsPaired
to return the correct locations of the variants there must be the h5dapplyInfo
slot present in data
as well. This is itself a list (being automatically added by
h5dapply
and h5readBlock
respectively) and contains the slots Group
(location in the HDF5 file) and Blockstart
, which are used to set the chromosome
and the genomic positions of variants.
vcConfParams
is a helper function that builds a set of variant
calling parameters as a list. This list is provided to the calling
functions e.g. callVariantsPaired
and influences their behavior.
callVariantsPaired
implements a simple pairwise variant
callign approach applying the filters specified in cl
, and
might additionally computes an estimate of the background mismatch
rate (the mean mismatch rate of all samples labeled as 'Control' in
the sampledata
and annotate the calls with p-values for the
binom.test
of the observed mismatch counts and coverage at each
of the samples labeled as 'Case'.
The result is either a list of positions with SNVs / deletions or a
data.frame
containing the calls themselves which might contain
annotations. Adjacent calls might be merged and calls might be
annotated with p-values depending on configuration parameters.
When the configuration parameter returnDataPoints
is FALSE
the functions return the positions of potential variants as a list containing one integer vector of positions for each sample, if no positions were found for a sample the list will contain NULL
instead. In the case of returnDatapoints == TRUE
the functions return either NULL
if no poisitions were found or a data.frame
with the following slots:
Chrom |
The chromosome the potential variant / deletion is on |
Start |
The starting position of the variant / deletion |
End |
The end position of the variant / deletions (equal to Start for SNVs and single basepair deletions) |
Sample |
The |
altAllele |
The alternate allele for SNVs (skipped for deletions, would be |
refAllele |
The reference allele for SNVs (skipped for deletions since the tally file might not contain all the information necessary to extract it) |
caseCountFwd |
Support for the variant in the |
caseCountRev |
Support for the variant in the |
caseCoverageFwd |
Coverage of the variant position in the |
caseCoverageRev |
Coverage of the variant position in the |
controlCountFwd |
Support for the variant in the |
controlCountRev |
Support for the variant in the |
controlCoverageFwd |
Coverage of the variant position in the |
controlCoverageRev |
Coverage of the variant position in the |
If the annotateWithBackground
option is set the following extra columns are returned
backgroundFrequencyFwd |
The averaged frequency of mismatches / deletions at the position of all samples of type |
backgroundFrequencyRev |
The averaged frequency of mismatches / deletions at the position of all samples of type |
pValueFwd |
The |
pValueRev |
The |
The function callDeletionsPaired
merges adjacent single-base deletion calls if the option mergeCalls
is set to TRUE
, in that case the counts and coverages ( e.g. caseCountFwd
) are aggregated using the function supplied in the mergeAggregator
option of the configuration list (defaults to mean
) and the p-values pValueFwd
and pValueFwd
(if annotateWithBackground
is TRUE
), are aggregated using the function supplied in the pValueAggregator
option (defaults to max
).
Paul Pyl
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 500,
FUN = callVariantsPaired,
sampledata = sampleData,
cl = vcConfParams(returnDataPoints=TRUE),
names = c("Coverages", "Counts", "Reference", "Deletions"),
range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.