Calls sample-specific variants by comparing case and control variants from paired samples, starting from the BAM files or unfiltered tallies. For example, these variants would be considered somatic mutations in a tumor vs. normal comparison.
SampleSpecificVariantFilters(control, control.cov, calling.filters,
                             power = 0.8, p.value = 0.01)

## S4 method for signature 'BamFile,BamFile'
callSampleSpecificVariants(case, control, tally.param,
                           calling.filters = VariantCallingFilters(),
                           post.filters = FilterRules(), ...)

## S4 method for signature 'character,character'
callSampleSpecificVariants(case, control, ...)

## S4 method for signature 'VRanges,VRanges'
callSampleSpecificVariants(case, control, control.cov, ...)

## DEPRECATED
## S4 method for signature 'GenomicRanges,GenomicRanges'
callSampleSpecificVariants(case, control, control.cov,
                           calling.filters = VariantCallingFilters(),
                           post.filters = FilterRules(), ...)
case: The BAM file for the case, or the called variants as output by
control: The BAM file for the control, or the raw tallies as output by
tally.param: Parameters controlling the variant tallying step, as typically constructed by
calling.filters: Filters to use for the initial, single-sample calling against reference, typically constructed by
post.filters: Filters that are applied after the initial calling step. These consider the set of variant calls as a whole and remove those with suspicious patterns. They are only applied to the
...: For a BAM file, arguments to pass down to the
control.cov: The coverage for the control sample.
power: The power cutoff, beneath which a variant will not be called case-specific, due to lack of power in control.
p.value: The binomial p-value cutoff for determining whether the control frequency is sufficiently extreme (low) compared to the case frequency. A p-value below this cutoff means that the variant will be called case-specific.
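To build intuition for the power cutoff, the following base R sketch relates control coverage to detection power. It is only an illustration: the helper `controlPower` and the `min.reads` threshold are hypothetical, and a plain binomial tail probability is swapped in for the likelihood ratio test that the package uses, so the numbers will not match the package exactly.

```r
# Stand-in for detection power in the control (NOT the package's LR test):
# the chance of seeing at least `min.reads` variant reads at a site with
# coverage `cov` when the variant is truly present at frequency `p.lower`.
# `controlPower` and `min.reads` are hypothetical names for this sketch.
controlPower <- function(cov, p.lower = 0.2, min.reads = 2) {
  1 - pbinom(min.reads - 1, cov, p.lower)
}

coverage <- 1:50
pw <- controlPower(coverage)

# Smallest control coverage whose stand-in power reaches the 0.8 cutoff.
min.cov <- min(coverage[pw >= 0.8])
min.cov
```

Under these assumptions, a site whose control coverage falls below `min.cov` would be discarded for lack of power rather than called case-specific.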
For each sample, the variants are tallied (when the input is BAM), QA
filtered (case only), called and determined to be sample-specific.
The callSampleSpecificVariants function is fairly high-level,
but it still allows the user to override the parameters and filters
for each stage of the process. See
It is safest to pass a BAM file, so that the computations are
consistent for both samples. The
GenomicRanges method is
provided mostly for optimization purposes, since tallying the variants
over the entire genome is time-consuming. For small gene-size regions,
performance should not be a concern.
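A possible BAM-based call is sketched below. It assumes that the Bioconductor packages VariantTools, gmapR and LungCancerLines are installed; the small TP53 demo genome and the two cell-line BAM files come from those packages and are used here only as stand-in data. The block does nothing when any of the packages is missing.

```r
# Packages assumed for this sketch; the guard skips the call if any is absent.
pkgs <- c("VariantTools", "gmapR", "LungCancerLines")
have.pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

specific <- NULL
if (have.pkgs) {
  library(VariantTools)
  # Demo BAM files for two lung cancer cell lines (assumed example data).
  bams <- LungCancerLines::LungCancerBamFiles()
  # Restrict tallying to the TP53 region of the demo genome to keep it fast.
  tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
                                    high_base_quality = 23L,
                                    which = gmapR::TP53Which())
  # Case first, control second; calling and post filters are left at defaults.
  specific <- callSampleSpecificVariants(bams$H1993, bams$H2073, tally.param)
}
```

Passing both samples as BAM files with a single `tally.param` means the same tallying settings are applied to case and control.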
This is the algorithm that determines whether a variant is specific to the case sample:
Filter out all case calls that were also called in control. The
callSampleSpecificVariants function does
not apply the QA filters when calling variants in
control. This prevents a variant from being called specific to
case merely due to questionable data in the control.
For the remaining case calls, calculate whether there was
sufficient power in control under the likelihood ratio test, for a
variant present at the
p.lower frequency. If that power is below the
power cutoff, discard the call.
For the remaining case calls, test whether the control
frequency is sufficiently extreme (low) compared to the case
frequency, under the binomial model. The null hypothesis is that
the frequencies are the same, so if the test p-value is above
p.value, discard the variant. Otherwise, the variant is called case-specific.
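The three steps above can be sketched in base R on a toy table. This is a simplified illustration, not the package's code: all counts are made up, the column names and `min.reads` are hypothetical, the power in step 2 uses a plain binomial tail probability in place of the likelihood ratio test, and the lower-tail form of the binomial test in step 3 is an assumption.

```r
# Toy tallies for four case calls; every number here is invented.
calls <- data.frame(
  id                = c("v1", "v2", "v3", "v4"),
  case.alt          = c(30, 25, 12, 15),    # variant reads in case
  case.cov          = c(100, 100, 60, 100), # coverage in case
  control.alt       = c(28, 0, 0, 8),       # variant reads in control
  control.cov       = c(90, 5, 80, 100),    # coverage in control
  called.in.control = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors  = FALSE
)

p.lower   <- 0.2   # frequency assumed when computing power
power     <- 0.8   # power cutoff
p.value   <- 0.01  # binomial p-value cutoff
min.reads <- 2     # hypothetical read threshold for the power stand-in

# Step 1: drop case calls that were also called in control (removes v1).
step1 <- calls[!calls$called.in.control, ]

# Step 2: keep calls with enough power in control. Stand-in power is the
# chance of at least `min.reads` variant reads at frequency `p.lower`.
step1$control.power <- 1 - pbinom(min.reads - 1, step1$control.cov, p.lower)
step2 <- step1[step1$control.power >= power, ]  # removes v2 (coverage 5)

# Step 3: lower-tail binomial test of the control count against the case
# frequency; a small p-value means the control frequency is extremely low.
case.freq <- step2$case.alt / step2$case.cov
step2$p <- pbinom(step2$control.alt, step2$control.cov, case.freq)
specific <- step2[step2$p < p.value, ]          # removes v4, keeps v3

specific$id
```

In this toy run each filter removes one call: v1 is shared with control, v2 has too little control coverage to have power, and v4 has a control frequency that is lower than the case frequency but not significantly so.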
VRanges with the case-specific variants (such as
Michael Lawrence, Jeremiah Degenhardt