callSampleSpecificVariants: Call Sample-Specific Variants
In VariantTools: Tools for Exploratory Analysis of Variant Calls

Description Usage Arguments Details Value Author(s) Examples

Calls sample-specific variants by comparing case and control variants from paired samples, starting from the BAM files or unfiltered tallies. For example, these variants would be considered somatic mutations in a tumor vs. normal comparison.

SampleSpecificVariantFilters(control, control.cov, calling.filters,
                             power = 0.8, p.value = 0.01)
## S4 method for signature 'BamFile,BamFile'
callSampleSpecificVariants(case, control,
  tally.param, calling.filters = VariantCallingFilters(), post.filters =
  FilterRules(), ...)
## S4 method for signature 'character,character'
callSampleSpecificVariants(case, control, ...)
## S4 method for signature 'VRanges,VRanges'
callSampleSpecificVariants(case,
  control, control.cov, ...)
## DEPRECATED
## S4 method for signature 'GenomicRanges,GenomicRanges'
callSampleSpecificVariants(case,
  control, control.cov,
  calling.filters = VariantCallingFilters(), post.filters =
  FilterRules(), ...)

`case`	The BAM file for the case, or the called variants as output by `callVariants`.
`control`	The BAM file for the control, or the raw tallies as output by `tallyVariants`.
`tally.param`	Parameters controlling the variant tallying step, as typically constructed by `TallyVariantsParam`.
`calling.filters`	Filters to use for the initial, single-sample calling against reference, typically constructed by `VariantCallingFilters`.
`post.filters`	Filters that are applied after the initial calling step. These consider the set of variant calls as a whole and remove those with suspicious patterns. They are only applied to the `case` sample; only QA filters are applied to `control`.
`...`	For a BAM file, arguments to pass down to the `GenomicRanges` method. For the `GenomicRanges` method, arguments to pass down to `SampleSpecificVariantFilters`, except for `control.cov`, `control.called`, `control.raw` and `lr.filter`.
`control.cov`	The coverage for the control sample.
`power`	The power cutoff, beneath which a variant will not be called case-specific, due to lack of power in control.
`p.value`	The binomial p-value cutoff for determining whether the control frequency is sufficiently extreme (low) compared to the case frequency. A p-value below this cutoff means that the variant will be called case-specific.

For each sample, the variants are tallied (when the input is BAM), QA filtered (case only), called and determined to be sample-specific. The callSampleSpecificVariants function is fairly high-level, but it still allows the user to override the parameters and filters for each stage of the process. See TallyVariantsParam, VariantQAFilters, VariantCallingFilters and SampleSpecificVariantFilters.

It is safest to pass a BAM file, so that the computations are consistent for both samples. The GenomicRanges method is provided mostly for optimization purposes, since tallying the variants over the entire genome is time-consuming. For small gene-size regions, performance should not be a concern.

This is the algorithm that determines whether a variant is specific to the case sample:

Filter out all case calls that were also called in control. The callSampleSpecificVariants function does not apply the QA filters when calling variants in control. This prevents a variant from being called specific to case merely due to questionable data in the control.
For the remaining case calls, calculate whether there was sufficient power in control under the likelihood ratio test, for a variant present at the p.lower frequency. If that is below the power cutoff, discard it.
For the remaining case calls, test whether the control frequency is sufficient extreme (low) compared to the case frequency, under the binomial model. The null hypothesis is that the frequencies are the same, so if the test p-value is above p.value, discard the variant. Otherwise, the variant is called case-specific.

A VRanges with the case-specific variants (such as somatic mutations).

Michael Lawrence, Jeremiah Degenhardt

bams <- LungCancerLines::LungCancerBamFiles()
if (requireNamespace("gmapR", quietly=TRUE)) {
    tally.param <- TallyVariantsParam(gmapR::TP53Genome(), 
                                      high_base_quality = 23L,
                                      which = gmapR::TP53Which())
    callSampleSpecificVariants(bams$H1993, bams$H2073, tally.param)
} else {
    data(vignette)
    calling.filters <- VariantCallingFilters(read.count = 3L)
    called.variants <- callVariants(tallies_H1993, calling.filters)
    callSampleSpecificVariants(called.variants, tallies_H2073,
                               coverage_H2073)
}