callVariants: Variant calling

Description Usage Arguments Details Value Author(s) Examples

Description

These functions implement various attempts at variant calling.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
callVariantsPaired( data, sampledata, cl = vcConfParams() )

vcConfParams(
  minStrandCov = 5,
  maxStrandCov = 200,
  minStrandAltSupport = 2,
  maxStrandAltSupportControl = 0,
  minStrandDelSupport = minStrandAltSupport,
  maxStrandDelSupportControl = maxStrandAltSupportControl,
  minStrandInsSupport = minStrandAltSupport,
  maxStrandInsSupportControl = maxStrandAltSupportControl,
  minStrandCovControl = 5,
  maxStrandCovControl = 200,
  bases = 5:8,
  returnDataPoints = TRUE,
  annotateWithBackground = TRUE,
  mergeCalls = TRUE,
  mergeAggregator = mean,
  pValueAggregator = max
)

Arguments

data

A list with elements Counts (a 4d integer array of size [1:12, 1:2, 1:k, 1:n]), Coverage (a 3d integer array of size [1:2, 1:k, 1:n]), Deletions (a 3d integer array of size [1:2, 1:k, 1:n]), Reference (a 1d integer vector of size [1:n]) – see Details.

sampledata

A data.frame with k rows (one for each sample) and columns Type, Column and (SampleGroup or Patient). The tally file should contain this information as a group attribute, see getSampleData for an example.

cl

A list with parameters used by the variant calling functions. Such a list can be produced, for instance, by a call to vcConfParams.

minStrandCov

Minimum coverage per strand in the case sample.

maxStrandCov

Maximum coverage per strand in the case sample.

minStrandCovControl

Minimum coverage per strand in the control sample.

maxStrandCovControl

Maximum coverage per strand in the control sample.

minStrandAltSupport

Minimum support for the alternative allele per strand in the case sample. This should be 1 or higher.

maxStrandAltSupportControl

Maximum support for the alternative allele per strand in the control sample. This should usually be 0.

minStrandDelSupport

Minimum support for the deletion per strand in the case sample. This should be 1 or higher.

maxStrandDelSupportControl

Maximum support for the deletion per strand in the control sample. This should usually be 0.

minStrandInsSupport

Minimum support for the insertion per strand in the case sample. This should be 1 or higher.

maxStrandInsSupportControl

Maximum support for the insertion per strand in the control sample. This should usually be 0.

bases

Indices for subsetting in the bases dimension of the Counts array, 5:8 extracts only those calls made in the middle one of the sequencing cycle bins.

returnDataPoints

Boolean flag to specify that a data.frame with the variant calls should be returned, otherwise only position are returned as a numeric vector. If returnDataPoints == FALSE only the variant positions are returned.

annotateWithBackground

Boolean flag to specify that the background mismatch / deletion frequency estimated from all control samples in the cohort should be added to the output. A simple binomial test will be performed as well. Only usefull if returnDataPoints == TRUE

mergeCalls

Boolean flag to specify that adjacent calls should be merged where appropriate (used by callDeletionsPaired). Only usefull applied if returnDataPoints == TRUE

mergeAggregator

Aggregator function for merging adjacent calls, defaults to mean, which means that a deletion larger than 1bp will be annotated with the means of the counts and coverages

pValueAggregator

Aggregator function for combining the p-values of adjacent calls when merging, defaults to max. Is only applied if annotateWithBackground == TRUE

Details

data is a list of datasets which has to at least contain the Counts and Coverages for variant calling respectively Deletions for deletion calling. This list will usually be generated by a call to the h5dapply function in which the tally file, chromosome, datasets and regions within the datasets would be specified. See ?h5dapply for specifics. In order for callVariantsPaired to return the correct locations of the variants there must be the h5dapplyInfo slot present in data as well. This is itself a list (being automatically added by h5dapply and h5readBlock respectively) and contains the slots Group (location in the HDF5 file) and Blockstart, which are used to set the chromosome and the genomic positions of variants.

vcConfParams is a helper function that builds a set of variant calling parameters as a list. This list is provided to the calling functions e.g. callVariantsPaired and influences their behavior.

callVariantsPaired implements a simple pairwise variant callign approach applying the filters specified in cl, and might additionally computes an estimate of the background mismatch rate (the mean mismatch rate of all samples labeled as 'Control' in the sampledata and annotate the calls with p-values for the binom.test of the observed mismatch counts and coverage at each of the samples labeled as 'Case'.

Value

The result is either a list of positions with SNVs / deletions or a data.frame containing the calls themselves which might contain annotations. Adjacent calls might be merged and calls might be annotated with p-values depending on configuration parameters.

When the configuration parameter returnDataPoints is FALSE the functions return the positions of potential variants as a list containing one integer vector of positions for each sample, if no positions were found for a sample the list will contain NULL instead. In the case of returnDatapoints == TRUE the functions return either NULL if no poisitions were found or a data.frame with the following slots:

Chrom

The chromosome the potential variant / deletion is on

Start

The starting position of the variant / deletion

End

The end position of the variant / deletions (equal to Start for SNVs and single basepair deletions)

Sample

The Case sample in which the variant was observed

altAllele

The alternate allele for SNVs (skipped for deletions, would be "-")

refAllele

The reference allele for SNVs (skipped for deletions since the tally file might not contain all the information necessary to extract it)

caseCountFwd

Support for the variant in the Case sample on the forward strand

caseCountRev

Support for the variant in the Case sample on the reverse strand

caseCoverageFwd

Coverage of the variant position in the Case sample on the forward strand

caseCoverageRev

Coverage of the variant position in the Case sample on the reverse strand

controlCountFwd

Support for the variant in the Control sample on the forward strand

controlCountRev

Support for the variant in the Control sample on the reverse strand

controlCoverageFwd

Coverage of the variant position in the Control sample on the forward strand

controlCoverageRev

Coverage of the variant position in the Control sample on the reverse strand

If the annotateWithBackground option is set the following extra columns are returned

backgroundFrequencyFwd

The averaged frequency of mismatches / deletions at the position of all samples of type Control on the forward strand

backgroundFrequencyRev

The averaged frequency of mismatches / deletions at the position of all samples of type Control on the reverse strand

pValueFwd

The p.value of the test binom.test( caseCountFwd, caseCoverageFwd, p = backgroundFrequencyFwd, alternative = "greater")

pValueRev

The p.value of the test binom.test( caseCountRev, caseCoverageRev, p = backgroundFrequencyRev, alternative = "greater")

The function callDeletionsPaired merges adjacent single-base deletion calls if the option mergeCalls is set to TRUE, in that case the counts and coverages ( e.g. caseCountFwd ) are aggregated using the function supplied in the mergeAggregator option of the configuration list (defaults to mean) and the p-values pValueFwd and pValueFwd (if annotateWithBackground is TRUE), are aggregated using the function supplied in the pValueAggregator option (defaults to max).

Author(s)

Paul Pyl

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
  library(h5vc) # loading library
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  position <- 29979629
  windowsize <- 1000
  vars <- h5dapply( # Calling Variants
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 500,
    FUN = callVariantsPaired,
    sampledata = sampleData,
    cl = vcConfParams(returnDataPoints=TRUE),
    names = c("Coverages", "Counts", "Reference", "Deletions"),
    range = c(position - windowsize, position + windowsize)
  )
  vars <- do.call( rbind, vars ) # merge the results from all blocks by row
  vars # We did find a variant

h5vc documentation built on Nov. 8, 2020, 4:56 p.m.