Variant Concordance

Description

Functions for calculating concordance between variant sets and deciding whether two samples have identical genomes.

Usage

calculateVariantConcordance(gr1, gr2, which = NULL)
calculateConcordanceMatrix(variantFiles, ...)
callVariantConcordance(concordanceMatrix, threshold)

Arguments

gr1, gr2

The two tally GRanges to compare.

which

A GRanges of positions to which the comparison is limited.

variantFiles

Character vector of paths to files representing tally GRanges. Currently supports serialized (rda) and VCF files. If the file extension is not “vcf”, the file is assumed to be rda. This will be improved in the future.

concordanceMatrix

A matrix of concordance fractions between sample pairs, as returned by calculateConcordanceMatrix.

threshold

The concordance fraction above which edges are generated between samples when forming the graph.

...

Arguments to pass to the loading function, e.g., readVcf.

Details

The calculateVariantConcordance function calculates the fraction of concordant variants between two samples. Two variants are concordant if they have the same position and the same alt allele.
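The definition above can be illustrated with a minimal base R sketch. It is not the package implementation: the helper names variantKey and concordanceFraction are hypothetical, variants are held in plain data frames rather than tally GRanges, and the denominator (the number of distinct variants seen in either sample) is an assumption, since the text does not state what the fraction is taken over.

```r
# Hypothetical helpers, not part of the package.
# A variant is identified by chromosome, position and alt allele.
variantKey <- function(v) paste(v$chr, v$pos, v$alt, sep = ":")

concordanceFraction <- function(v1, v2) {
  k1 <- unique(variantKey(v1))
  k2 <- unique(variantKey(v2))
  shared <- length(intersect(k1, k2))  # same position and same alt
  total <- length(union(k1, k2))       # assumed denominator: all distinct variants
  if (total == 0L) return(NA_real_)    # nothing to compare
  shared / total
}

v1 <- data.frame(chr = c("chr1", "chr1", "chr2"),
                 pos = c(100L, 200L, 50L),
                 alt = c("A", "T", "G"),
                 stringsAsFactors = FALSE)
v2 <- data.frame(chr = c("chr1", "chr1", "chr2"),
                 pos = c(100L, 200L, 75L),
                 alt = c("A", "C", "G"),
                 stringsAsFactors = FALSE)

# Only chr1:100:A matches on both position and alt: 1 of 5 distinct variants.
concordanceFraction(v1, v2)  # 0.2
```

Note that chr1:200 is present in both samples but with different alt alleles, so it does not count as concordant.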

The calculateConcordanceMatrix function generates a numeric matrix with the concordance for each pair of samples. It accepts file paths rather than in-memory objects so that the variant calls do not all have to be loaded into memory at once. Both serialized (rda) and VCF files are accepted; see the variantFiles argument.
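The memory-saving idea can be sketched as follows, holding at most two samples in memory at a time. This is an illustration under assumptions, not the package code: pairConcordance and concordanceMatrixSketch are hypothetical names, samples are plain data frames stored with saveRDS (standing in for the rda and VCF files the real function reads), the fraction uses the union of variants as denominator, and the diagonal is set to 1 by convention.

```r
# Hypothetical sketch, not part of the package.
pairConcordance <- function(v1, v2) {
  k1 <- unique(paste(v1$chr, v1$pos, v1$alt, sep = ":"))
  k2 <- unique(paste(v2$chr, v2$pos, v2$alt, sep = ":"))
  total <- length(union(k1, k2))
  if (total == 0L) return(NA_real_)
  length(intersect(k1, k2)) / total
}

concordanceMatrixSketch <- function(files, loader = readRDS) {
  n <- length(files)
  m <- matrix(NA_real_, n, n, dimnames = list(names(files), names(files)))
  for (i in seq_len(n)) {
    vi <- loader(files[[i]])               # load sample i once per row
    m[i, i] <- 1                           # assumed convention for the diagonal
    for (j in seq_len(n)[-seq_len(i)]) {   # only pairs with j > i
      vj <- loader(files[[j]])             # sample j is dropped after each pair
      m[i, j] <- m[j, i] <- pairConcordance(vi, vj)
    }
  }
  m
}

# Toy data written to temporary files.
samples <- list(
  s1 = data.frame(chr = "chr1", pos = c(1L, 2L, 3L), alt = c("A", "C", "G")),
  s2 = data.frame(chr = "chr1", pos = c(1L, 2L, 3L), alt = c("A", "C", "G")),
  s3 = data.frame(chr = "chr1", pos = c(1L, 9L), alt = c("A", "T"))
)
outDir <- tempfile("concordance")
dir.create(outDir)
files <- vapply(names(samples), function(s) {
  f <- file.path(outDir, paste0(s, ".rds"))
  saveRDS(samples[[s]], f)
  f
}, character(1))

m <- concordanceMatrixSketch(files)
m
```

The trade-off in this sketch is repeated disk reads in exchange for a small memory footprint; each file is loaded once per pair it takes part in.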

The callVariantConcordance function assigns a concordant/non-concordant/undecidable status to each sample, given the output of calculateConcordanceMatrix. All samples are assumed to originate from the same individual. The status is decided as follows. A graph is formed from the concordance matrix, with an edge between every pair of samples whose concordance exceeds threshold. If the graph contains multiple cliques that each have more than one sample, every sample is declared undecidable. Otherwise, the samples in the clique with more than one sample, if any, are marked as concordant, and the others (in singleton cliques) are marked as non-concordant.
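The decision rule can be sketched in base R as below. Several points are assumptions rather than facts about the package: "clique" is read as maximal clique, found here with a plain Bron-Kerbosch search; the function names maximalCliques and callConcordanceSketch are hypothetical; and the status strings are placeholders that may differ from the codes the package actually returns.

```r
# Hypothetical sketch, not part of the package.
# Enumerate maximal cliques of a logical adjacency matrix (Bron-Kerbosch,
# no pivoting). Each clique is returned as a vector of row indices.
maximalCliques <- function(adj) {
  cliques <- list()
  expand <- function(r, p, x) {
    if (length(p) == 0L && length(x) == 0L) {
      cliques[[length(cliques) + 1L]] <<- r  # r cannot be extended further
      return(invisible(NULL))
    }
    for (v in p) {
      nb <- unname(which(adj[v, ]))
      expand(c(r, v), intersect(p, nb), intersect(x, nb))
      p <- setdiff(p, v)  # v is done as a candidate
      x <- c(x, v)        # and excluded from later cliques at this level
    }
  }
  expand(integer(0), seq_len(nrow(adj)), integer(0))
  cliques
}

callConcordanceSketch <- function(m, threshold) {
  adj <- m > threshold        # edge when concordance is above the threshold
  adj[is.na(adj)] <- FALSE
  diag(adj) <- FALSE          # no self-edges
  big <- Filter(function(k) length(k) > 1L, maximalCliques(adj))
  status <- rep("non-concordant", nrow(m))  # placeholder status strings
  if (length(big) > 1L) {
    status[] <- "undecidable"               # competing multi-sample cliques
  } else if (length(big) == 1L) {
    status[big[[1L]]] <- "concordant"
  }
  names(status) <- rownames(m)
  status
}

ids <- c("a", "b", "c", "d")
m1 <- matrix(0.1, 4, 4, dimnames = list(ids, ids))
m1[1:3, 1:3] <- 0.9           # a, b and c agree with each other; d does not
callConcordanceSketch(m1, threshold = 0.5)

m2 <- matrix(0.1, 4, 4, dimnames = list(ids, ids))
m2[1:2, 1:2] <- 0.9           # a agrees with b
m2[3:4, 3:4] <- 0.9           # c agrees with d: two competing groups
callConcordanceSketch(m2, threshold = 0.5)
```

In the first call a, b and c form the only multi-sample clique, so they are marked concordant and d is not. In the second call there are two such cliques, so every sample is undecidable. Reading "clique" as connected component instead would give different answers for chain-like graphs, so the interpretation matters.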

Value

Fraction of concordant variants for calculateVariantConcordance, a numeric matrix of concordances for calculateConcordanceMatrix, or a character vector of status codes, named by sample, for callVariantConcordance.

Author(s)

Cory Barr (code), Michael Lawrence (inferred documentation)