snpSummary | R Documentation |
Counts and distribution statistics for SNPs in a VCF object
## S4 method for signature 'CollapsedVCF'
snpSummary(x, ...)
x | A CollapsedVCF object.
... | Additional arguments to methods.
Genotype counts, allele frequencies and Hardy-Weinberg equilibrium (HWE) statistics are calculated for single nucleotide variants in a CollapsedVCF object. HWE is an established quality filter for genotype data: the equilibrium should be attained after a single generation of random mating, so departures from it, indicated by small p-values, almost invariably point to a problem with the genotype calls.
The following caveats apply:
No distinction is made between phased and unphased genotypes.
Only diploid calls are included.
Only ‘valid’ SNPs are included. A ‘valid’ SNP is defined as having a reference allele of length 1 and a single alternate allele of length 1.
Variants that do not meet these criteria are set to NA.
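The caveats above can be sketched in base R. This is an illustration only, not the package's code: the helpers `isValidSnp` and `countGenotypes`, the toy alleles and the toy genotype strings are all assumptions made for the example.

```r
## Hypothetical helper: TRUE where the reference allele has length 1 and
## there is exactly one alternate allele, also of length 1.
## 'ref' is a character vector; 'alt' is a list with one character
## vector of alternate alleles per variant.
isValidSnp <- function(ref, alt) {
    nchar(ref) == 1L &
        lengths(alt) == 1L &
        vapply(alt, function(a) all(nchar(a) == 1L), logical(1))
}

## Hypothetical helper: count diploid genotypes for one variant.
## Phased ("0|1") and unphased ("0/1") calls are treated alike;
## anything that is not a diploid 0/1 call is simply not counted.
countGenotypes <- function(gt) {
    gt <- gsub("|", "/", gt, fixed = TRUE)
    c(hom_ref = sum(gt == "0/0", na.rm = TRUE),
      het     = sum(gt %in% c("0/1", "1/0")),
      hom_alt = sum(gt == "1/1", na.rm = TRUE))
}

## Toy data (made up for illustration).
ref <- c("G", "A", "T", "GTC")
alt <- list("A", c("G", "T"), character(0), "G")
valid <- isValidSnp(ref, alt)

calls <- c("0|0", "0/1", "1|0", "1/1", "./.", "1")
counts <- countGenotypes(calls)
```

In this sketch only the first toy variant is flagged as valid, and the missing call (`./.`) and the haploid call (`1`) contribute to none of the three counts.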
The object returned is a data.frame with seven columns.
g00 | Counts for genotype 00 (homozygous reference).
g01 | Counts for genotype 01 or 10 (heterozygous).
g11 | Counts for genotype 11 (homozygous alternate).
a0Freq | Frequency of the reference allele.
a1Freq | Frequency of the alternate allele.
HWEzscore | Z-score for departure from a null hypothesis of Hardy-Weinberg equilibrium.
HWEpvalue | p-value for departure from a null hypothesis of Hardy-Weinberg equilibrium.
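To make the relationship between the counts and the statistics concrete, here is a minimal base R sketch, assuming the standard one-degree-of-freedom chi-square test for HWE. The function name `hweSketch`, its output column names and the sign convention of the z-score (negative for a heterozygote deficit) are assumptions for illustration; the package's exact computation may differ.

```r
## Hypothetical helper: allele frequencies and an HWE test from
## vectors of genotype counts (one element per variant).
hweSketch <- function(homRef, het, homAlt) {
    n <- homRef + het + homAlt                 # genotyped samples
    refFreq <- (2 * homRef + het) / (2 * n)    # reference allele frequency
    altFreq <- 1 - refFreq                     # alternate allele frequency
    expHet <- 2 * refFreq * altFreq * n        # expected heterozygotes under HWE
    ## Signed z-score; z^2 equals the usual 1-df chi-square statistic.
    ## Monomorphic variants give 0/0 here and therefore NaN.
    z <- (het - expHet) / (2 * refFreq * altFreq * sqrt(n))
    p <- pchisq(z^2, df = 1, lower.tail = FALSE)
    data.frame(refFreq, altFreq, z, p)
}

## Toy counts: the first variant is exactly in HWE, the second has
## no heterozygotes at all.
res <- hweSketch(homRef = c(25, 50), het = c(50, 0), homAlt = c(25, 50))
res
```

For the first toy variant the z-score is 0 and the p-value is 1; for the second the z-score is -10, giving a very small p-value of the kind that would flag a variant for review.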
Chris Wallace <cew54@cam.ac.uk>
genotypeToSnpMatrix, probabilityToSnpMatrix
library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## The return value is a data.frame with genotype counts
## and allele frequencies.
df <- snpSummary(vcf)
df
## Compare to ranges in the VCF object:
rowRanges(vcf)
## No statistics were computed for the variants in rows 3, 4
## and 5. Their entries are NA because row 3 has two alternate
## alleles, row 4 has none and row 5 is not a SNP.