summarizeVariants | R Documentation |
Variants in a VCF file are overlapped with an annotation region and summarized by sample. Genotype information in the VCF is used to determine which samples express each variant.
## S4 method for signature 'TxDb,VCF,CodingVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'TxDb,VCF,FiveUTRVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'TxDb,VCF,ThreeUTRVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'TxDb,VCF,SpliceSiteVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'TxDb,VCF,IntronVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'TxDb,VCF,PromoterVariants'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'GRangesList,VCF,VariantType'
summarizeVariants(query, subject, mode, ...)
## S4 method for signature 'GRangesList,VCF,function'
summarizeVariants(query, subject, mode, ...)
query |
A TxDb or GRangesList object used as the annotation. |
subject |
A VCF object containing the variants. |
mode |
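A VariantType object (for example CodingVariants()) or a function. When mode is a VariantType, variants are counted with locateVariants(); when mode is a function, that function does the counting. |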
... |
Additional arguments passed to methods such as
|
summarizeVariants uses the genotype information in a VCF file to determine which samples are positive for each variant. Variants are overlapped with the annotation and the counts are summarized annotation-by-sample. If the annotation is a GRangesList of transcripts, the count matrix will be transcript-by-sample. If the GRangesList holds genes, the count matrix will be gene-by-sample.
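The annotation-by-sample summary described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy genotype matrix, the `hits` table, and the rule that any non-reference, non-missing genotype counts as positive are all assumptions made for illustration.

```r
# Toy genotype matrix: rows are variants, columns are samples.
gt <- matrix(c("0/0", "0/1", "1/1",
               "./.", "0/0", "0/1",
               "0/1", "0/1", "0/0"),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("v1", "v2", "v3"), c("s1", "s2", "s3")))

# Assumption: homozygous-reference and missing calls are negative;
# every other genotype marks the sample as positive for the variant.
negative <- c("0/0", "0|0", "./.", ".|.")
positive <- matrix(!(gt %in% negative), nrow = nrow(gt),
                   dimnames = dimnames(gt))

# Hypothetical overlap result: which annotation element each variant hits.
hits <- data.frame(variant = c(1, 2, 3),
                   annotation = c("txA", "txA", "txB"))

# Sum the positive calls of all variants that hit the same annotation,
# giving an annotation-by-sample count matrix.
counts <- rowsum(positive[hits$variant, , drop = FALSE] * 1L,
                 group = hits$annotation)
counts
```

Each row of `counts` is one annotation element and each column one sample, which is the same shape as the count matrix returned in the assays of the result.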
Counting with locateVariants():
Variant counts are always summarized transcript-by-sample. When query is a GRangesList, it must be compatible with the VariantType-class given as the mode argument. The list below specifies the appropriate GRangesList for each mode.
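CodingVariants : coding (CDS) by transcript
IntronVariants : introns by transcript
FiveUTRVariants : five prime UTR by transcript
ThreeUTRVariants : three prime UTR by transcript
SpliceSiteVariants : introns by transcript
PromoterVariants : list of transcripts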
When query is a TxDb, the appropriate region-by-transcript GRangesList listed above is extracted internally and used as the annotation.
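For readers who want to build the annotation themselves, the sketch below pairs each supported mode with a GenomicFeatures extractor that returns a region-by-transcript GRangesList. The `regionExtractors` list is a hypothetical helper, not part of VariantAnnotation, and the pairing is an assumption based on the descriptions above. The extraction done internally may differ.

```r
library(GenomicFeatures)

# Hypothetical lookup: mode name -> function that builds a compatible
# region-by-transcript GRangesList from a TxDb.
regionExtractors <- list(
    CodingVariants     = function(txdb) cdsBy(txdb, by = "tx"),
    IntronVariants     = function(txdb) intronsByTranscript(txdb),
    FiveUTRVariants    = function(txdb) fiveUTRsByTranscript(txdb),
    ThreeUTRVariants   = function(txdb) threeUTRsByTranscript(txdb),
    SpliceSiteVariants = function(txdb) intronsByTranscript(txdb),
    PromoterVariants   = function(txdb) {
        # One list element per transcript.
        tx <- transcripts(txdb)
        splitAsList(tx, seq_along(tx))
    }
)
```

With a TxDb in hand, `regionExtractors$CodingVariants(txdb)` would give a list suitable for passing as query together with `CodingVariants()` as mode.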
Counting with a user-supplied function:
query must be a GRangesList and mode must be the name of a function. The count function must take 'query' and 'subject' arguments and return a Hits object. Counts are summarized by the outer list elements of the GRangesList.
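As a sketch of what such a count function could look like, the wrapper below keeps the 'query' and 'subject' arguments of findOverlaps() and returns its Hits object, but ignores strand. The name `overlapsIgnoringStrand` and the toy ranges are hypothetical and exist only to show the contract.

```r
library(GenomicRanges)

# Hypothetical count function: same contract as findOverlaps()
# (takes 'query' and 'subject', returns a Hits object), ignoring strand.
overlapsIgnoringStrand <- function(query, subject, ...) {
    findOverlaps(query, subject, ignore.strand = TRUE, ...)
}

# Toy data: three single-base variants and two annotation elements.
variants <- GRanges("chr1", IRanges(c(5, 50, 120), width = 1))
regions <- GRangesList(
    txA = GRanges("chr1", IRanges(1, 60), strand = "+"),
    txB = GRanges("chr1", IRanges(100, 150), strand = "-"))

hits <- overlapsIgnoringStrand(variants, regions)
hits
```

A function of this shape could be supplied as the mode argument in place of findOverlaps.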
A RangedSummarizedExperiment object with count summaries in the assays slot. The rowRanges contains the annotation used for counting. Information in colData and metadata is taken from the VCF file.
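To show how these pieces are accessed, the sketch below builds a small stand-in RangedSummarizedExperiment by hand. The counts, ranges, and sample names are made up. A real result from summarizeVariants would be queried with the same accessors.

```r
library(SummarizedExperiment)

# Made-up count matrix: two transcripts by two samples.
counts <- matrix(c(1L, 0L, 2L, 1L), nrow = 2,
                 dimnames = list(c("tx1", "tx2"), c("s1", "s2")))

# Made-up annotation, one list element per transcript.
annotation <- GRangesList(
    tx1 = GRanges("chr22", IRanges(100, 200)),
    tx2 = GRanges("chr22", IRanges(300, 400)))

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = annotation,
                           colData = DataFrame(row.names = c("s1", "s2")))

assays(se)$counts           # count matrix
rowRanges(se)               # annotation used for counting
colData(se)                 # per-sample information
colSums(assays(se)$counts)  # total positive calls per sample
```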
Valerie Obenchain
readVcf, predictCoding, locateVariants
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
## Read variants from VCF.
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## Rename seqlevels to match TxDb; confirm the match.
seqlevels(vcf) <- paste0("chr", seqlevels(vcf))
intersect(seqlevels(vcf), seqlevels(txdb))
## ----------------------------------------
## Counting with locateVariants()
## ----------------------------------------
## TxDb as the 'query'
coding1 <- summarizeVariants(txdb, vcf, CodingVariants())
colSums(assays(coding1)$counts)
## GRangesList as the 'query'
cdsbytx <- cdsBy(txdb, "tx")
coding2 <- summarizeVariants(cdsbytx, vcf, CodingVariants())
stopifnot(identical(assays(coding1)$counts, assays(coding2)$counts))
## Promoter region variants summarized by transcript
tx <- transcripts(txdb)
txlst <- splitAsList(tx, seq_along(tx))
promoter <- summarizeVariants(txlst, vcf,
    PromoterVariants(upstream=100, downstream=10))
colSums(assays(promoter)$counts)
## ----------------------------------------
## Counting with findOverlaps()
## ----------------------------------------
## Summarize all variants by transcript
allvariants <- summarizeVariants(txlst, vcf, findOverlaps)
colSums(assays(allvariants)$counts)