Parse the CSQ column of a VCF object into a GRanges object

Description

Parse the CSQ column in a VCF object returned from the Ensembl Variant Effect Predictor (VEP).

Usage

## S4 method for signature 'character'
parseCSQToGRanges(x, VCFRowID=character(), 
    ..., info.key = "CSQ")
## S4 method for signature 'VCF'
parseCSQToGRanges(x, VCFRowID=character(), 
    ..., info.key = "CSQ")

Arguments

x

A character string naming a VCF file on disk, or a VCF object.

VCFRowID

A character vector of rownames from the original VCF. When provided, the result includes a metadata column named ‘VCFRowID’ which maps the result back to the row (variant) in the original VCF.

When VCFRowID is not provided, no ‘VCFRowID’ column is included.

info.key

The name of the INFO key that VEP writes the consequences to in its output (default is CSQ). Set this only if something other than CSQ was passed to the --vcf_info_field flag in the VEP output options.

...

Arguments passed to other methods. Currently not used.

Details

When ensemblVEP returns a VCF object, the consequence data are returned unparsed in the 'CSQ' INFO column. parseCSQToGRanges parses these data into a GRanges object that is expanded to match the dimension of the 'CSQ' data: because each variant can have multiple matches, its range is repeated once per match.

If row names from the original VCF are provided as VCFRowID, the result includes a metadata column that maps each range back to the row (variant) in the original VCF.
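
The expansion described above can be illustrated with a small base-R sketch. This is not the package's implementation: the CSQ strings, the field names, and the plain list standing in for the 'CSQ' INFO column are all made up for illustration. It assumes the usual VEP layout in which each annotation is a pipe-delimited string and a variant may carry several annotations.

```r
## Toy stand-in for the unparsed 'CSQ' data: two variants, three annotations.
## Field names and values are hypothetical, not taken from real VEP output.
fields <- c("Allele", "Gene", "Consequence")
csq <- list(
  rs1 = c("A|ENSG01|missense_variant", "A|ENSG02|intron_variant"),
  rs2 = "T|ENSG03|synonymous_variant"
)

## One row per annotation: repeat each variant ID by its annotation count.
row_id <- rep(names(csq), lengths(csq))

## Split every annotation on the literal pipe and bind the pieces row-wise.
parts <- strsplit(unlist(csq, use.names = FALSE), "|", fixed = TRUE)
mat <- do.call(rbind, parts)
colnames(mat) <- fields

## Combine the repeated IDs with the parsed fields.
parsed <- data.frame(VCFRowID = row_id, mat, stringsAsFactors = FALSE)
parsed
```

Here the first variant contributes two rows and the second contributes one, which mirrors how ranges are repeated in the returned GRanges. Note that strsplit drops a trailing empty field, so real CSQ strings with empty trailing values would need extra care in a sketch like this.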

Value

Returns a GRanges object with consequence data as the metadata columns.

Author(s)

Valerie Obenchain

References

Ensembl VEP Home: http://uswest.ensembl.org/info/docs/tools/vep/index.html

See Also

ensemblVEP, VEPParam-class

Examples

  ## ensemblVEP() calls the Ensembl VEP script, which must be installed locally.
  library(ensemblVEP)

  ## Run VEP on an example VCF and request VCF-formatted output.
  file <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
  vep <- ensemblVEP(file, param=VEPParam(dataformat=c(vcf=TRUE)))

  ## The returned 'CSQ' data are unparsed.
  info(vep)$CSQ

  ## Parse into a GRanges and include the 'VCFRowID' column.
  vcf <- readVcf(file, "hg19")
  csq <- parseCSQToGRanges(vep, VCFRowID=rownames(vcf))
  csq[1:4]