Find transcriptional strand of base substitutions in vcf

Share:

Description

For the positions that are within gene bodies it is determined whether the "C" or "T" base is on the same strand as the gene definition. (Since by convention we regard base substitutions as C>X or T>X.)

Usage

1
strand_from_vcf(vcf, genes)

Arguments

vcf

GRanges containing the VCF object

genes

GRanges with gene bodies definitions including strand information

Details

Base substitions on the same strand as the gene definitions are considered untranscribed, and on the opposite strand of gene bodies as transcribed, since the gene definitions report the coding or sense strand, which is untranscribed.

No strand information "-" is returned for base substitutions outside gene bodies, or base substitutions that overlap with more than one gene body.

Value

Character vector with transcriptional strand information with length of vcf: "-" for positions outside gene bodies, "U" for untranscribed/sense/coding strand, "T" for transcribed/anti-sense/non-coding strand.

See Also

read_vcfs_as_granges,

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
## For this example we need our variants from the VCF samples, and
## a known genes dataset.  See the 'read_vcfs_as_granges()' example
## for how to load the VCF samples.
vcfs <- readRDS(system.file("states/read_vcfs_as_granges_output.rds",
                package="MutationalPatterns"))

# Exclude mitochondrial and allosomal chromosomes.
autosomal = extractSeqlevelsByGroup(species="Homo_sapiens",
                                    style="UCSC",
                                    group="auto")

vcfs = lapply(vcfs, function(x) keepSeqlevels(x, autosomal))

## You can obtain the known genes from the UCSC hg19 dataset using
## Bioconductor:
# source("https://bioconductor.org/biocLite.R")
# biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
# library("TxDb.Hsapiens.UCSC.hg19.knownGene")

## For this example, we preloaded the data for you:
genes_hg19 <- readRDS(system.file("states/genes_hg19.rds",
                        package="MutationalPatterns"))

strand_from_vcf(vcfs[[1]], genes_hg19)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.