Find transcriptional strand of base substitutions in vcf
For the positions that are within gene bodies it is determined whether the "C" or "T" base is on the same strand as the gene definition. (Since by convention we regard base substitutions as C>X or T>X.)
GRanges containing the VCF object
GRanges with gene bodies definitions including strand information
Base substitions on the same strand as the gene definitions are considered untranscribed, and on the opposite strand of gene bodies as transcribed, since the gene definitions report the coding or sense strand, which is untranscribed.
No strand information "-" is returned for base substitutions outside gene bodies, or base substitutions that overlap with more than one gene body.
Character vector with transcriptional strand information with length of vcf: "-" for positions outside gene bodies, "U" for untranscribed/sense/coding strand, "T" for transcribed/anti-sense/non-coding strand.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## For this example we need our variants from the VCF samples, and ## a known genes dataset. See the 'read_vcfs_as_granges()' example ## for how to load the VCF samples. vcfs <- readRDS(system.file("states/read_vcfs_as_granges_output.rds", package="MutationalPatterns")) # Exclude mitochondrial and allosomal chromosomes. autosomal = extractSeqlevelsByGroup(species="Homo_sapiens", style="UCSC", group="auto") vcfs = lapply(vcfs, function(x) keepSeqlevels(x, autosomal)) ## You can obtain the known genes from the UCSC hg19 dataset using ## Bioconductor: # source("https://bioconductor.org/biocLite.R") # biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") # library("TxDb.Hsapiens.UCSC.hg19.knownGene") ## For this example, we preloaded the data for you: genes_hg19 <- readRDS(system.file("states/genes_hg19.rds", package="MutationalPatterns")) strand_from_vcf(vcfs[], genes_hg19)
Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.