options(width=1000) knitr::opts_chunk$set( collapse = TRUE, comment = "#>")
This vignette outlines a workflow of detecting retrotransposed transcripts (RTs)
from Variant Call Format (VCF) using the svaRetro
package.
The svaRetro
package can be installed from Bioconductor as follows:
if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("svaRetro")
This package uses a breakend-centric event notation adopted from the
StructuralVariantAnnotation
package. In short, breakends are stored in a
GRanges object with strand used to indicate breakpoint orientation, where
breakpoints are represented using a partner
field containing the name of the
breakend at the other side of the breakend. This notation was chosen as it
simplifies the annotations of RTs which are detected at breakend-level.
VCF data is parsed into a VCF
object using the readVCF
function from the
Bioconductor package VariantAnnotation
. Simple filters could be applied to a
VCF
object to remove unwanted calls. The VCF
object is then converted to a
GRanges
object with breakend-centric notations using
StructuralVariantAnnotation
. More information about VCF
objects and
breakend-centric GRanges object can be found by
consulting the vignettes in the corresponding packages with
browseVignettes("VariantAnnotation")
and
browseVignettes("StructuralVariantAnnotation")
.
library(StructuralVariantAnnotation) library(VariantAnnotation) library(svaRetro) RT_vcf <- readVcf(system.file("extdata", "diploidSV.vcf", package = "svaRetro"))
RT_gr <- StructuralVariantAnnotation::breakpointRanges(RT_vcf, nominalPosition=TRUE) head(RT_gr)
Note that StructuralVariantAnnotation
requires the GRanges
object to be
composed entirely of valid breakpoints. Please consult the vignette of the
StructuralVariantAnnotation
package for ensuring breakpoint consistency.
The package provides rtDetect
to identify RTs using the provided SV calls.
This is achieved by detecting intronic deletions, which are breakpoints at
exon-intron (and intron-exon) boundaries of a transcript. Fusions consisting of
an exon boundary and a second genomic location are reported as potential
insertion sites. Due to the complexity of RT events, insertion sites can be
discovered on both left and right sides, only one side, or none at all.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(dplyr) hg19.genes <- TxDb.Hsapiens.UCSC.hg19.knownGene RT <- rtDetect(RT_gr, hg19.genes, maxgap=10, minscore=0.8)
The output is a list of GRanges
object consisting of two sets of GRanges
calls, insSite
and junctions
, containing candidate insertion sites and
exon-exon junctions respectively. Candidate insertion sites are annotated by
the source transcripts and whether exon-exon junctions are detected for the
source transcripts. RT junction breakends are annotated by the UCSC exon IDs,
corresponding transcripts, and NCBI gene symbols.
RT$SKA3
One way of visualising RT is by circos plots. Here we use the package
circlize
to demonstrate
the visualisation of insertion site and exon-exon junctions.
To generate a simple circos plot of RT event with SKA3 transcript:
library(circlize) rt_chr_prefix <- c(RT$SKA3$junctions, RT$SKA3$insSite) seqlevelsStyle(rt_chr_prefix) <- "UCSC" pairs <- breakpointgr2pairs(rt_chr_prefix) pairs
To see supporting breakpoints clearly, we generate the circos plot according to the loci of event.
circos.initializeWithIdeogram( data.frame(V1=c("chr13", "chr11"), V2=c(21720000,108585000), V3=c(21755000,108586000), V4=c("q12.11","q24.3"), V5=c("gneg","gpos50"))) circos.genomicLink(as.data.frame(S4Vectors::first(pairs)), as.data.frame(S4Vectors::second(pairs))) circos.clear()
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.