knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", warnings = FALSE, error = FALSE, message = FALSE )
superintronic centers around exploring coverage over genomic regions via computing simple summary statistics and visualisations. The aim is to provide a modular worklfow via an interface built on top of the plyranges package and emphasising simple
Often, as quality control check in sequencing analysis you would like to visualise coverage within each sample over the exon/intron parts of gene of interest. This is simple with superintronic!
As an example, we will use the sample
BAM files from the airway
data package in Bioconductor and our target gene is SRM
The basic workflow consists of three steps
(Note that we are considering a single gene for illustrative purposes and in principle you can look at coverage over any genomic region you're interested in!)
You can start from a GTF/GFF file for your given organism and collect_parts()
to gather up the intronic/exonic regions for each gene. You can then use the
plyranges grammar to select your target gene.
library(superintronic) suppressPackageStartupMessages(library(plyranges)) features <- system.file("extdata", "Homo_sapiens.GRCh37.75_subset.gtf", package = "airway") %>% collect_parts() %>% filter(gene_name == "SRM") features
You can compute a long form GRanges containing coverage scores in parallel over
a set of BAM files. All you need to begin is a character vector containing
the path the BAM files or a data.frame
representing the experimental design
that contains a column of BAM files.
design <- read.csv(system.file("extdata", "sample_table.csv", package = "airway")) %>% dplyr::select(sample_id = X, cell, dex, albut) %>% dplyr::mutate(bam = dir(system.file("extdata", package = "airway"), pattern = "*.bam", full.names = TRUE) ) cvg <- compute_coverage_long(design, source = "bam") cvg
This function automatically propagates, the metadata associated with a design onto the resulting GRanges. You can speed up this computation if your BAM files are indexed by providing a target GRanges so coverage will be restricted to that set of ranges only.
Once the coverage has been computed as a GRanges object,
we can then intersect overlapping gene parts (from collect_parts()
)
cvg_over_features <- cvg %>% select(-bam) %>% join_parts(features) cvg_over_features
Further summaries can then be computed using rangewise diagnostics,
see the rangle()
function for details.
Coverage over a gene or a genomic region can be constructed via
the view_coverage()
function. The coverage score is disjoined
and aggregated over all samples that overlaps the target region:
library(ggplot2) p <- cvg_over_features %>% mutate(strand = feature_strand) %>% view_coverage(score = score, colour = feature_type) + scale_color_brewer(palette = "Dark2") + guides(colour = FALSE) + labs(title = "Coverage over SRM") p
By default, the coverage is aligned from the 5' to the 3' end of the target region, and exon regions are coloured green and intron parts are colour orange (if you have set the strand on the input data, otherwise coverage is oriented 3' to 5').
The function returns a ggplot object, so extra annotations or layers can be added
via the ggplot2
library.
We can also create track plots to include the gene body by adding in segments with
gene_track <- view_segments(unnest_parts(features), colour = feature_type) gene_track
And via the patchwork
, generate tracks:
p / gene_track
Coverage scores can also be facetted if additional experimental design information has been added. For example, we can split the coverage by each sample or treatment group or cell type
p <- cvg_over_features %>% mutate(strand = feature_strand) %>% view_coverage(score = score, colour = feature_type, facets = vars(cell)) + scale_color_brewer(palette = "Dark2") + guides(colour = FALSE) + labs(title = "Coverage over SRM") p / gene_track + patchwork::plot_layout(heights = c(3, 1))
superintronic provides a general API to compute not only coverage, but also aggregate and visualise other measurements of interest over any arbirtrary genomic region. See the vignette for details and the plyranges documentation.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.