knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(superintronic) library(magrittr) library(plyranges)
superintronic centers around exploring coverage over genomic regions via computing simple summary statistics and visualisations. The aim is to provide an extremely modular worklfow via an interface built on top of the plyranges package. This means that you can modify any of the steps provided with the plyranges grammar or just use our defaults.
The basic workflow consists of three steps
We assume you are starting from a GTF/GFF file for your given organism.
The reading of the GFF file can be done externally from this package
via rtracklayer::import()
or plyranges::read_gff()
.
library(superintronic) library(magrittr) library(plyranges) features <- system.file("extdata", "Homo_sapiens.GRCh37.75_subset.gtf", package = "airway") %>% collect_parts() %>% filter(source == "protein_coding", n_olaps == 1) features
The result is a GRanges object with number of rows equal to genes,
and columns containing the intronic and exonic features (as list columns)
and the number of times a gene overlaps anyother feature (as well as
any associated gene information from the gtf file). Any further
transformations can be passed to the plyranges grammar, for example you can
use plyranges::filter()
to select the criteria for genes you're interested in.
If you would prefer not to read in an entire GFF file into memory as a GRanges
object, collect_parts()
accepts any of the following as inputs:
rtracklayer
Here we compute a long form GRanges containing coverage scores in parallel over
a set of BAM files. We have structured it so you specify a data frame,
(like you would get from targets in limma
), that contains a column
identified by source
, that specifies the BAM file names. Other options
include specifying a GRanges that represents the genome build, an optional
target GRanges for restricting coverage (requiring the BAM to be indexed),
an argument for dropping entire regions that have zero coverage, and an
argumet for parallel computations.
# signature (dispatch on spec and source) compute_coverage_long(spec, source, .target = NULL, .genome_info = NULL, .drop_empty = TRUE .parallel = BiocParallel::bpparam() )
This function automatically propagates, the metadata associated
with a design onto the resulting GRanges, however a user may also just
provide a vector of BAM files and then add any relevant design variables later
with join_design()
.
As an example, let's use BAM files from the
airway
data package.
design <- read.csv(system.file("extdata", "sample_table.csv", package = "airway")) %>% dplyr::select(sample_id = X, cell, dex, albut) %>% dplyr::mutate(bam = dir(system.file("extdata", package = "airway"), pattern = "*.bam", full.names = TRUE) ) cvg <- compute_coverage_long(design, source = "bam") cvg
Once the coverage is in long form, we can then merge overlapping genomic features over the experimental design via an intersect overlap join and nesting over the union ranges of an index index (usually gene_id).
cvg_over_features <- join_parts(cvg, features) cvg_over_features
We can then compute a bunch of rangenostics
over a given (key and index) accross a column representing the coverage score.
Includes functions for purrr like mapping for sliding/tiling/stretching over genomic windows (using Bioconductor's Views infrastructure).
rango(cvg_over_features, y = score, wt = width, .funs, ...)
Options for visualising coverage over a given range
cvg_over_features <- join_parts(cvg, features) cvg_over_features %>% filter(gene_id == "ENSG00000116649") %>% view_coverage(score = score, colour = feature_type) cvg_over_features %>% filter(gene_id == "ENSG00000116649") %>% view_coverage(score = score, colour = feature_type, facets = dplyr::vars(dex))
This returns a regular old ggplot object, so segments can be overlaid
via the /
operator:
cov_track <- cvg_over_features %>% filter(gene_id == "ENSG00000116649") %>% view_coverage(score = score, colour = feature_type, facets = dplyr::vars(dex)) annotation <- features %>% filter(gene_id == "ENSG00000116649") %>% unnest_parts() %>% view_segments(colour = feature_type) cov_track / annotation
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.