shorten_gaps: Improve transcript structure visualization by shortening gaps

View source: R/shorten_gaps.R

shorten_gaps R Documentation

Description

For a given set of exons and introns, shorten_gaps() reduces the width of gaps (regions that do not overlap any exon) to a user-specified target_gap_width. This is useful when visualizing transcripts with long introns: it focuses the plot on the regions of interest (i.e. the exons) and makes transcript structures easier to compare.

Usage

shorten_gaps(exons, introns, group_var = NULL, target_gap_width = 100L)

Arguments

exons

data.frame() containing exons, which can originate from multiple transcripts differentiated by group_var.

introns

data.frame() containing the intron co-ordinates corresponding to the input exons. This can be created by applying to_intron() to the exons. If introns originate from multiple transcripts, they must be differentiated using group_var. If you do not use to_intron(), make sure that intron starts and ends are defined exactly as the adjacent exon boundaries (i.e. exon end and exon start, rather than exon end + 1 and exon start - 1).
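
The boundary convention described above can be sketched in base R. This is only an illustration on hypothetical toy data (the exons data frame and its values are made up), not a replacement for to_intron(); it omits any other columns that shorten_gaps() may expect on its inputs.

```r
# Hypothetical toy exons for a single transcript (not package data)
exons <- data.frame(
    start = c(100L, 500L, 900L),
    end = c(200L, 600L, 1000L),
    transcript_name = "tx1"
)

# Sort by start so that consecutive rows are adjacent exons
exons <- exons[order(exons$start), ]
n <- nrow(exons)

# Intron start = end of the upstream exon,
# intron end = start of the downstream exon
# (not exon end + 1 and exon start - 1)
introns <- data.frame(
    start = exons$end[-n],
    end = exons$start[-1],
    transcript_name = exons$transcript_name[-n]
)

introns
```

With more than one transcript, the same step would need to be applied separately within each group identified by group_var.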

group_var

character() if input data originates from more than 1 transcript, group_var must specify the column that differentiates transcripts (e.g. "transcript_id").

target_gap_width

integer() the width in base pairs to shorten the gaps to.

Details

After shorten_gaps() reduces the size of gaps, it re-scales exons and introns to preserve exon alignment. This process only ever reduces the width of input introns, never exons. Importantly, the re-scaled output co-ordinates should only be used for visualization, as they no longer match the original genomic co-ordinates.
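
The idea of shrinking gaps while keeping exons aligned can be illustrated with a simplified base R sketch. It is not the package's implementation: the toy data, the rescale() helper and the choice to measure gap width as end minus start are all assumptions made for illustration.

```r
# Hypothetical toy exons from two transcripts (not package data)
exons <- data.frame(
    start = c(100L, 5000L, 100L, 9000L),
    end = c(200L, 5100L, 200L, 9100L),
    transcript_name = c("tx1", "tx1", "tx2", "tx2")
)
target_gap_width <- 100L

# 1. Merge exons from all transcripts into non-overlapping exonic regions
ord <- order(exons$start)
s <- exons$start[ord]
e <- exons$end[ord]
prev_max_end <- c(-Inf, cummax(e)[-length(e)])
region_id <- cumsum(s > prev_max_end)
region_start <- as.vector(tapply(s, region_id, min))
region_end <- as.vector(tapply(e, region_id, max))

# 2. Gaps are the stretches between consecutive exonic regions
k <- length(region_start)
gap_start <- region_end[-k]
gap_end <- region_start[-1]

# Amount to trim from each gap (gaps already narrower are left alone)
excess <- pmax((gap_end - gap_start) - target_gap_width, 0)

# 3. Shift every co-ordinate left by the total excess of the gaps
#    that end at or before it (hypothetical helper)
rescale <- function(x) {
    vapply(x, function(pos) pos - sum(excess[gap_end <= pos]), numeric(1))
}

rescaled <- exons
rescaled$start <- rescale(exons$start)
rescaled$end <- rescale(exons$end)

rescaled
```

Because every position is shifted by the same rule regardless of transcript, exons that shared a co-ordinate before rescaling still share one afterwards, and exon widths are unchanged; only the gaps shrink.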

Value

data.frame() containing the re-scaled co-ordinates of the introns and exons of each input transcript, with gaps shortened.

Examples


library(magrittr)
library(ggplot2)
library(ggtranscript)

# to illustrate the package's functionality
# ggtranscript includes example transcript annotation
pknox1_annotation %>% head()

# extract exons
pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == "exon")
pknox1_exons %>% head()

# to_intron() is a helper function included in ggtranscript
# which is useful for converting exon co-ordinates to introns
pknox1_introns <- pknox1_exons %>% to_intron(group_var = "transcript_name")
pknox1_introns %>% head()

# for transcripts with long introns, the exons of interest
# can be difficult to visualize clearly when using the default scale
pknox1_exons %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_intron(
        data = pknox1_introns,
        arrow.min.intron.length = 3500
    )

# in such cases it can be useful to rescale the exons and introns
# using shorten_gaps() which shortens regions that do not overlap an exon
pknox1_rescaled <-
    shorten_gaps(pknox1_exons, pknox1_introns, group_var = "transcript_name")

pknox1_rescaled %>% head()

# this allows us to visualize differences in exonic structure more clearly
pknox1_rescaled %>%
    dplyr::filter(type == "exon") %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_intron(
        data = pknox1_rescaled %>% dplyr::filter(type == "intron"),
        arrow.min.intron.length = 300
    )

# shorten_gaps() can be used in combination with to_diff()
# to further highlight differences in exon structure
# here, all other transcripts are compared to the MANE-select transcript
pknox1_rescaled_diffs <- to_diff(
    exons = pknox1_rescaled %>%
        dplyr::filter(type == "exon", transcript_name != "PKNOX1-201"),
    ref_exons = pknox1_rescaled %>%
        dplyr::filter(type == "exon", transcript_name == "PKNOX1-201"),
    group_var = "transcript_name"
)

pknox1_rescaled %>%
    dplyr::filter(type == "exon") %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_intron(
        data = pknox1_rescaled %>% dplyr::filter(type == "intron"),
        arrow.min.intron.length = 300
    ) +
    geom_range(
        data = pknox1_rescaled_diffs,
        aes(fill = diff_type),
        alpha = 0.2
    )

dzhang32/ggtranscript documentation built on Aug. 29, 2024, 2:43 a.m.