add_utr: Add untranslated regions (UTRs)

View source: R/add_utr.R

add_utr    R Documentation

Add untranslated regions (UTRs)

Description

Given a set of exons (encompassing the CDS and UTRs) and the corresponding CDS regions, add_utr() calculates the UTR regions and adds them as ranges. This can be useful when combined with shorten_gaps() to visualize transcripts with long introns whilst differentiating UTRs from CDS regions.
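
The idea behind the calculation can be sketched in base R for a single transcript: each exon is trimmed against the overall CDS span, and whatever falls outside that span is labelled as UTR. This is a minimal illustration, not the package's implementation. The helper name sketch_utr() and the toy coordinates are hypothetical, and the sketch assumes a single transcript, inclusive coordinates, and a CDS that lies within the exons.

```r
# Hypothetical helper (not part of ggtranscript):
# UTR = the parts of the exons that fall outside the CDS span of one transcript
sketch_utr <- function(exons, cds) {
    cds_start <- min(cds$start)
    cds_end <- max(cds$end)

    # exon portions lying before the first CDS base
    left <- exons[exons$start < cds_start, , drop = FALSE]
    left$end <- pmin(left$end, cds_start - 1)

    # exon portions lying after the last CDS base
    right <- exons[exons$end > cds_end, , drop = FALSE]
    right$start <- pmax(right$start, cds_end + 1)

    utr <- rbind(left, right)
    utr$type <- rep("UTR", nrow(utr))
    cds$type <- rep("CDS", nrow(cds))

    # combine CDS and UTR ranges and sort them by position
    keep <- c("start", "end", "type")
    out <- rbind(cds[, keep], utr[, keep])
    out <- out[order(out$start), ]
    rownames(out) <- NULL
    out
}

# toy transcript: three exons, with the CDS starting inside exon 1
# and ending inside exon 3
toy_exons <- data.frame(start = c(100, 300, 500), end = c(200, 400, 600))
toy_cds <- data.frame(start = c(150, 300, 500), end = c(200, 400, 550))

toy_result <- sketch_utr(toy_exons, toy_cds)
print(toy_result)
```

In this toy case the first and last exons each contribute one UTR range, while the middle exon is entirely coding.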

Usage

add_utr(exons, cds, group_var = NULL)

Arguments

exons

data.frame() containing exons, which can originate from multiple transcripts differentiated by group_var.

cds

data.frame() containing the coding sequence (CDS) ranges for the transcripts in exons.

group_var

character(). If the input data originates from more than one transcript, group_var must name the column that differentiates transcripts (e.g. "transcript_id").

Details

The input cds regions are expected to span from the beginning of the start codon to the end of the stop codon. Some reference annotations, for example Ensembl, omit the stop codon from the CDS definition. In such cases, users should manually ensure that cds includes both the start and stop codons.
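
One way to add a missing stop codon can be sketched in base R on toy data. On the + strand the stop codon follows the last CDS base, so the highest end is extended by 3 bp; on the - strand it sits at the lowest genomic coordinate, so the lowest start is reduced by 3 bp instead. The helper add_stop_codon() and the toy coordinates are hypothetical, and the sketch assumes a single transcript whose stop codon is not split across an exon boundary.

```r
# Hypothetical helper (not part of ggtranscript):
# extend the CDS of one transcript by the 3 bp stop codon.
# Assumes the stop codon is contiguous with the terminal CDS range.
add_stop_codon <- function(cds, strand = c("+", "-")) {
    strand <- match.arg(strand)
    if (strand == "+") {
        # + strand: the stop codon follows the highest CDS end
        last <- which.max(cds$end)
        cds$end[last] <- cds$end[last] + 3
    } else {
        # - strand: the stop codon precedes the lowest CDS start
        first <- which.min(cds$start)
        cds$start[first] <- cds$start[first] - 3
    }
    cds
}

toy_cds <- data.frame(start = c(150, 300, 500), end = c(200, 400, 550))

cds_plus <- add_stop_codon(toy_cds, strand = "+")
cds_minus <- add_stop_codon(toy_cds, strand = "-")
```

With several transcripts, the same adjustment would need to be applied within each transcript, for instance after grouping by the column passed as group_var.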

Value

data.frame() containing differentiated CDS and UTR ranges.

Examples


library(magrittr)
library(ggplot2)

# to illustrate the package's functionality
# ggtranscript includes example transcript annotation
pknox1_annotation %>% head()

# extract exons
pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == "exon")
pknox1_exons %>% head()

# extract cds
pknox1_cds <- pknox1_annotation %>% dplyr::filter(type == "CDS")
pknox1_cds %>% head()

# the CDS definition originating from the Ensembl reference annotation
# does not include the stop codon
# we must incorporate the stop codons into the CDS manually
# by adding 3 base pairs to the end of the CDS of each transcript
pknox1_cds_w_stop <- pknox1_cds %>%
    dplyr::group_by(transcript_name) %>%
    dplyr::mutate(
        end = ifelse(end == max(end), end + 3, end)
    ) %>%
    dplyr::ungroup()

# add_utr() adds ranges that represent the UTRs
pknox1_cds_utr <- add_utr(
    pknox1_exons,
    pknox1_cds_w_stop,
    group_var = "transcript_name"
)

pknox1_cds_utr %>% head()

# this can be useful when combined with shorten_gaps()
# to visualize transcripts with long introns whilst differentiating UTRs
pknox1_cds_utr_rescaled <-
    shorten_gaps(
        exons = pknox1_cds_utr,
        introns = to_intron(pknox1_cds_utr, "transcript_name"),
        group_var = "transcript_name"
    )

pknox1_cds_utr_rescaled %>%
    dplyr::filter(type == "CDS") %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_range(
        data = pknox1_cds_utr_rescaled %>% dplyr::filter(type == "UTR"),
        height = 0.25,
        fill = "white"
    ) +
    geom_intron(
        data = to_intron(
            pknox1_cds_utr_rescaled %>% dplyr::filter(type != "intron"),
            "transcript_name"
        ),
        arrow.min.intron.length = 110
    )

dzhang32/ggtranscript documentation built on Aug. 29, 2024, 2:43 a.m.