exonicParts: Extract non-overlapping exonic or intronic parts from a...

View source: R/exonicParts.R

exonicParts    R Documentation

Extract non-overlapping exonic or intronic parts from a TxDb-like object

Description

exonicParts and intronicParts extract the non-overlapping (a.k.a. disjoint) exonic or intronic parts from a TxDb-like object.

Usage

exonicParts(txdb, linked.to.single.gene.only=FALSE)
intronicParts(txdb, linked.to.single.gene.only=FALSE)

## 3 helper functions used internally by exonicParts() and intronicParts():
tidyTranscripts(txdb, drop.geneless=FALSE)
tidyExons(txdb, drop.geneless=FALSE)
tidyIntrons(txdb, drop.geneless=FALSE)

Arguments

txdb

A TxDb object, or any TxDb-like object that supports the transcripts() and exonsBy() extractors (e.g. an EnsDb object).

linked.to.single.gene.only

TRUE or FALSE.

If FALSE (the default), then the disjoint parts are obtained by calling disjoin() on all the exons (or introns) in txdb, including exons (or introns) that are not linked to a gene or that are linked to more than one gene.

If TRUE, then the disjoint parts are obtained in 2 steps:

  1. call disjoin() on the exons (or introns) linked to at least one gene,

  2. then drop the parts linked to more than one gene from the set of exonic (or intronic) parts obtained previously.
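The two modes can be illustrated on a toy set of exons. This is only a sketch under stated assumptions: the ranges and the gene ids ("g1", "g2") are made up, the exon-to-gene link is stored directly in a gene_id metadata column (a real TxDb links exons to genes through transcripts), and the code is not the actual implementation of exonicParts().

```r
library(GenomicRanges)

## Hypothetical exons on one chromosome; the last one is not linked to a gene.
exons <- GRanges("chr1",
                 IRanges(start=c(1, 50, 200, 400), end=c(100, 150, 300, 500)),
                 strand="+",
                 gene_id=c("g1", "g2", "g1", NA))

## Analog of linked.to.single.gene.only=FALSE: disjoin all the exons.
all_parts <- disjoin(exons, with.revmap=TRUE)

## Analog of linked.to.single.gene.only=TRUE:
## 1. disjoin only the exons linked to a gene,
linked <- exons[!is.na(exons$gene_id)]
parts <- disjoin(linked, with.revmap=TRUE)

## 2. then drop the parts linked to more than one gene.
part_genes <- unique(extractList(linked$gene_id, parts$revmap))
keep <- lengths(part_genes) == 1L
single_gene_parts <- parts[keep]
single_gene_parts$gene_id <- unlist(part_genes[keep])

all_parts
single_gene_parts
```

In this toy case the part covering positions 50-100 is shared by "g1" and "g2", so it appears in all_parts but not in single_gene_parts.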

drop.geneless

TRUE or FALSE.

If FALSE (the default), then all the transcripts (or exons, or introns) get extracted from the TxDb object.

If TRUE, then only the transcripts (or exons, or introns) that are linked to a gene get extracted from the TxDb object.

Note that drop.geneless also impacts the order in which the features are returned:

  • Transcripts: If drop.geneless is FALSE then transcripts are returned in the same order as with transcripts(), which is expected to be by internal transcript id (tx_id). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id.

  • Exons: If drop.geneless is FALSE then exons are ordered first by internal transcript id (tx_id), then by exon rank (exon_rank). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id, and then by exon rank.

  • Introns: If drop.geneless is FALSE then introns are ordered by internal transcript id (tx_id). Otherwise they are ordered first by gene id (gene_id), then by internal transcript id.
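The ordering rules above can be mimicked with base R on a small table. This is a minimal sketch, assuming a hypothetical data frame with columns tx_id, exon_rank, and gene_id in which geneless exons are marked with NA; it only illustrates the two orderings described for exons and is not how tidyExons() is implemented.

```r
## Hypothetical exon table: transcript 1 has no gene, transcript 2 belongs
## to gene "B", and transcript 3 belongs to gene "A".
exons <- data.frame(
    tx_id     = c(2L, 1L, 1L, 3L, 2L),
    exon_rank = c(1L, 2L, 1L, 1L, 2L),
    gene_id   = c("B", NA, NA, "A", "B"),
    stringsAsFactors = FALSE
)

## Analog of drop.geneless=FALSE: keep everything, order by tx_id,
## then by exon_rank.
all_exons <- exons[order(exons$tx_id, exons$exon_rank), ]

## Analog of drop.geneless=TRUE: keep only the exons linked to a gene,
## order by gene_id, then tx_id, then exon_rank.
with_gene <- exons[!is.na(exons$gene_id), ]
geneless_dropped <- with_gene[order(with_gene$gene_id,
                                    with_gene$tx_id,
                                    with_gene$exon_rank), ]

all_exons
geneless_dropped
```

Note that transcript 3 comes first in geneless_dropped because its gene id sorts before that of transcript 2, even though its tx_id is larger.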

Value

exonicParts returns a disjoint and strictly sorted GRanges object with 1 range per exonic part and with metadata columns tx_id, tx_name, gene_id, exon_id, exon_name, and exon_rank. If linked.to.single.gene.only was set to TRUE, an additional exonic_part metadata column is added that indicates the rank of each exonic part within all the exonic parts linked to the same gene.

intronicParts returns a disjoint and strictly sorted GRanges object with 1 range per intronic part and with metadata columns tx_id, tx_name, and gene_id. If linked.to.single.gene.only was set to TRUE, an additional intronic_part metadata column is added that indicates the rank of each intronic part within all the intronic parts linked to the same gene.
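To make the meaning of the exonic_part and intronic_part columns concrete, the sketch below computes a within-gene rank for a made-up vector of gene ids and builds a per-part label from it. The gene ids and the "gene:E001" label format are assumptions chosen for illustration; the code is not taken from the package.

```r
## Hypothetical gene ids of five single-gene parts, already in sorted
## genomic order.
gene_id <- c("g1", "g1", "g2", "g1", "g2")

## Rank of each part among the parts linked to the same gene.
exonic_part <- ave(seq_along(gene_id), gene_id, FUN=seq_along)

## A possible per-part label combining the gene id and the rank.
part_label <- sprintf("%s:E%03d", gene_id, exonic_part)

data.frame(gene_id, exonic_part, part_label)
```

Because the rank restarts at 1 for each gene, the pair (gene_id, exonic_part) identifies a part uniquely in this toy example.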

tidyTranscripts returns a GRanges object with 1 range per transcript and with metadata columns tx_id, tx_name, and gene_id.

tidyExons returns a GRanges object with 1 range per exon and with metadata columns tx_id, tx_name, gene_id, exon_id, exon_name, and exon_rank.

tidyIntrons returns a GRanges object with 1 range per intron and with metadata columns tx_id, tx_name, and gene_id.

Author(s)

Hervé Pagès

See Also

  • disjoin in the IRanges package.

  • transcripts, transcriptsBy, and transcriptsByOverlaps, for extracting genomic feature locations from a TxDb-like object.

  • transcriptLengths for extracting the transcript lengths (and other metrics) from a TxDb object.

  • extendExonsIntoIntrons for extending exons into their adjacent introns.

  • extractTranscriptSeqs for extracting transcript (or CDS) sequences from chromosome sequences.

  • coverageByTranscript for computing coverage by transcript (or CDS) of a set of ranges.

  • The TxDb class.

Examples

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## ---------------------------------------------------------------------
## exonicParts()
## ---------------------------------------------------------------------

exonic_parts1 <- exonicParts(txdb)
exonic_parts1

## Mapping from exonic parts to genes is many-to-many:
gene_id1 <- mcols(exonic_parts1)$gene_id
gene_id1  # CharacterList object
table(lengths(gene_id1))
## The number of known genes a Human exonic part can be linked to
## varies from 0 to 22!

exonic_parts2 <- exonicParts(txdb, linked.to.single.gene.only=TRUE)
exonic_parts2

## Mapping from exonic parts to genes now is many-to-one:
gene_id2 <- mcols(exonic_parts2)$gene_id
gene_id2[1:20]  # character vector

## Select exonic parts for a given gene:
exonic_parts2[gene_id2 %in% "643837"]

## Sanity checks:
stopifnot(isDisjoint(exonic_parts1), isStrictlySorted(exonic_parts1))
stopifnot(isDisjoint(exonic_parts2), isStrictlySorted(exonic_parts2))
stopifnot(all(exonic_parts2 %within% reduce(exonic_parts1)))
stopifnot(identical(
    lengths(gene_id1) == 1L,
    exonic_parts1 %within% exonic_parts2
))

## ---------------------------------------------------------------------
## intronicParts()
## ---------------------------------------------------------------------

intronic_parts1 <- intronicParts(txdb)
intronic_parts1

## Mapping from intronic parts to genes is many-to-many:
mcols(intronic_parts1)$gene_id
table(lengths(mcols(intronic_parts1)$gene_id))
## A Human intronic part can be linked to 0 to 22 known genes!

intronic_parts2 <- intronicParts(txdb, linked.to.single.gene.only=TRUE)
intronic_parts2

## Mapping from intronic parts to genes now is many-to-one:
class(mcols(intronic_parts2)$gene_id)  # character vector

## Sanity checks:
stopifnot(isDisjoint(intronic_parts1), isStrictlySorted(intronic_parts1))
stopifnot(isDisjoint(intronic_parts2), isStrictlySorted(intronic_parts2))
stopifnot(all(intronic_parts2 %within% reduce(intronic_parts1)))
stopifnot(identical(
    lengths(mcols(intronic_parts1)$gene_id) == 1L,
    intronic_parts1 %within% intronic_parts2
))

## ---------------------------------------------------------------------
## Helper functions
## ---------------------------------------------------------------------

tidyTranscripts(txdb)                      # Ordered by 'tx_id'.
tidyTranscripts(txdb, drop.geneless=TRUE)  # Ordered first by 'gene_id',
                                           # then by 'tx_id'.

tidyExons(txdb)                            # Ordered first by 'tx_id',
                                           # then by 'exon_rank'.
tidyExons(txdb, drop.geneless=TRUE)        # Ordered first by 'gene_id',
                                           # then by 'tx_id',
                                           # then by 'exon_rank'.

tidyIntrons(txdb)                          # Ordered by 'tx_id'.
tidyIntrons(txdb, drop.geneless=TRUE)      # Ordered first by 'gene_id',
                                           # then by 'tx_id'.

Bioconductor/GenomicFeatures documentation built on Nov. 7, 2024, 4:25 a.m.