assignGRLexonNames: Assign exon names to GRangesList

assignGRLexonNamesR Documentation

Assign exon names to GRangesList

Description

Assign exon names to GRangesList

Usage

assignGRLexonNames(
  GRL,
  geneSymbolColname = "geneSymbol",
  exonNameColname = paste0(geneSymbolColname, "Exon"),
  suffix = "_exon",
  renameOnes = FALSE,
  filterTwoStrand = TRUE,
  checkDisjoin = c("warn", "none", "stop"),
  assignGRLnames = TRUE,
  verbose = FALSE,
  ...
)

Arguments

GRL

GRangesList input object. Ideally, the input GRanges are disjoint, meaning no two exons overlap, but instead are represented by non-overlapping regions that are sometimes adjacent.

geneSymbolColname

character string indicating with colname of values(GRL) contains the unique gene symbol. If this value does not already exist, it is created and populated using names(GRL). If names(GRL) does not exist, it tries to use values(GRL)[[geneSymbolColname]] then if that does not exist it tries to use values(GRL@unlistData)[[geneSymbolColname]] with the first entry in each GRanges from GRL.

exonNameColname

character string to be used for the resulting exon name. By default "Exon" is appended to the end of the geneSymbolColname.

suffix

character value indicating the suffix to add to the geneSymbolColname to indicate individual exon numbers.

renameOnes

logical indicating whether to name exon sub-sections when the exon is not subdivided. For example, when renameOnes=FALSE, an exon with one section would be named, "exon1", otherwise when renameOnes=TRUE an exon with one section would be named, "exon1a". This distinction can be helpful to recognize exons that contain no subsections.

filterTwoStrand

logical indicating whether to filter out genes occurring on two different strands.

checkDisjoin

character value indicating how to handle non-disjoint input GRL ranges. When checkDisjoin="stop" then any non-disjoint GRanges in an element of GRL will cause the function to fail.

assignGRLnames

logical indicating whether names for the resulting GRangesList should use the exon names.

verbose

logical indicating whether to print verbose output.

...

additional arguments are ignored.

Details

This function takes a GRangesList object with an annotated gene symbol column, and defines exon numbers for each distinct (non-adjacent) range per gene. When multiple ranges overlap, one exon number is applied to them all, and disjoint ranges are denoted using a letter suffix.

For example the exon labels below:

|======|......|======|=======|======|......|======|=======| .exon1.........exon2a.exon2b..exon2c........exon3a..exon3b.

The full name for each feature will become:

  • Gene_exon1

  • Gene_exon2a

  • Gene_exon2b

  • Gene_exon2c

  • Gene_exon3a

  • Gene_exon3b

The reasoning, is to preserve the gene symbol for readability, but to number exons to indicate the numbered contiguous exon, with suffix to indicate the sub-section of each exon.

Value

GRangesList object with additional columns indicating the exon name.

See Also

Other jam GRanges functions: addGRLgaps(), addGRgaps(), annotateGRLfromGRL(), annotateGRfromGR(), closestExonToJunctions(), combineGRcoverage(), exoncov2polygon(), findOverlapsGRL(), flattenExonsBy(), getFirstStrandedFromGRL(), getGRLgaps(), getGRcoverageFromBw(), getGRgaps(), grl2df(), jam_isDisjoint(), make_ref2compressed(), sortGRL(), spliceGR2junctionDF(), stackJunctions()

Other jam RNA-seq functions: closestExonToJunctions(), combineGRcoverage(), defineDetectedTx(), detectedTxInfo(), exoncov2polygon(), flattenExonsBy(), getGRcoverageFromBw(), groups2contrasts(), internal_junc_score(), makeTx2geneFromGtf(), make_ref2compressed(), prepareSashimi(), runDiffSplice(), sortSamples(), spliceGR2junctionDF()

Examples

gene1 <- GenomicRanges::GRanges(seqnames="chr1",
   ranges=IRanges::IRanges(
      start=c(100, 200, 400, 500),
      width=c(100, 50, 50, 150)),
   strand="+",
   geneSymbol="Gene1");
gene2 <- GenomicRanges::GRanges(seqnames="chr1",
   ranges=IRanges::IRanges(
      start=c(600, 900, 1150, 1200),
      width=c(200, 100, 100, 50)),
   strand="-",
   geneSymbol="Gene2");
gene3 <- GenomicRanges::GRanges(seqnames="chr1",
   ranges=IRanges::IRanges(
      start=c(1500),
      width=c(75)),
   strand="+",
   geneSymbol="Gene3");
GRL <- GenomicRanges::GRangesList(list(gene1, gene2, gene3))
assignGRLexonNames(GRL)



jmw86069/splicejam documentation built on Dec. 9, 2022, midnight