annotateGRfromGR: Annotate GRanges using another GRanges object

annotateGRfromGRR Documentation

Annotate GRanges using another GRanges object

Description

Annotate GRanges using another GRanges object

Usage

annotateGRfromGR(
  GR1,
  GR2,
  grOL = NULL,
  numAsStrings = FALSE,
  stringShrinkFunc = function(...) {
     jamba::cPasteUnique(..., doSort = TRUE)
 },
  numShrinkFunc = sum,
  addStringCols = NULL,
  type = c("any", "start", "end", "within", "equal"),
  ignore.strand = FALSE,
  select = "all",
  sep = ",",
  verbose = FALSE,
  DEBUG = FALSE,
  ...
)

Arguments

GR1

GRanges object, the reference object to which annotations are added.

GR2

GRanges object used for annotations

grOL

overlaps object, optionally used when the GenomicRanges::findOverlaps() function has already been run, mostly intended to save re-processing the overlaps for large objects.

numAsStrings

logical indicating whether numerical values should be treated as strings for the purpose of added annotations. When TRUE, numerical values will be converted to character strings and retained as if they were labels. This argument treats all numeric columns as string, to treat only a subset use the addStringCols argument.

stringShrinkFunc

function used to shrink string annotations by overlapping feature. This function should take a list as input, and sep as an argument indicating the delimiter if applicable, and return a character vector of the same length as the list. By default, jamba::cPasteUnique() is used. Control over the sort and uniqueness should be applied with this function. Note: stringShrinkFunc can be a single function, or can be a named list of function calls, whose names match the colnames of values. If a named list is provided, the first entry is used for any any names in values which are not in names(stringShrinkFunc).

numShrinkFunc

function used to shrink numerical values by overlapping feature. For numeric values, the data.table package is used to apply the function, which enables custom functions not otherwise available via the GenomicRanges methods. Therefore, the function does not take a list as input, instead takes a numeric vector as input. Note: numShrinkFunc can be a single function, or can be a named list of function calls, whose names match the colnames of values. If a named list is provided, the first entry is used for any any names in values which are not in names(numShrinkFunc).

addStringCols

character vector, optional, of numeric colnames that should be treated as string values. This option is a subset of the numAsStrings.

type, ignore.strand, select

arguments sent to GenomicRanges::findOverlaps(). Note these options are only used when grOL is not supplied.

sep

character value indicating the delimiter to use between string values.

verbose

logical indicating whether to print verbose output.

DEBUG

logical indicating whether to print debugging output.

...

additional arguments are ignored.

Details

This function adds annotations to features in the given GRanges object, from overlapping features in a second GRanges object. It is especially useful after performing a manipulation that results in a GRanges object without any values annotation, for example GenomicRanges::reduce() or GenomicRanges::intersect().

In theory this function is relatively simple, it applies annotations to overlapping entries. In practice, it gets complicated when multiple annotations overlap the same GRange entry. In that case, numerical values by default return the base::mean() while string values call jamba::cPasteUnique() by default, and return the unique, sorted, comma-delimited set of string values. For example, overlapping several exons from the same gene, the resulting annotation might include just one gene symbol.

The numeric values can be aggregated using another function, controlled by numShrinkFunction named using the colname. For example: numShrinkFunction="min" would return the base::min() value for all numeric columns. But numShrinkFunction=list(default=mean, score=max) would return the base::mean() for all numeric columns except the "score" column which would report the base::max().

Numeric values currently use the data.table package workflow, which provides extremely efficient calculations for numeric values. However, vectorized functions have the potential to be notably faster, as is true with jamba::cPasteUnique() especially when applying jamba::mixedSort() to sort alphanumeric values. In future, the implementation may change to accomodate vectorized numeric functions, instead of using data.table.

TODO: This function is a specific case where another function shrinkDataFrame() may be a more general purpose use. That function is yet to be ported to the Jam package suite.

Value

GRanges object with colnames added to values, with length and order equal to the input GR1 GRanges object.

See Also

Other jam GRanges functions: addGRLgaps(), addGRgaps(), annotateGRLfromGRL(), assignGRLexonNames(), closestExonToJunctions(), combineGRcoverage(), exoncov2polygon(), findOverlapsGRL(), flattenExonsBy(), getFirstStrandedFromGRL(), getGRLgaps(), getGRcoverageFromBw(), getGRgaps(), grl2df(), jam_isDisjoint(), make_ref2compressed(), sortGRL(), spliceGR2junctionDF(), stackJunctions()

Examples

gr12 <- GenomicRanges::GRanges(
   seqnames=rep(c("chr1", "chr2", "chr1"), c(3,3,3)),
   ranges=IRanges::IRanges(
      start=c(100, 200, 400, 500, 300, 100, 200, 400, 600),
      width=c(100,150,50, 50,50,100, 50,200,50)
   ),
   strand=rep(c("+", "-", "+"), c(3,3,3)),
   gene_name=rep(c("GeneA", "GeneB", "GeneC"), each=3)
)
gr1 <- gr12[,0];

# Say for example you have a GRanges object with no annotation
gr1;

# And had another GRanges object with annotations
gr12;

# To add annotations
annotateGRfromGR(gr1, gr12);

# Notice certain features overlap multiple annotations,
# which may be acceptable.

# If you want to keep annotations distinct,
# use annotateGRLfromGRL()
grl1 <- GenomicRanges::split(gr12[,0],
   GenomicRanges::values(gr12)$gene_name);
grl2 <- GenomicRanges::split(gr12,
   GenomicRanges::values(gr12)$gene_name);

# The first object is a GRangesList with no annotations
grl1;

# The second object is a GRangesList with annotation,
# assumed to be in the same order
grl2;

annotateGRLfromGRL(grl1, grl2);


jmw86069/splicejam documentation built on Nov. 4, 2024, 10:53 a.m.