View source: R/intervals-annotation.R
| gintervals.annotate | R Documentation |
Annotates one-dimensional intervals by finding nearest neighbors in another set of intervals and adding selected columns from the neighbors to the original intervals.
gintervals.annotate(
  intervals,
  annotation_intervals,
  annotation_columns = NULL,
  column_names = NULL,
  dist_column = "dist",
  max_dist = Inf,
  na_value = NA,
  maxneighbors = 1,
  tie_method = c("first", "min.start", "min.end"),
  overwrite = FALSE,
  keep_order = TRUE,
  intervals.set.out = NULL,
  ...
)
intervals |
Intervals to annotate (1D). |
annotation_intervals |
Source intervals containing annotation data (1D). |
annotation_columns |
Character vector of column names to copy from annotation_intervals. If NULL, all non-basic columns are used. |
column_names |
Optional custom names for the annotation columns. If provided, must have the same length as annotation_columns. |
dist_column |
Name of the distance column to include. Use NULL to omit it. |
max_dist |
Maximum absolute distance. When finite, neighbors farther away than max_dist are not used, and the annotation columns are filled with na_value instead. |
na_value |
Value(s) to use for annotations when the neighbor is beyond max_dist or no neighbor is found. |
maxneighbors |
Maximum number of neighbors per interval (duplicates intervals as needed). Defaults to 1. |
tie_method |
Tie-breaking when distances are equal: one of
"first" (arbitrary but stable), "min.start" (smaller neighbor start first),
or "min.end" (smaller neighbor end first). Applies when
|
overwrite |
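When FALSE, a column name collision between the intervals and the annotation columns raises an error. When TRUE, base columns with the same names are replaced by the annotation columns. |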
keep_order |
If |
intervals.set.out |
Name of an intervals set to which the result is optionally written. |
... |
Additional arguments forwarded to gintervals.neighbors. |
The function wraps and extends gintervals.neighbors to provide
convenient column selection/renaming, optional distance inclusion, distance
thresholding with custom NA values, multiple neighbors per interval, and
deterministic tie-breaking. Currently supports 1D intervals only.
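The nearest-neighbor idea behind this annotation can be sketched in base R without a genomic database. This is only an illustration under stated assumptions: the helper interval_gap is hypothetical and not part of the package, all intervals are assumed to lie on one chromosome, and distance is taken as the unsigned gap between intervals (0 on overlap). The exact distance convention used by gintervals.neighbors (sign, boundary handling) may differ.

```r
# Hypothetical helper, not part of the package: unsigned gap between one
# query interval and each candidate interval (0 when they overlap).
interval_gap <- function(q_start, q_end, starts, ends) {
  pmax(0, starts - q_end, q_start - ends)
}

# Plain data frames standing in for interval sets (single chromosome assumed)
intervs <- data.frame(chrom = "chr1", start = c(1000, 5000), end = c(1100, 5050),
                      stringsAsFactors = FALSE)
ann <- data.frame(chrom = "chr1", start = c(900, 5400), end = c(950, 5500),
                  remark = c("a", "b"), stringsAsFactors = FALSE)

# For each query interval, index of the annotation row with the smallest gap
nearest <- vapply(seq_len(nrow(intervs)), function(i) {
  which.min(interval_gap(intervs$start[i], intervs$end[i], ann$start, ann$end))
}, integer(1))

# Original intervals plus the copied column and the distance to the neighbor
annotated <- cbind(
  intervs,
  remark = ann$remark[nearest],
  dist = interval_gap(intervs$start, intervs$end,
                      ann$start[nearest], ann$end[nearest])
)
annotated
```

The result has one row per input interval, which mirrors the default maxneighbors = 1 case.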
- When annotation_columns = NULL, all non-basic columns present in
annotation_intervals are included.
- Setting dist_column = NULL omits the distance column.
- If no neighbor is found for an interval, annotation columns are filled with
na_value and the distance (when present) is NA_real_.
- Column name collisions are handled as follows: when overwrite=FALSE
a clear error is emitted; when overwrite=TRUE, base columns with the
same names are replaced by annotation columns.
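The thresholding and name-collision rules in the notes above can be illustrated with a small base R sketch. The helpers apply_threshold and merge_columns are hypothetical names introduced here, the input values are made up, and the code is not the package's implementation. In particular, what the real function stores in the distance column for a neighbor beyond max_dist is not modeled here.

```r
# Made-up stand-ins: base intervals and the result of a neighbor lookup
base <- data.frame(start = c(1000, 5000), end = c(1100, 5050),
                   remark = c("orig1", "orig2"), stringsAsFactors = FALSE)
found <- data.frame(remark = c("a", "b"), dist = c(50, 350),
                    stringsAsFactors = FALSE)

# Hypothetical helper: replace annotations whose neighbor is missing or too far
apply_threshold <- function(found, max_dist, na_value) {
  too_far <- is.na(found$dist) | abs(found$dist) > max_dist
  found$remark[too_far] <- na_value
  found
}

# Hypothetical helper: error on colliding names unless overwrite is TRUE,
# in which case the base columns are replaced by the annotation columns
merge_columns <- function(base, found, overwrite = FALSE) {
  clash <- intersect(names(base), names(found))
  if (length(clash) > 0 && !overwrite) {
    stop("column(s) already exist: ", paste(clash, collapse = ", "))
  }
  base <- base[setdiff(names(base), clash)]  # drop columns to be replaced
  cbind(base, found)
}

thresholded <- apply_threshold(found, max_dist = 200, na_value = "no_ann")
merged <- merge_columns(base, thresholded, overwrite = TRUE)

# With overwrite = FALSE the same call stops with an error
failed <- inherits(try(merge_columns(base, thresholded), silent = TRUE),
                   "try-error")
merged
```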
A data frame containing the original intervals plus the requested
annotation columns (and optional distance column). If
maxneighbors > 1, rows may be duplicated per input interval to
accommodate multiple neighbors.
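Row duplication with several neighbors, combined with "min.start" tie-breaking, might look like the following base R sketch. The data are made up, the variable k merely plays the role of maxneighbors, and the ordering logic is an assumption about how such a tie-break could work rather than the package's actual code.

```r
# One query interval and two equally distant candidates (made-up toy data)
query <- data.frame(start = 1000, end = 1100)
cand <- data.frame(start = c(1200, 800), end = c(1300, 900),
                   label = c("right", "left"), stringsAsFactors = FALSE)

# Unsigned gap between the query and each candidate (0 would mean overlap)
gap <- pmax(0, cand$start - query$end, query$start - cand$end)

# Sort by distance, breaking ties by the smaller neighbor start
ord <- order(gap, cand$start)
k <- 2  # plays the role of maxneighbors
keep <- head(ord, k)

# The query row is repeated once per retained neighbor
result <- cbind(query[rep(1, length(keep)), ],
                label = cand$label[keep],
                dist = gap[keep])
rownames(result) <- NULL
result
```

Both candidates are 100 bases away, so the tie-break decides the order and the candidate starting at 800 is listed first.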
# Prepare toy data
intervs <- gintervals(1, c(1000, 5000), c(1100, 5050))
ann <- gintervals(1, c(900, 5400), c(950, 5500))
ann$remark <- c("a", "b")
ann$score <- c(10, 20)
# Basic usage with default columns (all non-basic columns)
gintervals.annotate(intervs, ann)
# Select specific columns, with custom names and distance column name
gintervals.annotate(
  intervs, ann,
  annotation_columns = c("remark"),
  column_names = c("ann_remark"),
  dist_column = "ann_dist"
)
# Distance threshold with scalar NA replacement
gintervals.annotate(
  intervs, ann,
  annotation_columns = c("remark"),
  max_dist = 200,
  na_value = "no_ann"
)
# Multiple neighbors with deterministic tie-breaking
nbrs <- gintervals.annotate(
  gintervals(1, 1000, 1100),
  {
    x <- gintervals(1, c(800, 1200), c(900, 1300))
    x$label <- c("left", "right")
    x
  },
  annotation_columns = "label",
  maxneighbors = 2,
  tie_method = "min.start"
)
nbrs
# Overwrite existing columns in the base intervals
intervs2 <- intervs
intervs2$remark <- c("orig1", "orig2")
gintervals.annotate(intervs2, ann, annotation_columns = "remark", overwrite = TRUE)