| gvtrack.create | R Documentation |
Creates a new virtual track.
gvtrack.create(
    vtrack = NULL,
    src = NULL,
    func = NULL,
    params = NULL,
    dim = NULL,
    sshift = NULL,
    eshift = NULL,
    filter = NULL,
    ...
)
| Argument | Description |
| vtrack | virtual track name |
| src | source (track/intervals). NULL for sequence-based functions (PWM, k-mer, masked). For value-based tracks, provide a data frame with columns chrom, start, end, and one numeric value column |
| func | function name (see the tables below) |
| params | function parameters (see the tables below) |
| dim | use 'NULL' or '0' for 1D iterators. '1' converts 2D iterator to (chrom1, start1, end1), '2' converts 2D iterator to (chrom2, start2, end2) |
| sshift | shift of 'start' coordinate |
| eshift | shift of 'end' coordinate |
| filter | genomic mask to apply (see gvtrack.filter) |
| ... | additional PWM or k-mer parameters |
This function creates a new virtual track named 'vtrack' with the given source, function and parameters. 'src' can be either a track, intervals (1D or 2D), or a data frame with intervals and a numeric value column (value-based track). The tables below summarize the supported combinations.
Value-based tracks
Value-based tracks are data frames containing genomic intervals with associated
numeric values. They function as in-memory sparse tracks without requiring
track creation in the database. To create a value-based track, provide a data
frame with columns chrom, start, end, and one numeric
value column (any name is acceptable). Value-based tracks support all track-based
summarizer functions (e.g., avg, min, max, sum,
stddev, quantile, nearest, exists, size,
first, last, sample, and position functions), but do not
support overlapping intervals. They behave like sparse tracks in aggregation:
values are aggregated using count-based averaging (each interval contributes equally
regardless of length), not coverage-based averaging.
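The difference between count-based and coverage-based averaging can be seen in a small base R sketch. The helper avg_sketch and the data frame below are hypothetical illustrations, not part of the package, and the code only mimics the arithmetic described above rather than the library's implementation.

```r
# Hypothetical helper (not a misha function): averages the values of the
# intervals that overlap the query window [q_start, q_end) in two ways.
avg_sketch <- function(df, q_start, q_end) {
    # Overlap length of each interval with the query window (0 if disjoint)
    ov <- pmax(0, pmin(df$end, q_end) - pmax(df$start, q_start))
    hit <- ov > 0
    c(
        # Each overlapping interval contributes equally
        count_based = mean(df$score[hit]),
        # Each interval is weighted by the number of covered base pairs
        coverage_based = sum(df$score[hit] * ov[hit]) / sum(ov[hit])
    )
}

# Made-up intervals: a short one (100 bp) and a long one (400 bp)
vals <- data.frame(
    chrom = "chr1",
    start = c(100, 300),
    end = c(200, 700),
    score = c(10, 20)
)
res <- avg_sketch(vals, 0, 1000)
print(res)
```

Under the count-based rule that the text attributes to value-based and sparse tracks, the long interval does not pull the average towards its value.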
Track-based summarizers
| Source | func | params | Description |
| Track | avg | NULL | Average track value in the iterator interval. |
| Track (1D) | exists | vals (optional) | Returns 1 if any value exists (or specific vals if provided), 0 otherwise. |
| Track (1D) | first | NULL | First value in the iterator interval. |
| Track (1D) | last | NULL | Last value in the iterator interval. |
| Track | max | NULL | Maximum track value in the iterator interval. |
| Track | min | NULL | Minimum track value in the iterator interval. |
| Dense / Sparse / Array track | nearest | NULL | Average value inside the iterator; for sparse tracks with no samples in the interval, falls back to the closest sample outside the interval (by genomic distance). |
| Track (1D) | sample | NULL | Uniformly sampled source value from the iterator interval. |
| Track (1D) | size | NULL | Number of non-NaN values in the iterator interval. |
| Dense / Sparse / Array track | stddev | NULL | Unbiased standard deviation of values in the iterator interval. |
| Dense / Sparse / Array track | sum | NULL | Sum of values in the iterator interval. |
| Dense / Sparse / Array track | quantile | Percentile in [0, 1] | Quantile of values in the iterator interval. |
| Dense track | global.percentile | NULL | Percentile of the interval average relative to the full-track distribution. |
| Dense track | global.percentile.max | NULL | Percentile of the interval maximum relative to the full-track distribution. |
| Dense track | global.percentile.min | NULL | Percentile of the interval minimum relative to the full-track distribution. |
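As a rough orientation, several of the summarizers above correspond to familiar base R operations on the values that fall inside one iterator interval. The vector below is made up, and skipping NaN values for every function is an assumption (the table states it only for size). R's default quantile interpolation may also differ from the one used by the package.

```r
# Hypothetical track values inside one iterator interval (NaN = missing)
vals <- c(2, 4, 4, 5, NaN, 9)
x <- vals[!is.nan(vals)] # assumption: NaN values are skipped

summ <- c(
    avg = mean(x),
    min = min(x),
    max = max(x),
    sum = sum(x),
    size = length(x), # number of non-NaN values
    stddev = sd(x), # sd() uses the unbiased (n - 1) denominator
    median = unname(quantile(x, 0.5)) # quantile with params = 0.5
)
print(summ)
```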
Track position summarizers
| Source | func | params | Description |
| Track (1D) | first.pos.abs | NULL | Absolute genomic coordinate of the first value. |
| Track (1D) | first.pos.relative | NULL | Zero-based position (relative to interval start) of the first value. |
| Track (1D) | last.pos.abs | NULL | Absolute genomic coordinate of the last value. |
| Track (1D) | last.pos.relative | NULL | Zero-based position (relative to interval start) of the last value. |
| Track (1D) | max.pos.abs | NULL | Absolute genomic coordinate of the maximum value inside the iterator interval. |
| Track (1D) | max.pos.relative | NULL | Zero-based position (relative to interval start) of the maximum value. |
| Track (1D) | min.pos.abs | NULL | Absolute genomic coordinate of the minimum value inside the iterator interval. |
| Track (1D) | min.pos.relative | NULL | Zero-based position (relative to interval start) of the minimum value. |
| Track (1D) | sample.pos.abs | NULL | Absolute genomic coordinate of a uniformly sampled value. |
| Track (1D) | sample.pos.relative | NULL | Zero-based position (relative to interval start) of a uniformly sampled value. |
For max.pos.relative, min.pos.relative, first.pos.relative, last.pos.relative, sample.pos.relative,
iterator modifiers (including sshift /
eshift and 1D projections generated via gvtrack.iterator) are
applied before the position is reported. In other words, the returned
coordinate is always 0-based and measured from the start of the iterator
interval after all modifier adjustments.
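The arithmetic behind a relative position can be sketched in a few lines of base R. All numbers below are made up for illustration, and the absolute position of the maximum is simply assumed rather than computed from a track.

```r
# Hypothetical iterator interval and modifiers
iter_start <- 1000
iter_end <- 1100
sshift <- -50
eshift <- 50

# Effective interval after the shifts are applied
eff_start <- iter_start + sshift
eff_end <- iter_end + eshift

# Assumed absolute coordinate of the maximum value inside the shifted interval
max_pos_abs <- 1010

# Zero-based offset measured from the shifted start, not the original start
max_pos_relative <- max_pos_abs - eff_start
print(c(eff_start = eff_start, eff_end = eff_end, relative = max_pos_relative))
```

Measured from the original start the offset would be 10. After the shift it is 60, which is the value the text describes as reported.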
Interval-based summarizers
| Source | func | params | Description |
| 1D intervals | distance | Minimal distance from center (default 0) | Signed distance using normalized formula when inside intervals, distance to edge when outside; see notes below for exact formula. |
| 1D intervals | distance.center | NULL | Distance from iterator center to the closest interval center, NA if outside all intervals. |
| 1D intervals | distance.edge | NULL | Edge-to-edge distance from iterator interval to closest source interval (like gintervals.neighbors); see notes below for strand handling. |
| 1D intervals | coverage | NULL | Fraction of iterator length covered by source intervals (after unifying overlaps). |
| 1D intervals | neighbor.count | Max distance (>= 0) | Number of source intervals whose edge-to-edge distance from the iterator interval is within params (no unification). |
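The coverage row can be illustrated with a short base R sketch that clips the source intervals to the iterator interval, merges overlaps, and divides the covered length by the iterator length. The helper coverage_sketch is hypothetical and assumes half-open [start, end) coordinates on a single chromosome. It is not the package's implementation.

```r
# Hypothetical helper (not a misha function): fraction of [q_start, q_end)
# covered by source intervals, after unifying overlapping intervals.
coverage_sketch <- function(src, q_start, q_end) {
    # Clip source intervals to the query window and drop empty ones
    s <- pmax(src$start, q_start)
    e <- pmin(src$end, q_end)
    keep <- e > s
    s <- s[keep]
    e <- e[keep]
    if (length(s) == 0) {
        return(0)
    }
    # Sweep over intervals sorted by start, merging overlaps
    o <- order(s)
    s <- s[o]
    e <- e[o]
    covered <- 0
    cur_s <- s[1]
    cur_e <- e[1]
    for (i in seq_along(s)[-1]) {
        if (s[i] <= cur_e) {
            cur_e <- max(cur_e, e[i])
        } else {
            covered <- covered + (cur_e - cur_s)
            cur_s <- s[i]
            cur_e <- e[i]
        }
    }
    covered <- covered + (cur_e - cur_s)
    covered / (q_end - q_start)
}

# Made-up source intervals: two overlap, one sticks out of the window
src <- data.frame(start = c(10, 30, 80), end = c(40, 50, 120))
cov_frac <- coverage_sketch(src, 0, 100)
print(cov_frac)
```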
2D track summarizers
| Source | func | params | Description |
| 2D track | area | NULL | Area covered by intersections of track rectangles with the iterator interval. |
| 2D track | weighted.sum | NULL | Weighted sum of values where each weight equals the intersection area. |
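Both 2D summarizers reduce to rectangle intersections. The base R sketch below uses made-up rectangles and a made-up 2D iterator interval. It assumes half-open coordinates and that the track rectangles do not overlap each other, so the intersection areas can simply be summed.

```r
# Hypothetical 2D track rectangles with values
rects <- data.frame(
    start1 = c(0, 50), end1 = c(40, 150),
    start2 = c(0, 20), end2 = c(30, 80),
    value = c(2, 5)
)
# Hypothetical 2D iterator interval
q <- list(start1 = 0, end1 = 100, start2 = 0, end2 = 100)

# Width and height of each rectangle's intersection with the iterator
w <- pmax(0, pmin(rects$end1, q$end1) - pmax(rects$start1, q$start1))
h <- pmax(0, pmin(rects$end2, q$end2) - pmax(rects$start2, q$start2))
inter_area <- w * h

area <- sum(inter_area) # "area"
weighted_sum <- sum(rects$value * inter_area) # "weighted.sum"
print(c(area = area, weighted_sum = weighted_sum))
```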
Motif (PWM) summarizers
| Source | func | Key params | Description |
| NULL (sequence) | pwm | pssm, bidirect, prior, extend, spat_* | Log-sum-exp score of motif likelihoods across all anchors inside the iterator interval. |
| NULL (sequence) | pwm.max | pssm, bidirect, prior, extend, spat_* | Maximum log-likelihood score among all anchors (per-position union across strands). |
| NULL (sequence) | pwm.max.pos | pssm, bidirect, prior, extend, spat_* | 1-based position of the best-scoring anchor (signed by strand when bidirect = TRUE); coordinates are always relative to the iterator interval after any gvtrack.iterator() shifts/extensions. |
| NULL (sequence) | pwm.count | pssm, score.thresh, bidirect, prior, extend, strand, spat_* | Count of anchors whose score exceeds score.thresh (per-position union). |
K-mer summarizers
| Source | func | Key params | Description |
| NULL (sequence) | kmer.count | kmer, extend, strand | Number of k-mer occurrences whose anchor lies inside the iterator interval. |
| NULL (sequence) | kmer.frac | kmer, extend, strand | Fraction of possible anchors within the interval that match the k-mer. |
Masked sequence summarizers
| Source | func | Key params | Description |
| NULL (sequence) | masked.count | NULL | Number of masked (lowercase) base pairs in the iterator interval. |
| NULL (sequence) | masked.frac | NULL | Fraction of base pairs in the iterator interval that are masked (lowercase). |
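The two masked sequence summarizers amount to counting lowercase characters. A minimal base R sketch on a made-up sequence string (not fetched from any genome database):

```r
# Hypothetical sequence of one iterator interval; lowercase = masked
seq_str <- "ACGTacgtNNacGT"
bases <- strsplit(seq_str, "")[[1]]

is_masked <- bases %in% letters # TRUE for lowercase characters
masked_count <- sum(is_masked)
masked_frac <- masked_count / length(bases)
print(c(masked_count = masked_count, masked_frac = masked_frac))
```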
The sections below provide additional notes for motif, interval, and k-mer functions.
Motif (PWM) notes
pssm: Position-specific scoring matrix (matrix or data frame) with columns A, C, G, T; extra columns are ignored.
bidirect: When TRUE (default), both strands are scanned and combined per genomic start (per-position union). The strand argument is ignored. When FALSE, only the strand specified by strand is scanned.
prior: Pseudocount added to frequencies (default 0.01). Set to 0 to disable.
extend: Extends the fetched sequence so boundary-anchored motifs retain full context (default TRUE). The END coordinate is padded by motif_length - 1 for all strand modes; anchors must still start inside the iterator.
Neutral characters (N, n, *) contribute the mean log-probability of the corresponding PSSM column on both strands.
strand: Used only when bidirect = FALSE; 1 scans the forward strand, -1 scans the reverse strand. For pwm.max.pos, strand = -1 reports the hit position at the end of the match (still relative to the forward orientation).
score.thresh: Threshold for pwm.count. Anchors with log-likelihood >= score.thresh are counted; only one count per genomic start.
Spatial weighting (spat_factor, spat_bin, spat_min, spat_max): optional position-dependent weights applied in log-space. Provide a positive numeric vector spat_factor; spat_bin (integer > 0) defines bin width; spat_min/spat_max restrict the scanning window.
pwm.max.pos: Positions are reported 1-based relative to the final scan window (after iterator shifts and spatial trimming). Ties resolve to the most 5' anchor; the forward strand wins ties at the same coordinate. Values are signed when bidirect = TRUE (positive for forward, negative for reverse).
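The relationship between pwm, pwm.max and pwm.max.pos can be sketched in base R for the simplest case: forward strand only, no extension, no spatial weights, and no neutral characters. The helper score_anchors is hypothetical, and the way the prior is applied (added to every entry, then each row renormalized) is an assumption, not a statement about the package's internals.

```r
# Hypothetical helper (not a misha function): log-likelihood of the motif at
# every forward-strand anchor where the motif fits inside the sequence.
score_anchors <- function(seq_str, pssm, prior = 0.01) {
    # Assumption: pseudocount is added to each entry and rows are renormalized
    p <- pssm + prior
    p <- p / rowSums(p)
    bases <- strsplit(toupper(seq_str), "")[[1]]
    motif_len <- nrow(p)
    n_anchor <- length(bases) - motif_len + 1
    vapply(seq_len(n_anchor), function(i) {
        # (row, column) pairs: motif position and observed base
        idx <- cbind(
            seq_len(motif_len),
            match(bases[i:(i + motif_len - 1)], colnames(p))
        )
        sum(log(p[idx]))
    }, numeric(1))
}

# Made-up 3-column motif "ACG"
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1,
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE,
    dimnames = list(NULL, c("A", "C", "G", "T"))
)
scores <- score_anchors("TTACGTT", pssm)

# Log-sum-exp over anchors, computed in a numerically stable way
pwm_total <- max(scores) + log(sum(exp(scores - max(scores))))
pwm_max <- max(scores)
pwm_max_pos <- which.max(scores) # 1-based; first maximum, i.e. most 5' anchor
print(c(pwm = pwm_total, pwm.max = pwm_max, pwm.max.pos = pwm_max_pos))
```

Because which.max returns the first maximum, the sketch resolves ties towards the 5' end, in line with the tie rule stated above.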
Spatial weighting enables position-dependent weighting for modeling positional biases. Bins are 0-indexed from the
scan start. When using gvtrack.iterator() shifts (e.g., sshift = -50, eshift = 50), bins index from
the expanded scan window start, not the original interval. Both strands use the same bin at each
genomic position. Positions beyond the last bin reuse the final bin's weight. If the window size is
not divisible by spat_bin, the last bin is shorter (e.g., scanning 500 bp with 40 bp bins yields
bins 0-11 of 40 bp plus bin 12 of 20 bp). Use spat_min and spat_max to restrict scanning to a
range divisible by spat_bin if needed.
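The bin layout for the 500 bp example can be reproduced with integer division in base R. The weight vector below is a made-up placeholder with fewer entries than bins, to show the rule that positions beyond the last supplied bin reuse the final weight.

```r
spat_bin <- 40L
window_len <- 500L
# Hypothetical weights: only 10 values for 13 bins
spat_factor <- seq(0.5, by = 0.1, length.out = 10)

offsets <- 0:(window_len - 1L) # 0-based offsets from the scan start
bin_index <- offsets %/% spat_bin # 0-indexed bin of every position
bin_sizes <- tabulate(bin_index + 1L) # number of positions per bin

# Positions beyond the last supplied bin reuse the final weight
weight_idx <- pmin(bin_index + 1L, length(spat_factor))
weights <- spat_factor[weight_idx]

print(range(bin_index))
print(bin_sizes)
```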
PWM parameters can be supplied either as a single list (params) or via named arguments (see examples).
Interval distance notes
distance: Given the center 'C' of the current iterator interval, returns 'DC * X/2' where 'DC' is the normalized distance to the center of the interval that contains 'C', and 'X' is the value of the parameter (default: 0). If no interval contains 'C', the result is 'D + X/2' where 'D' is the distance between 'C' and the edge of the closest interval.
distance.center: Given the center 'C' of the current iterator interval, returns NaN if 'C' is outside of all intervals, otherwise returns the distance between 'C' and the center of the closest interval.
distance.edge: Computes edge-to-edge distance from the iterator interval to the closest source interval, using the same calculation as gintervals.neighbors. Returns 0 for overlapping intervals. Distance sign depends on the strand column of source intervals; returns unsigned (absolute) distance if no strand column exists. Returns NA if no source intervals exist on the current chromosome.
For distance and distance.center, distance can be positive or negative depending on the position of the coordinate relative to the interval and the strand (-1 or 1) of the interval. Distance is always positive if strand = 0 or if the strand column is missing. The result is NA if no intervals exist for the current chromosome.
Difference between distance functions: The distance function measures from the center of the iterator interval (a single coordinate point) to the closest edge of source intervals when outside, or returns a normalized distance within the interval when inside. The distance.center function measures from the center of the iterator interval to the center of source intervals. The distance.edge function measures edge-to-edge distance between intervals, exactly like gintervals.neighbors. Use distance.edge when you need the same distance computation as gintervals.neighbors within a virtual track context.
K-mer notes
kmer: DNA sequence (case-insensitive) to count.
extend: If TRUE (default), counts kmers whose anchor lies in the interval even if the kmer extends beyond it; when FALSE, only kmers fully contained in the interval are considered.
strand: 1 counts forward-strand occurrences, -1 counts reverse-strand occurrences, 0 counts both strands (default). For palindromic kmers, consider using 1 or -1 to avoid double counting.
K-mer parameters can be supplied as a list or via named arguments (see examples).
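The effect of extend on counting can be sketched in base R for the forward strand. The helper count_kmer_fwd is hypothetical, works on a plain string with 0-based half-open coordinates, and assumes that the denominator of the fraction is the number of anchors considered. It does not reproduce the package's implementation.

```r
# Hypothetical helper (not a misha function): forward-strand k-mer count and
# fraction for the interval [start, end) of a sequence string.
count_kmer_fwd <- function(seq_str, kmer, start, end, extend = TRUE) {
    k <- nchar(kmer)
    s <- toupper(seq_str)
    kmer <- toupper(kmer)
    # With extend, any anchor inside the interval counts; without it the
    # whole k-mer must fit inside the interval.
    last_anchor <- if (extend) end - 1 else end - k
    last_anchor <- min(last_anchor, nchar(s) - k)
    if (last_anchor < start) {
        return(c(count = 0, frac = NA_real_))
    }
    anchors <- start:last_anchor # 0-based anchor positions
    hits <- substring(s, anchors + 1, anchors + k) == kmer
    c(count = sum(hits), frac = sum(hits) / length(anchors))
}

# Made-up sequence with "CG" at 0-based positions 1, 3 and 7
s <- "ACGCGTACGA"
with_extend <- count_kmer_fwd(s, "CG", 0, 4, extend = TRUE)
without_extend <- count_kmer_fwd(s, "CG", 0, 4, extend = FALSE)
print(with_extend)
print(without_extend)
```

"CG" is its own reverse complement, so counting it on both strands would report every occurrence twice. This is the double counting the strand note warns about.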
Modify iterator behavior with 'gvtrack.iterator' or 'gvtrack.iterator.2d'.
Return value: None.
See also: gvtrack.info, gvtrack.iterator, gvtrack.iterator.2d,
gvtrack.array.slice, gvtrack.ls, gvtrack.rm, gvtrack.filter
gdb.init_examples()

# Maximum and median (quantile 0.5) of dense_track per 1000 bp bin
gvtrack.create("vtrack1", "dense_track", "max")
gvtrack.create("vtrack2", "dense_track", "quantile", 0.5)
gextract("dense_track", "vtrack1", "vtrack2",
    gintervals(1, 0, 10000),
    iterator = 1000
)

# Joint distribution of global percentile and distance to annotations
gvtrack.create("vtrack3", "dense_track", "global.percentile")
gvtrack.create("vtrack4", "annotations", "distance")
gdist(
    "vtrack3", seq(0, 1, length.out = 10), "vtrack4",
    seq(-500, 500, 200)
)

# Fraction of each 100 bp bin covered by annotations
gvtrack.create("cov", "annotations", "coverage")
gextract("cov", gintervals(1, 0, 1000), iterator = 100)
# Motif (PWM) examples
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1, # Example PSSM
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")

# PWM parameters passed as a single list
gvtrack.create(
    "motif_score", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE, prior = 0.01)
)

# PWM parameters passed as named arguments
gvtrack.create("max_motif_score", NULL, "pwm.max",
    pssm = pssm, bidirect = TRUE, prior = 0.01
)
gvtrack.create("max_motif_pos", NULL, "pwm.max.pos",
    pssm = pssm
)

gextract(
    c(
        "dense_track", "motif_score", "max_motif_score",
        "max_motif_pos"
    ),
    gintervals(1, 0, 10000),
    iterator = 500
)
# Kmer counting examples
gvtrack.create("cg_count", NULL, "kmer.count", kmer = "CG", strand = 1)
gvtrack.create("cg_frac", NULL, "kmer.frac", kmer = "CG", strand = 1)
gextract(c("cg_count", "cg_frac"), gintervals(1, 0, 10000), iterator = 1000)

gvtrack.create("at_pos", NULL, "kmer.count", kmer = "AT", strand = 1)
gvtrack.create("at_neg", NULL, "kmer.count", kmer = "AT", strand = -1)
gvtrack.create("at_both", NULL, "kmer.count", kmer = "AT", strand = 0)
gextract(c("at_pos", "at_neg", "at_both"), gintervals(1, 0, 10000), iterator = 1000)

# GC content
gvtrack.create("g_frac", NULL, "kmer.frac", kmer = "G")
gvtrack.create("c_frac", NULL, "kmer.frac", kmer = "C")
gextract("g_frac + c_frac", gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_content"
)

# Masked base pair counting
gvtrack.create("masked_count", NULL, "masked.count")
gvtrack.create("masked_frac", NULL, "masked.frac")
gextract(c("masked_count", "masked_frac"), gintervals(1, 0, 10000), iterator = 1000)

# Combined with GC content (unmasked regions only)
gvtrack.create("gc", NULL, "kmer.frac", kmer = "G")
gextract("gc * (1 - masked_frac)",
    gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_unmasked"
)
# Value-based track examples
# Create a data frame with intervals and numeric values
intervals_with_values <- data.frame(
    chrom = "chr1",
    start = c(100, 300, 500),
    end = c(200, 400, 600),
    score = c(10, 20, 30)
)

# Use as value-based sparse track (functions like sparse track)
gvtrack.create("value_track", intervals_with_values, "avg")
gvtrack.create("value_track_max", intervals_with_values, "max")
gextract(c("value_track", "value_track_max"),
    gintervals(1, 0, 10000),
    iterator = 1000
)
# Spatial PWM examples
# Create a PWM with higher weight in the center of intervals
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1,
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.1, 0.7
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")

# Spatial factors: low weight at edges, high in center
# For 200bp intervals with 40bp bins: bins start at offsets 0, 40, 80, 120, 160
spatial_weights <- c(0.5, 1.0, 2.0, 1.0, 0.5)
gvtrack.create(
    "spatial_pwm", NULL, "pwm",
    list(
        pssm = pssm,
        bidirect = TRUE,
        spat_factor = spatial_weights,
        spat_bin = 40L
    )
)

# Compare with non-spatial PWM
gvtrack.create(
    "regular_pwm", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE)
)
gextract(c("spatial_pwm", "regular_pwm"),
    gintervals(1, 0, 10000),
    iterator = 200
)
# Using spatial parameters with iterator shifts
gvtrack.create(
    "spatial_extended", NULL, "pwm.max",
    pssm = pssm,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5),
    spat_bin = 40L
)
# Scan window will be 280bp (100bp + 2*90bp)
gvtrack.iterator("spatial_extended", sshift = -90, eshift = 90)
gextract("spatial_extended", gintervals(1, 0, 10000), iterator = 100)

# Using spat_min/spat_max to restrict scanning to a window
# For 500bp intervals, scan only positions 30-470 (440bp window)
gvtrack.create(
    "window_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_min = 30, # 1-based position
    spat_max = 470 # 1-based position
)
gextract("window_pwm", gintervals(1, 0, 10000), iterator = 500)

# Combining spatial weighting with window restriction
# Scan positions 50-450 with spatial weights favoring the center
gvtrack.create(
    "window_spatial_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5, 1.0, 0.5, 0.5),
    spat_bin = 40L,
    spat_min = 50,
    spat_max = 450
)
gextract("window_spatial_pwm", gintervals(1, 0, 10000), iterator = 500)