gvtrack.create: Creates a new virtual track

gvtrack.create    R Documentation

Description

Creates a new virtual track.

Usage

gvtrack.create(
  vtrack = NULL,
  src = NULL,
  func = NULL,
  params = NULL,
  dim = NULL,
  sshift = NULL,
  eshift = NULL,
  filter = NULL,
  ...
)

Arguments

vtrack

virtual track name

src

source (track or intervals). NULL for sequence-based functions (PWM, k-mer, and masked-sequence summarizers). For value-based tracks, provide a data frame with columns chrom, start, end, and one numeric value column. The data frame acts as an in-memory sparse track and supports all track-based summarizer functions. Its intervals must not overlap.

func

function name (see Details)

params

function parameters (see Details)

dim

use 'NULL' or '0' for 1D iterators. '1' converts a 2D iterator to (chrom1, start1, end1); '2' converts a 2D iterator to (chrom2, start2, end2)

sshift

shift of 'start' coordinate

eshift

shift of 'end' coordinate

filter

genomic mask to apply. Can be:

  • A data.frame with columns 'chrom', 'start', 'end' (intervals to mask)

  • A character string naming an intervals set

  • A character string naming a track (must be intervals-type track)

  • A list of any combination of the above (all will be unified)

  • NULL to clear the filter

...

additional PWM or k-mer parameters, supplied as named arguments

Details

This function creates a new virtual track named 'vtrack' with the given source, function and parameters. 'src' can be either a track, intervals (1D or 2D), or a data frame with intervals and a numeric value column (value-based track). The tables below summarize the supported combinations.

Value-based tracks

Value-based tracks are data frames containing genomic intervals with associated numeric values. They function as in-memory sparse tracks without requiring track creation in the database. To create a value-based track, provide a data frame with columns chrom, start, end, and one numeric value column (any name is acceptable). Value-based tracks support all track-based summarizer functions (e.g., avg, min, max, sum, stddev, quantile, nearest, exists, size, first, last, sample, and position functions), but do not support overlapping intervals. They behave like sparse tracks in aggregation: values are aggregated using count-based averaging (each interval contributes equally regardless of length), not coverage-based averaging.
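The difference between count-based and coverage-based averaging can be sketched in base R. This is an illustration only, not misha's implementation: the data frame and its values are hypothetical, and the iterator interval is assumed to contain all three intervals completely.

```r
# Hypothetical value-based track: three non-overlapping intervals with scores.
vals <- data.frame(
    chrom = "chr1",
    start = c(100, 300, 500),
    end   = c(150, 400, 600),
    score = c(10, 20, 30)
)

# Count-based average: every interval contributes equally.
count_avg <- mean(vals$score)

# Coverage-based average (shown for contrast): weight each score by interval length.
len <- vals$end - vals$start
coverage_avg <- sum(vals$score * len) / sum(len)

count_avg     # 20
coverage_avg  # (10*50 + 20*100 + 30*100) / 250 = 22
```

According to the description above, an 'avg' virtual track over such a data frame corresponds to the count-based figure.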

Track-based summarizers

Source | func | params | Description
Track | avg | NULL | Average track value in the iterator interval.
Track (1D) | exists | vals (optional) | Returns 1 if any value exists (or specific vals if provided), 0 otherwise.
Track (1D) | first | NULL | First value in the iterator interval.
Track (1D) | last | NULL | Last value in the iterator interval.
Track | max | NULL | Maximum track value in the iterator interval.
Track | min | NULL | Minimum track value in the iterator interval.
Dense / Sparse / Array track | nearest | NULL | Average value inside the iterator; for sparse tracks with no samples in the interval, falls back to the closest sample outside the interval (by genomic distance).
Track (1D) | sample | NULL | Uniformly sampled source value from the iterator interval.
Track (1D) | size | NULL | Number of non-NaN values in the iterator interval.
Dense / Sparse / Array track | stddev | NULL | Unbiased standard deviation of values in the iterator interval.
Dense / Sparse / Array track | sum | NULL | Sum of values in the iterator interval.
Dense / Sparse / Array track | quantile | Percentile in [0, 1] | Quantile of values in the iterator interval.
Dense track | global.percentile | NULL | Percentile of the interval average relative to the full-track distribution.
Dense track | global.percentile.max | NULL | Percentile of the interval maximum relative to the full-track distribution.
Dense track | global.percentile.min | NULL | Percentile of the interval minimum relative to the full-track distribution.

Track position summarizers

Source | func | params | Description
Track (1D) | first.pos.abs | NULL | Absolute genomic coordinate of the first value.
Track (1D) | first.pos.relative | NULL | Zero-based position (relative to interval start) of the first value.
Track (1D) | last.pos.abs | NULL | Absolute genomic coordinate of the last value.
Track (1D) | last.pos.relative | NULL | Zero-based position (relative to interval start) of the last value.
Track (1D) | max.pos.abs | NULL | Absolute genomic coordinate of the maximum value inside the iterator interval.
Track (1D) | max.pos.relative | NULL | Zero-based position (relative to interval start) of the maximum value.
Track (1D) | min.pos.abs | NULL | Absolute genomic coordinate of the minimum value inside the iterator interval.
Track (1D) | min.pos.relative | NULL | Zero-based position (relative to interval start) of the minimum value.
Track (1D) | sample.pos.abs | NULL | Absolute genomic coordinate of a uniformly sampled value.
Track (1D) | sample.pos.relative | NULL | Zero-based position (relative to interval start) of a uniformly sampled value.

For the relative position functions (max.pos.relative, min.pos.relative, first.pos.relative, last.pos.relative, and sample.pos.relative), iterator modifiers (including sshift / eshift and 1D projections generated via gvtrack.iterator) are applied before the position is reported. In other words, the returned coordinate is always 0-based and measured from the start of the iterator interval after all modifier adjustments.
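The arithmetic behind this rule can be sketched with plain numbers. The coordinates below are hypothetical and no misha functions are called; the sketch only shows which start coordinate serves as the reference point.

```r
# Hypothetical numbers for illustration.
iter_start <- 1000   # start of the original iterator interval
sshift     <- -100   # shift applied to the start coordinate
max_abs    <- 1250   # absolute coordinate of the maximum (as max.pos.abs would report)

# The modifier is applied first, so the reference point is the shifted start.
shifted_start <- iter_start + sshift
max_rel <- max_abs - shifted_start

max_rel  # 350 (measured from 900), not 250 (measured from 1000)
```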

Interval-based summarizers

Source | func | params | Description
1D intervals | distance | Minimal distance from center (default 0) | Signed distance using normalized formula when inside intervals, distance to edge when outside; see notes below for exact formula.
1D intervals | distance.center | NULL | Distance from iterator center to the closest interval center, NA if outside all intervals.
1D intervals | distance.edge | NULL | Edge-to-edge distance from iterator interval to closest source interval (like gintervals.neighbors); see notes below for strand handling.
1D intervals | coverage | NULL | Fraction of iterator length covered by source intervals (after unifying overlaps).
1D intervals | neighbor.count | Max distance (>= 0) | Number of source intervals whose edge-to-edge distance from the iterator interval is within params (no unification).
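
What "after unifying overlaps" means for coverage can be sketched in base R. The helper coverage_frac below is hypothetical (it is not part of misha) and assumes half-open intervals [start, end) on a single chromosome.

```r
# Fraction of [it_start, it_end) covered by source intervals, counting
# overlapping source intervals only once.
coverage_frac <- function(it_start, it_end, src_start, src_end) {
    # Clip source intervals to the iterator interval and drop empty ones.
    s <- pmax(src_start, it_start)
    e <- pmin(src_end, it_end)
    keep <- s < e
    s <- s[keep]
    e <- e[keep]
    if (length(s) == 0) return(0)
    # Sweep in start order, adding only bases not already covered.
    ord <- order(s)
    s <- s[ord]
    e <- e[ord]
    covered <- 0
    cur_end <- it_start
    for (i in seq_along(s)) {
        if (e[i] > cur_end) {
            covered <- covered + e[i] - max(s[i], cur_end)
            cur_end <- e[i]
        }
    }
    covered / (it_end - it_start)
}

# [10, 30) and [20, 40) merge into [10, 40); [80, 120) is clipped to [80, 100).
coverage_frac(0, 100, c(10, 20, 80), c(30, 40, 120))  # 0.5
```

By contrast, neighbor.count is described as applying no unification, so the two overlapping source intervals would count as two neighbors there.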

2D track summarizers

Source | func | params | Description
2D track | area | NULL | Area covered by intersections of track rectangles with the iterator interval.
2D track | weighted.sum | NULL | Weighted sum of values where each weight equals the intersection area.

Motif (PWM) summarizers

Source | func | Key params | Description
NULL (sequence) | pwm | pssm, bidirect, prior, extend, spat_* | Log-sum-exp score of motif likelihoods across all anchors inside the iterator interval.
NULL (sequence) | pwm.max | pssm, bidirect, prior, extend, spat_* | Maximum log-likelihood score among all anchors (per-position union across strands).
NULL (sequence) | pwm.max.pos | pssm, bidirect, prior, extend, spat_* | 1-based position of the best-scoring anchor (signed by strand when bidirect = TRUE); coordinates are always relative to the iterator interval after any gvtrack.iterator() shifts/extensions.
NULL (sequence) | pwm.count | pssm, score.thresh, bidirect, prior, extend, strand, spat_* | Count of anchors whose score exceeds score.thresh (per-position union).

K-mer summarizers

Source | func | Key params | Description
NULL (sequence) | kmer.count | kmer, extend, strand | Number of k-mer occurrences whose anchor lies inside the iterator interval.
NULL (sequence) | kmer.frac | kmer, extend, strand | Fraction of possible anchors within the interval that match the k-mer.

Masked sequence summarizers

Source | func | Key params | Description
NULL (sequence) | masked.count | NULL | Number of masked (lowercase) base pairs in the iterator interval.
NULL (sequence) | masked.frac | NULL | Fraction of base pairs in the iterator interval that are masked (lowercase).

The sections below provide additional notes for motif, interval, k-mer, and masked sequence functions.

Motif (PWM) notes

  • pssm: Position-specific scoring matrix (matrix or data frame) with columns A, C, G, T; extra columns are ignored.

  • bidirect: When TRUE (default), both strands are scanned and combined per genomic start (per-position union). The strand argument is ignored. When FALSE, only the strand specified by strand is scanned.

  • prior: Pseudocount added to frequencies (default 0.01). Set to 0 to disable.

  • extend: Extends the fetched sequence so boundary-anchored motifs retain full context (default TRUE). The END coordinate is padded by motif_length - 1 for all strand modes; anchors must still start inside the iterator.

  • Neutral characters (N, n, *) contribute the mean log-probability of the corresponding PSSM column on both strands.

  • strand: Used only when bidirect = FALSE; 1 scans the forward strand, -1 scans the reverse strand. For pwm.max.pos, strand = -1 reports the hit position at the end of the match (still relative to the forward orientation).

  • score.thresh: Threshold for pwm.count. Anchors with log-likelihood >= score.thresh are counted; only one count per genomic start.

  • Spatial weighting (spat_factor, spat_bin, spat_min, spat_max): optional position-dependent weights applied in log-space. Provide a positive numeric vector spat_factor; spat_bin (integer > 0) defines bin width; spat_min/spat_max restrict the scanning window.

  • pwm.max.pos: Positions are reported 1-based relative to the final scan window (after iterator shifts and spatial trimming). Ties resolve to the most 5' anchor; the forward strand wins ties at the same coordinate. Values are signed when bidirect = TRUE (positive for forward, negative for reverse).
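
How the four PWM summarizers relate to one another can be sketched in base R on a toy sequence. This is a simplified illustration under stated assumptions, not misha's implementation: it scans the forward strand only, applies no prior and no spatial weights, and scores only anchors where the motif fits entirely inside the sequence (so it does not model extend). The PSSM, the sequence, and the threshold are hypothetical.

```r
# Toy two-position PSSM (hypothetical values).
pssm <- matrix(
    c(0.7, 0.1, 0.1, 0.1,
      0.1, 0.7, 0.1, 0.1),
    ncol = 4, byrow = TRUE,
    dimnames = list(NULL, c("A", "C", "G", "T"))
)
seq_chars <- strsplit("GACTAC", "")[[1]]
k <- nrow(pssm)

# Log-likelihood of the motif at every anchor where it fits completely.
anchors <- seq_len(length(seq_chars) - k + 1)
scores <- sapply(anchors, function(i) {
    cols <- match(seq_chars[i:(i + k - 1)], colnames(pssm))
    sum(log(pssm[cbind(seq_len(k), cols)]))
})

# Analogues of the four summarizers.
pwm_score   <- max(scores) + log(sum(exp(scores - max(scores))))  # log-sum-exp
pwm_max     <- max(scores)                                        # best anchor score
pwm_max_pos <- which.max(scores)       # 1-based; the most 5' anchor wins ties
pwm_count   <- sum(scores >= log(0.4)) # anchors at or above a threshold
```

In this toy case the motif "AC" occurs at anchors 2 and 5 with equal scores, and which.max returns the first of them, which mirrors the tie rule stated above.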

Spatial weighting enables position-dependent weighting for modeling positional biases. Bins are 0-indexed from the scan start. When using gvtrack.iterator() shifts (e.g., sshift = -50, eshift = 50), bins index from the expanded scan window start, not the original interval. Both strands use the same bin at each genomic position. Positions beyond the last bin reuse the final bin's weight. If the window size is not divisible by spat_bin, the last bin is shorter (e.g., scanning 500 bp with 40 bp bins yields bins 0-11 of 40 bp plus bin 12 of 20 bp). Use spat_min and spat_max to restrict scanning to a range divisible by spat_bin if needed.
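
The bin layout described above can be checked with a few lines of base R. The sketch assumes 0-based offsets from the scan start and a hypothetical weight vector; it does not call misha.

```r
window_size <- 500L
spat_bin    <- 40L

# 0-based offsets from the scan start, mapped to 0-indexed bins.
offsets <- 0:(window_size - 1L)
bin_idx <- offsets %/% spat_bin

table(bin_idx)  # bins 0-11 hold 40 positions each, bin 12 holds the last 20

# With fewer weights than bins, positions past the last bin reuse the final weight.
spat_factor <- c(0.5, 1.0, 2.0, 1.0, 0.5)  # hypothetical weights
weight_idx  <- pmin(bin_idx, length(spat_factor) - 1L) + 1L
pos_weight  <- spat_factor[weight_idx]
```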

PWM parameters can be supplied either as a single list (params) or via named arguments (see examples).

Interval distance notes

distance: Given the center 'C' of the current iterator interval, returns 'DC * X/2' where 'DC' is the normalized distance to the center of the interval that contains 'C', and 'X' is the value of the parameter (default: 0). If no interval contains 'C', the result is 'D + X/2' where 'D' is the distance between 'C' and the edge of the closest interval.

distance.center: Given the center 'C' of the current iterator interval, returns NaN if 'C' is outside of all intervals, otherwise returns the distance between 'C' and the center of the closest interval.

distance.edge: Computes edge-to-edge distance from the iterator interval to the closest source interval, using the same calculation as gintervals.neighbors. Returns 0 for overlapping intervals. Distance sign depends on the strand column of source intervals; returns unsigned (absolute) distance if no strand column exists. Returns NA if no source intervals exist on the current chromosome.

For distance and distance.center, distance can be positive or negative depending on the position of the coordinate relative to the interval and the strand (-1 or 1) of the interval. Distance is always positive if strand = 0 or if the strand column is missing. The result is NA if no intervals exist for the current chromosome.

Difference between distance functions: The distance function measures from the center of the iterator interval (a single coordinate point) to the closest edge of source intervals when outside, or returns a normalized distance within the interval when inside. The distance.center function measures from the center of the iterator interval to the center of source intervals. The distance.edge function measures edge-to-edge distance between intervals, exactly like gintervals.neighbors. Use distance.edge when you need the same distance computation as gintervals.neighbors within a virtual track context.

K-mer notes

  • kmer: DNA sequence (case-insensitive) to count.

  • extend: If TRUE (default), counts kmers whose anchor lies in the interval even if the kmer extends beyond it; when FALSE, only kmers fully contained in the interval are considered.

  • strand: 1 counts forward-strand occurrences, -1 counts reverse-strand occurrences, 0 counts both strands (default). For palindromic kmers, consider using 1 or -1 to avoid double counting.

K-mer parameters can be supplied as a list or via named arguments (see examples).
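
The double counting of palindromic k-mers can be illustrated in base R. The helpers count_kmer and revcomp are hypothetical, and the sketch assumes that counting both strands amounts to adding the forward-strand and reverse-strand counts.

```r
# Count occurrences of a k-mer by anchor position (overlapping matches allowed).
count_kmer <- function(dna, kmer) {
    n <- nchar(dna) - nchar(kmer) + 1
    if (n < 1) return(0L)
    starts <- seq_len(n)
    sum(substring(dna, starts, starts + nchar(kmer) - 1) == kmer)
}

# Reverse complement of an uppercase DNA string.
revcomp <- function(s) {
    paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

dna <- "ACGTCGAT"
fwd <- count_kmer(dna, "CG")                   # forward-strand matches
rev_strand <- count_kmer(revcomp(dna), "CG")   # reverse-strand matches

# "CG" is its own reverse complement, so each site is seen on both strands.
c(forward = fwd, reverse = rev_strand, both = fwd + rev_strand)  # 2, 2, 4
```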

Modify iterator behavior with 'gvtrack.iterator' or 'gvtrack.iterator.2d'.

Value

None.

See Also

gvtrack.info, gvtrack.iterator, gvtrack.iterator.2d, gvtrack.array.slice, gvtrack.ls, gvtrack.rm, gvtrack.filter

Examples



gdb.init_examples()

gvtrack.create("vtrack1", "dense_track", "max")
gvtrack.create("vtrack2", "dense_track", "quantile", 0.5)
gextract("dense_track", "vtrack1", "vtrack2",
    gintervals(1, 0, 10000),
    iterator = 1000
)

gvtrack.create("vtrack3", "dense_track", "global.percentile")
gvtrack.create("vtrack4", "annotations", "distance")
gdist(
    "vtrack3", seq(0, 1, length.out = 10), "vtrack4",
    seq(-500, 500, 200)
)

gvtrack.create("cov", "annotations", "coverage")
gextract("cov", gintervals(1, 0, 1000), iterator = 100)

pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1, # Example PSSM
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")
gvtrack.create(
    "motif_score", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE, prior = 0.01)
)
gvtrack.create("max_motif_score", NULL, "pwm.max",
    pssm = pssm, bidirect = TRUE, prior = 0.01
)
gvtrack.create("max_motif_pos", NULL, "pwm.max.pos",
    pssm = pssm
)
gextract(
    c(
        "dense_track", "motif_score", "max_motif_score",
        "max_motif_pos"
    ),
    gintervals(1, 0, 10000),
    iterator = 500
)

# Kmer counting examples
gvtrack.create("cg_count", NULL, "kmer.count", kmer = "CG", strand = 1)
gvtrack.create("cg_frac", NULL, "kmer.frac", kmer = "CG", strand = 1)
gextract(c("cg_count", "cg_frac"), gintervals(1, 0, 10000), iterator = 1000)

gvtrack.create("at_pos", NULL, "kmer.count", kmer = "AT", strand = 1)
gvtrack.create("at_neg", NULL, "kmer.count", kmer = "AT", strand = -1)
gvtrack.create("at_both", NULL, "kmer.count", kmer = "AT", strand = 0)
gextract(c("at_pos", "at_neg", "at_both"), gintervals(1, 0, 10000), iterator = 1000)

# GC content
gvtrack.create("g_frac", NULL, "kmer.frac", kmer = "G")
gvtrack.create("c_frac", NULL, "kmer.frac", kmer = "C")
gextract("g_frac + c_frac", gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_content"
)

# Masked base pair counting
gvtrack.create("masked_count", NULL, "masked.count")
gvtrack.create("masked_frac", NULL, "masked.frac")
gextract(c("masked_count", "masked_frac"), gintervals(1, 0, 10000), iterator = 1000)

# Combined with GC content (unmasked regions only)
gvtrack.create("gc", NULL, "kmer.frac", kmer = "G")
gextract("gc * (1 - masked_frac)",
    gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_unmasked"
)

# Value-based track examples
# Create a data frame with intervals and numeric values
intervals_with_values <- data.frame(
    chrom = "chr1",
    start = c(100, 300, 500),
    end = c(200, 400, 600),
    score = c(10, 20, 30)
)
# Use as value-based sparse track (functions like sparse track)
gvtrack.create("value_track", intervals_with_values, "avg")
gvtrack.create("value_track_max", intervals_with_values, "max")
gextract(c("value_track", "value_track_max"),
    gintervals(1, 0, 10000),
    iterator = 1000
)

# Spatial PWM examples
# Create a PWM with higher weight in the center of intervals
pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1,
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.1, 0.7
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")

# Spatial factors: low weight at edges, high in center
# For 200bp intervals with 40bp bins, the five bins start at offsets 0, 40, 80, 120, 160
spatial_weights <- c(0.5, 1.0, 2.0, 1.0, 0.5)

gvtrack.create(
    "spatial_pwm", NULL, "pwm",
    list(
        pssm = pssm,
        bidirect = TRUE,
        spat_factor = spatial_weights,
        spat_bin = 40L
    )
)

# Compare with non-spatial PWM
gvtrack.create(
    "regular_pwm", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE)
)

gextract(c("spatial_pwm", "regular_pwm"),
    gintervals(1, 0, 10000),
    iterator = 200
)

# Using spatial parameters with iterator shifts
gvtrack.create(
    "spatial_extended", NULL, "pwm.max",
    pssm = pssm,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5),
    spat_bin = 40L
)
# Scan window will be 280bp (100bp + 2*90bp)
gvtrack.iterator("spatial_extended", sshift = -90, eshift = 90)
gextract("spatial_extended", gintervals(1, 0, 10000), iterator = 100)

# Using spat_min/spat_max to restrict scanning to a window
# For 500bp intervals, scan only positions 30-470 (440bp window)
gvtrack.create(
    "window_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_min = 30, # 1-based position
    spat_max = 470 # 1-based position
)
gextract("window_pwm", gintervals(1, 0, 10000), iterator = 500)

# Combining spatial weighting with window restriction
# Scan positions 50-450 with spatial weights favoring the center
gvtrack.create(
    "window_spatial_pwm", NULL, "pwm",
    pssm = pssm,
    bidirect = TRUE,
    spat_factor = c(0.5, 1.0, 2.0, 2.5, 2.0, 1.0, 0.5, 1.0, 0.5, 0.5),
    spat_bin = 40L,
    spat_min = 50,
    spat_max = 450
)
gextract("window_spatial_pwm", gintervals(1, 0, 10000), iterator = 500)

misha documentation built on Dec. 14, 2025, 9:06 a.m.