bed_jaccard: Calculate the Jaccard statistic for two sets of intervals. In rnabioco/valr: Genome Interval Arithmetic in R

Description

Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between the inputs are used to calculate the statistic for subsets of the data.

Usage

bed_jaccard(x, y)

Arguments

x    tbl_interval()
y    tbl_interval()

Details

The Jaccard statistic takes values in the range [0, 1] and is measured as:

J(x,y) = \frac{\mid x \cap y \mid}{\mid x \cup y \mid} = \frac{\mid x \cap y \mid}{\mid x \mid + \mid y \mid - \mid x \cap y \mid}
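As a worked illustration of the formula above, the following base R sketch computes the statistic for two small, made-up interval sets on a single chromosome. It assumes BED-style half-open coordinates and expands each interval into the individual positions it covers, which is only practical for toy inputs. The helper `covered_bases()` and the variable names are hypothetical; this is not how valr computes the statistic, and the values are not meant to mirror the columns that bed_jaccard() returns.

```r
# Hypothetical helper: expand half-open [start, end) intervals into the
# distinct base positions they cover (suitable for small examples only).
covered_bases <- function(start, end) {
  unique(unlist(Map(function(s, e) seq(s, length.out = e - s), start, end)))
}

# Two made-up interval sets on one chromosome.
x_bases <- covered_bases(start = c(100, 500), end = c(200, 600))  # 200 bp
y_bases <- covered_bases(start = c(150, 550), end = c(250, 700))  # 250 bp

# |x intersect y|: bases covered by both sets.
n_inter <- length(intersect(x_bases, y_bases))

# |x union y| via the right-hand form of the formula: |x| + |y| - |x intersect y|.
n_union <- length(x_bases) + length(y_bases) - n_inter

# Jaccard statistic, a value between 0 and 1.
j <- n_inter / n_union
```

Here the two sets share 100 bases out of 350 covered in total, so the statistic is 100 / 350, roughly 0.29. Counting the union directly with `length(union(x_bases, y_bases))` gives the same denominator.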

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.

Value

tibble with the following columns:

• len_i length of the intersection in base-pairs

• len_u length of the union in base-pairs

• jaccard value of the Jaccard statistic

• n_int number of intersecting intervals between x and y

If inputs are grouped, the return value will contain one set of values per group.

Other interval statistics: bed_absdist, bed_fisher, bed_projection, bed_reldist

Examples

library(valr)

genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_jaccard(x, y)

# calculate jaccard per chromosome
bed_jaccard(dplyr::group_by(x, chrom), dplyr::group_by(y, chrom))