# bed_jaccard: Calculate the Jaccard statistic for two sets of intervals. In valr: Genome Interval Arithmetic

 bed_jaccard R Documentation

## Calculate the Jaccard statistic for two sets of intervals.

### Description

Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between the inputs are used to calculate the statistic for subsets of data.

### Usage

bed_jaccard(x, y)

### Arguments

• x ivl_df

• y ivl_df

### Details

The Jaccard statistic takes values in [0, 1] and is measured as:

$$J(x,y) = \frac{\lvert x \cap y \rvert}{\lvert x \cup y \rvert} = \frac{\lvert x \cap y \rvert}{\lvert x \rvert + \lvert y \rvert - \lvert x \cap y \rvert}$$
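As a small worked example of the formula (the intervals are made up for illustration, and half-open BED-style coordinates are assumed), take a single interval x = [0, 100) and a single interval y = [50, 150) on the same chromosome. Each spans 100 bp and they share the 50 bp from 50 to 100, so:

$$\lvert x \cap y \rvert = 50, \qquad J(x,y) = \frac{50}{100 + 100 - 50} = \frac{50}{150} = \frac{1}{3}$$

Identical interval sets give J = 1, and sets with no shared bases give J = 0.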

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
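To make the idea of one statistic per subset concrete, the following is a minimal base R sketch that computes a base-pair Jaccard value separately for each chromosome. It is not valr's implementation: the helpers `covered_bases()` and `jaccard_bp()` are hypothetical names, the toy intervals are invented, and half-open `[start, end)` coordinates are assumed. Enumerating every base is only practical for small examples like this one.

```r
# Toy intervals in BED-style half-open coordinates [start, end) (made-up data).
x <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(0, 200, 0),
  end   = c(100, 300, 50)
)
y <- data.frame(
  chrom = c("chr1", "chr2"),
  start = c(50, 40),
  end   = c(250, 60)
)

# Hypothetical helper: the distinct base positions covered by a set of intervals.
covered_bases <- function(df) {
  unique(unlist(Map(function(s, e) s + seq_len(e - s) - 1, df$start, df$end)))
}

# Hypothetical helper: |x n y| / |x u y| in base pairs for one subset of data.
jaccard_bp <- function(xi, yi) {
  bases_x <- covered_bases(xi)
  bases_y <- covered_bases(yi)
  length(intersect(bases_x, bases_y)) / length(union(bases_x, bases_y))
}

# One value per chromosome present in both inputs.
shared <- intersect(x$chrom, y$chrom)
per_chrom <- sapply(shared, function(chr) {
  jaccard_bp(x[x$chrom == chr, ], y[y$chrom == chr, ])
})
per_chrom
```

For these toy inputs, chr1 has 100 shared bases out of 300 covered bases (1/3) and chr2 has 10 shared bases out of 60 (1/6). With valr itself, the same per-group behavior is obtained by passing grouped inputs to bed_jaccard().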

### Value

tibble with the following columns:

• len_i length of the intersection in base-pairs

• len_u length of the union in base-pairs

• jaccard value of the Jaccard statistic

• n_int number of intersecting intervals between x and y

If inputs are grouped, the return value will contain one set of values per group.

### See Also

Other interval statistics: bed_absdist(), bed_fisher(), bed_projection(), bed_reldist()

### Examples

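# `genome` must be defined first; this assumes the hg19 chromosome
# sizes file bundled with valr
genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_jaccard(x, y)

# calculate jaccard per chromosome
bed_jaccard(
  dplyr::group_by(x, chrom),
  dplyr::group_by(y, chrom)
)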

valr documentation built on Sept. 19, 2023, 1:07 a.m.