bedtools_jaccard: bedtools_jaccard

View source: R/jaccard.R

bedtools_jaccard        R Documentation

bedtools_jaccard

Description

Compare two sets of genomic regions using the Jaccard statistic, defined as the total width of the intersection, divided by the total width of the union.
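As a worked example of this definition, suppose that on one chromosome a covers 1-10 and 21-30 and b covers 6-25. These ranges are made up for illustration and use 1-based closed coordinates, as in GRanges. The arithmetic, written out in R:

```r
# a = {1-10, 21-30}, b = {6-25}; the width of a closed range is end - start + 1
intersection_bp <- (10 - 6 + 1) + (25 - 21 + 1)  # shared pieces 6-10 and 21-25: 10 bp
union_bp <- 30 - 1 + 1                           # a and b together span 1-30: 30 bp
jaccard <- intersection_bp / union_bp            # 10 / 30, about 0.333
jaccard
```

A non-empty set compared with itself always gives 1, and two disjoint sets give 0.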

Usage

bedtools_jaccard(cmd = "--help")
R_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                   s = FALSE, S = FALSE, split = FALSE)
do_bedtools_jaccard(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                    s = FALSE, S = FALSE, split = FALSE)

Arguments

cmd

String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.

a

Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with “<”, e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format.

b

Like a, except that it supports multiple datasets, given either as a vector/list or as a comma-separated string. It also supports file glob patterns, that is, strings containing the wildcard “*”.

f

Minimum overlap required as a fraction of a [default: any overlap].

F

Minimum overlap required as a fraction of b [default: any overlap].

r

Require that the fraction of overlap be reciprocal for a and b. In other words, if f is 0.90 and r is TRUE, this requires that b overlaps at least 90% of a and that a also overlaps at least 90% of b.

e

Require that the minimum fraction be satisfied for a OR b. In other words, if e is TRUE with f=0.90 and F=0.10, this requires that either 90% of a is covered OR 10% of b is covered. If e is FALSE, both fractions must be satisfied.
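The interplay of f, F, r and e can be summarized as a predicate on a single pair of ranges. This is a minimal base-R sketch under stated assumptions, not the code HelloRanges generates: `passes_fraction` is a hypothetical name, coordinates are 1-based and closed, and r is modeled by applying the f threshold to both a and b.

```r
# Hypothetical helper (not part of HelloRanges): does one a/b pair of ranges
# pass the f, F, r and e restrictions? Coordinates are 1-based and closed.
passes_fraction <- function(a_start, a_end, b_start, b_end,
                            f = 1e-09, F = 1e-09, r = FALSE, e = FALSE) {
  overlap <- max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
  frac_a <- overlap / (a_end - a_start + 1)  # fraction of a covered by b
  frac_b <- overlap / (b_end - b_start + 1)  # fraction of b covered by a
  if (r) F <- f                              # assumption: reciprocal = same threshold for both
  ok_a <- frac_a >= f
  ok_b <- frac_b >= F
  if (e) ok_a || ok_b else ok_a && ok_b      # e: either suffices; otherwise both needed
}

passes_fraction(1, 100, 11, 200, f = 0.8)
```

With a = 1-100 and b = 11-200 the overlap is 90 bp, which is 90% of a but under 50% of b. So `f = 0.8` alone passes, adding `r = TRUE` fails, and adding `F = 0.5` fails unless `e = TRUE`. Ranges that do not overlap at all fail under the default thresholds.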

s

Require same strandedness. That is, only count features in b that overlap a on the same strand. By default, overlaps are computed without respect to strand. Note that this default is the opposite of Bioconductor behavior, where overlap and set operations respect strand unless ignore.strand=TRUE.

S

Require opposite strandedness. That is, only count features in b that overlap a on the opposite strand. By default, overlaps are computed without respect to strand.
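A minimal base-R sketch of the strand test implied by s and S. `strand_ok` is a hypothetical helper, and its handling of unstranded values such as "*" (treated as incompatible whenever s or S is set) is an assumption of this sketch, not documented behavior.

```r
# Hypothetical helper: vectorized strand compatibility under the s / S options.
# Assumption: with s or S set, strands other than "+" or "-" never match.
strand_ok <- function(strand_a, strand_b, s = FALSE, S = FALSE) {
  if (!s && !S) return(rep(TRUE, length(strand_a)))  # default: ignore strand
  stranded <- strand_a %in% c("+", "-") & strand_b %in% c("+", "-")
  same <- strand_a == strand_b
  # s keeps same-strand pairs, S keeps opposite-strand pairs (s wins if both are set)
  stranded & (if (s) same else !same)
}

strand_ok(c("+", "+", "-"), c("+", "-", "-"), s = TRUE)
```

For these three pairs, s = TRUE keeps the first and third, while S = TRUE would keep only the second.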

split

Treat split BAM entries (those with an ‘N’ CIGAR operation) and BED12 entries as compound ranges with gaps, i.e., as GRangesList objects.
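For BED12 input, the gaps come from the block columns. The following sketch expands one record into its blocks using the BED12 layout: chromStart is 0-based, and blockSizes and blockStarts are comma-separated lists, with blockStarts given as offsets from chromStart. It returns 1-based closed ranges. `bed12_blocks` is a hypothetical base-R helper, not something HelloRanges exports. With split, only these blocks contribute width.

```r
# Hypothetical helper: expand one BED12 record into its blocks as 1-based closed ranges.
bed12_blocks <- function(chromStart, blockSizes, blockStarts) {
  sizes <- as.integer(strsplit(blockSizes, ",")[[1]])     # trailing comma yields no extra element
  offsets <- as.integer(strsplit(blockStarts, ",")[[1]])  # offsets relative to chromStart
  start <- chromStart + offsets + 1L   # 0-based start -> 1-based start
  end <- chromStart + offsets + sizes  # 0-based half-open end == 1-based closed end
  data.frame(start = start, end = end)
}

bed12_blocks(100L, "10,20,", "0,50,")
```

This record spans 101-170 (70 bp), but its two blocks, 101-110 and 151-170, cover only 30 bp.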

Details

As with all commands, there are three interfaces to the jaccard command:

bedtools_jaccard

Parses the bedtools command line and compiles it to the equivalent R code.

R_bedtools_jaccard

Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.

do_bedtools_jaccard

Evaluates the result of R_bedtools_jaccard. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.

The compiled code is mostly just an intersect and a union of a and b, except when fractional overlap restrictions are involved.

Value

A language object containing the compiled R code, which evaluates to a DataFrame with four columns:

intersection

total width of intersection

union

total width of union

jaccard

the Jaccard statistic (intersection divided by union)

n_intersections

the number of ranges representing the intersection
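The four columns can be reproduced on toy data with the following base-R sketch. It illustrates the quantities only and is not the code that R_bedtools_jaccard compiles. `jaccard_summary` is a hypothetical name, and the sketch makes several simplifying assumptions:

- It handles a single chromosome with 1-based closed ranges.
- It expands ranges to individual positions, which is practical only for small inputs.
- It ignores strand and fractional restrictions.
- When counting n_intersections, it assumes that intersection pieces which touch or overlap are merged into one range.
- Returning 0 for an empty union is a choice made here.

A base data.frame stands in for the DataFrame.

```r
# Hypothetical sketch: compute intersection, union, jaccard and n_intersections.
jaccard_summary <- function(a_start, a_end, b_start, b_end) {
  pos <- function(s, e) unique(unlist(Map(seq, s, e)))  # positions covered by the ranges
  pa <- pos(a_start, a_end)
  pb <- pos(b_start, b_end)
  shared <- sort(intersect(pa, pb))
  intersection_bp <- length(shared)        # total width of the intersection
  union_bp <- length(union(pa, pb))        # total width of the union
  # each break in a run of consecutive shared positions starts a new range
  n_int <- if (intersection_bp == 0) 0L else sum(diff(shared) != 1) + 1L
  data.frame(intersection = intersection_bp,
             union = union_bp,
             jaccard = if (union_bp > 0) intersection_bp / union_bp else 0,
             n_intersections = n_int)
}

jaccard_summary(c(1, 21), c(10, 30), 6, 25)
```

For a = {1-10, 21-30} and b = {6-25}, this gives an intersection of 10, a union of 30, a Jaccard value of 1/3, and 2 intersection ranges (6-10 and 21-25).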

Author(s)

Michael Lawrence

References

http://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html

See Also

setops-methods for set operations including intersect and union.

Examples

## Not run: 
setwd(system.file("unitTests", "data", "jaccard", package="HelloRanges"))

## End(Not run)

## basic
bedtools_jaccard("-a a.bed -b a.bed")
## excluding the gaps in compound ranges
bedtools_jaccard("-a three_blocks_match.bed -b e.bed -split")
## strand and fractional overlap restriction
bedtools_jaccard("-a aMixedStrands.bed -b bMixedStrands.bed -s -f 0.8")

lawremi/HelloRanges documentation built on Oct. 29, 2023, 4:08 p.m.