bedtools_subtract

Description

Subtracts one set of ranges from another, either by position or range.

Usage

bedtools_subtract(cmd = "--help")
R_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                    s = FALSE, S = FALSE, A = FALSE, N = FALSE,
                    g = NA_character_)
do_bedtools_subtract(a, b, f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                     s = FALSE, S = FALSE, A = FALSE, N = FALSE,
                     g = NA_character_)

Arguments

cmd

String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.

a

Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges. Each feature in a is compared to b in search of overlaps. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with “<”, e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format.

b

Like a, except supports multiple datasets, either as a vector/list or a comma-separated string. Also supports file glob patterns, i.e., strings containing the wildcard, “*”.

f

Minimum overlap required as a fraction of a [default: any overlap].

F

Minimum overlap required as a fraction of b [default: any overlap].

r

Require that the fraction of overlap be reciprocal for a and b. In other words, if f is 0.90 and r is TRUE, this requires that b overlap at least 90% of a and that a also overlaps at least 90% of b.

e

Require that the minimum fraction be satisfied for a OR b. In other words, if e is TRUE with f=0.90 and F=0.10 this requires that either 90% of a is covered OR 10% of b is covered. If FALSE, both fractions would have to be satisfied.

s

Require same strandedness. That is, find the subtract feature in b that overlaps a on the same strand. By default, overlaps are reported without respect to strand. Note that this default is the exact opposite of Bioconductor behavior, where overlap operations respect strand unless ignore.strand=TRUE.

S

Require opposite strandedness. That is, find the subtract feature in b that overlaps a on the opposite strand. By default, overlaps are reported without respect to strand.

A

Remove entire feature if any overlap. If a feature in a overlaps one in b, the entire feature is removed.

N

Same as A=TRUE, except that when f is considered, the numerator of the fraction is the sum of the overlaps across all overlapping features in b.

g

A genome file, identifier or Seqinfo object that defines the order and size of the sequences.
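The interplay of f, F, r and e described above can be illustrated with a minimal sketch. It is not the code that HelloRanges compiles; the toy ranges and the names min_frac_a, min_frac_b, keep_both and keep_either are assumptions made only for this illustration.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data: one feature in a and one in b on chr1
gr_a <- GRanges("chr1", IRanges(10, 20))  # width 11
gr_b <- GRanges("chr1", IRanges(15, 35))  # width 21

hits <- findOverlaps(gr_a, gr_b, ignore.strand = TRUE)

# Overlapping portion of each (a, b) pair
o <- pintersect(gr_a[queryHits(hits)], gr_b[subjectHits(hits)],
                ignore.strand = TRUE)

# Overlap as a fraction of a (compared against f) and of b (against F)
frac_a <- width(o) / width(gr_a[queryHits(hits)])    # 6 / 11
frac_b <- width(o) / width(gr_b[subjectHits(hits)])  # 6 / 21

min_frac_a <- 0.5  # stands in for f
min_frac_b <- 0.5  # stands in for F; with r = TRUE this would equal f

keep_both   <- frac_a >= min_frac_a & frac_b >= min_frac_b  # e = FALSE
keep_either <- frac_a >= min_frac_a | frac_b >= min_frac_b  # e = TRUE
```

Here the overlap covers more than half of a but less than half of b, so the pair qualifies only when either fraction may satisfy the threshold.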

Details

As with all commands, there are three interfaces to the subtract command:

bedtools_subtract

Parses the bedtools command line and compiles it to the equivalent R code.

R_bedtools_subtract

Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.

do_bedtools_subtract

Evaluates the result of R_bedtools_subtract. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.

We typically subtract sets of ranges using setdiff; however, that will not work here, because setdiff merges overlapping ranges in a, and each feature in a must remain a separate range.
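A small sketch of the merging behavior, using hypothetical toy ranges: two overlapping features in a come back from setdiff as a single range, even though nothing in b touches them.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data: two overlapping features in a, one distant feature in b
gr_a <- GRanges("chr1", IRanges(c(10, 15), c(20, 30)))
gr_b <- GRanges("chr1", IRanges(40, 50))

# setdiff treats a as a set of positions, so the two features are merged
merged <- setdiff(gr_a, gr_b, ignore.strand = TRUE)

length(gr_a)    # two input features
length(merged)  # one merged range
```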

The algorithm has two modes: by position (where ranges are clipped) and by range (where ranges are discarded entirely). The position mode is the default. We find overlaps, optionally restrict them, and for each range in a, we subtract all of the qualifying intersections in b.
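The position mode can be sketched roughly as follows. This is an approximation of the approach described above under assumed toy data, not necessarily the exact code that bedtools_subtract compiles; the variable names are chosen for illustration.

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data: b clips the right end of a[1] and the left end of a[2]
gr_a <- GRanges("chr1", IRanges(c(10, 30), c(20, 40)))
gr_b <- GRanges("chr1", IRanges(15, 35))

# Find overlaps; any fractional restrictions would filter 'hits' here
hits <- findOverlaps(gr_a, gr_b, ignore.strand = TRUE)

# For each range in a, gather and merge the overlapping ranges in b
toSubtract <- reduce(extractList(gr_b, as(hits, "List")),
                     ignore.strand = TRUE)

# Subtract them from each range in a, keeping each feature separate
ans <- unlist(psetdiff(gr_a, toSubtract, ignore.strand = TRUE))
ans
```

Because psetdiff works in parallel over the ranges in a, each feature is clipped on its own and is never merged with its neighbors.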

When A or N is TRUE, we use the second mode. In the simplest case, that is just subsetByOverlaps with invert=TRUE, but fractional overlap restrictions and N make that more complicated.
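For the simplest range-wise case (no fractional restrictions, no N), a minimal sketch with hypothetical toy ranges looks like this:

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Hypothetical toy data: b overlaps the first two features in a, not the third
gr_a <- GRanges("chr1", IRanges(c(10, 30, 50), c(20, 40, 60)))
gr_b <- GRanges("chr1", IRanges(15, 35))

# Drop every feature in a that has any overlap with b
ans <- subsetByOverlaps(gr_a, gr_b, invert = TRUE, ignore.strand = TRUE)
ans
```

Only the feature with no overlap survives, and it is returned unclipped.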

Value

A language object containing the compiled R code, evaluating to a GRanges object, except when A or N is TRUE, in which case the value might be a GRanges, GAlignments or VCF object, depending on the input.

Author(s)

Michael Lawrence

References

http://bedtools.readthedocs.io/en/latest/content/tools/subtract.html

See Also

setops-methods for set operations including setdiff, findOverlaps-methods for different ways to detect overlaps.

Examples

## Not run: 
setwd(system.file("unitTests", "data", "subtract", package="HelloRanges"))

## End(Not run)

## simple case, position-wise subtraction
bedtools_subtract("-a a.bed -b b.bed")
## fractional overlap restriction
bedtools_subtract("-a a.bed -b b.bed -f 0.5")
## range-wise subtraction
bedtools_subtract("-a a.bed -b b.bed -A -f 0.5")