bedtools_intersect: bedtools_intersect

View source: R/intersect.R

bedtools_intersectR Documentation

bedtools_intersect

Description

Finds and/or counts the intersections between two ranged datasets.

Usage

bedtools_intersect(cmd = "--help")
R_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE,
                     loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE,
                     c = FALSE, v = FALSE, f = 1e-09, F = 1e-09,
                     r = FALSE, e = FALSE, s = FALSE, S = FALSE,
                     split = FALSE, g = NA_character_, header = FALSE,
                     names = NULL, filenames = FALSE, sortout = FALSE)
do_bedtools_intersect(a, b, ubam = FALSE, bed = FALSE, wa = FALSE, wb = FALSE,
                      loj = FALSE, wo = FALSE, wao = FALSE, u = FALSE,
                      c = FALSE, v = FALSE, f = 1e-09, F = 1e-09,
                      r = FALSE, e = FALSE, s = FALSE, S = FALSE,
                      split = FALSE, g = NA_character_, header = FALSE,
                      names = NULL, filenames = FALSE, sortout = FALSE)

Arguments

cmd

String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.

a

Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges. Each feature in a is compared to b in search of overlaps. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with “<”, e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format.

b

Like a, except supports multiple datasets, either as a vector/list or a comma-separated string. Also supports file glob patterns, i.e., strings containing the wildcard, “*”.

ubam

Not supported yet. Write uncompressed BAM output. The default is write compressed BAM output.

bed

When using BAM input, return a GRanges (with a “blocks” column) instead of a GAlignments. For VCF input, return a VRanges instead of a VCF object.

wa

Return the original entry in a for each overlap.

wb

Return the original entry in b for each overlap.

loj

Perform a ‘left outer join’. That is, for each feature in a report each overlap with b. If no overlaps are found, report an empty range for b on the “.” sequence. Implies wa=TRUE and wb=TRUE.

wo

Return the number of base pairs of overlap between the two features as the “overlap_width” metadata column. Implies wa=TRUE and wb=TRUE.

wao

Like wo, except it additionally implies loj=TRUE.

u

Like wa, except only the unique entries in a are returned.

c

Like wa, except also count the number of hits in b for each range in a and return the count as the “overlap_count” metadata column.

v

Like wa, except only report those entries in a that have no overlap in b.

f

Minimum overlap required as a fraction of a [default: any overlap].

F

Minimum overlap required as a fraction of b [default: any overlap].

r

Require that the fraction of overlap be reciprocal for a and b. In other words, if f is 0.90 and r is TRUE, this requires that b overlap at least 90% of a and that a also overlaps at least 90% of b.

e

Require that the minimum fraction be satisfied for a OR b. In other words, if e is TRUE with f=0.90 and F=0.10 this requires that either 90% of a is covered OR 10% of b is covered. If FALSE, both fractions would have to be satisfied.

s

Require same strandedness. That is, find the intersect feature in b that overlaps a on the same strand. By default, overlaps are reported without respect to strand. Note that this is the exact opposite of Bioconductor behavior.

S

Require opposite strandedness. That is, find the intersect feature in b that overlaps a on the opposite strand. By default, overlaps are reported without respect to strand.

split

Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects.

g

A genome file, identifier or Seqinfo object that defines the order and size of the sequences.

header

Ignored.

names

When using multiple databases, provide an alias for each to use instead of their integer index. If a single string, can be comma-separated.

filenames

When using multiple databases, use their complete filename instead of their integer index.

sortout

Sort the result by genomic coordinate.

Details

As with all commands, there are three interfaces to the intersect command:

bedtools_intersect

Parses the bedtools command line and compiles it to the equivalent R code.

R_bedtools_intersect

Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.

do_bedtools_intersect

Evaluates the result of R_bedtools_intersect. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.

This is by far the most complex bedtools command, and it offers a dizzying list of parameters, many of which are redundant or mutually exclusive. The complexity of the generated code is highest when using the fractional restriction feature, since no such support exists in the GenomicRanges overlap routines.

Value

A language object containing the compiled R code, evaluating to a ranges object, the exact type of which depends on the arguments. If both wa and wb are TRUE, return a Pairs object with the original, matched up ranges, possibly with metadata columns. By default, the return value is a GAlignments for BAM input, a VCF object for VCF input, or a GRanges for any other type of input. If bed is TRUE, BAM input is converted to a GRanges, containing a “blocks” column (encoding the junctions) if the input is BAM. If the input is VCF, bed=TRUE converts the input to a VRanges.

Author(s)

Michael Lawrence

References

http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

See Also

setops-methods for set operations including intersect, findOverlaps-methods for different ways to detect overlaps.

Examples

## Not run: 
setwd(system.file("unitTests", "data", "intersect", package="HelloRanges"))

## End(Not run)

## return intersecting ranges
bedtools_intersect("-a a.bed -b a.bed")
## add count
bedtools_intersect("-a a.bed -b b.bed -c")
## restrict by strand and fraction of overlap
bedtools_intersect("-a a.bed -b b.bed -c -s -f 0.1")
## return original 'a' ranges
bedtools_intersect("-a a.bed -b b.bed -wa")
## return both 'a' and 'b' ranges, along with overlap widths
bedtools_intersect("-a a.bed -b b.bed -wo")
## same as above, except left outer join
bedtools_intersect("-a a.bed -b b.bed -wao")
## consider read junction structure
bedtools_intersect("-a three_blocks.bam -b three_blocks_nomatch.bed -split")

lawremi/HelloRanges documentation built on Oct. 29, 2023, 4:08 p.m.