bedtools_map: bedtools_map


bedtools_map R Documentation

bedtools_map

Description

Group ranges by overlap with query ranges and aggregate. By default, the scores are summed.

Usage

bedtools_map(cmd = "--help")
R_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09,
               r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE,
               split = FALSE, g = NA_character_, delim=",")
do_bedtools_map(a, b, c = "5", o = "sum", f = 1e-09, F = 1e-09,
                r = FALSE, e = FALSE, s = FALSE, S = FALSE, header = FALSE,
                split = FALSE, g = NA_character_, delim=",")

Arguments

cmd

String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.

a

Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with “<”, e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format. A summary of b is computed for each range.

b

Like a, except supports multiple datasets, either as a vector/list or a comma-separated string. Also supports file glob patterns, i.e., strings containing the wildcard, “*”. Ranges that map to the same range in a are aggregated.

c

Specify columns (by integer index) from the input file to operate upon (see o option, below). Multiple columns can be specified in a comma-delimited list. Defaults to the score column.

o

Specify the operations (by name) that should be applied to the columns indicated in c. Multiple operations can be specified in a comma-delimited list. Recycling is used to align c and o. See bedtools_groupby for the available operations. Defaults to the “sum” operation.

f

Minimum overlap required as a fraction of a [default: any overlap].

F

Minimum overlap required as a fraction of b [default: any overlap].

r

Require that the fraction of overlap be reciprocal for a and b. In other words, if f is 0.90 and r is TRUE, this requires that b overlaps at least 90% of a and that a also overlaps at least 90% of b.

e

Require that the minimum fraction be satisfied for a OR b. In other words, if e is TRUE with f=0.90 and F=0.10, this requires that either 90% of a is covered OR 10% of b is covered. If FALSE, both fractions must be satisfied.

s

Require same strandedness. That is, only aggregate features in b that overlap a on the same strand. By default, overlaps are reported without respect to strand. Note that this default is the exact opposite of Bioconductor behavior, where overlaps respect strand unless ignore.strand=TRUE.

S

Require opposite strandedness. That is, only aggregate features in b that overlap a on the opposite strand. By default, overlaps are reported without respect to strand.

header

Ignored.

split

Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects.

g

A genome file, identifier or Seqinfo object that defines the order and size of the sequences.

delim

Delimiter character used to collapse strings.
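The interaction of the f, F, r and e arguments described above can be sketched in base R. The helper passes_overlap below is hypothetical and not part of HelloRanges; it assumes 1-based closed intervals (as in GRanges) and only mirrors the rules as stated in this page for a single pair of ranges.

```r
# Hypothetical helper (not part of HelloRanges): decide whether one a/b pair
# passes the fractional overlap filters, using 1-based closed intervals.
passes_overlap <- function(a_start, a_end, b_start, b_end,
                           f = 1e-09, F = 1e-09, r = FALSE, e = FALSE) {
    # Number of positions shared by the two ranges (0 if they are disjoint).
    ov <- max(0, min(a_end, b_end) - max(a_start, b_start) + 1)
    frac_a <- ov / (a_end - a_start + 1)  # fraction of a that is covered
    frac_b <- ov / (b_end - b_start + 1)  # fraction of b that is covered
    if (r) F <- f                         # reciprocal: apply f to b as well
    ok_a <- frac_a >= f
    ok_b <- frac_b >= F
    # e = TRUE accepts either fraction; otherwise both must hold.
    if (e) ok_a || ok_b else ok_a && ok_b
}

# b covers 20% of a, and a covers 50% of b.
passes_overlap(1, 100, 81, 120, f = 0.9, F = 0.1)            # FALSE
passes_overlap(1, 100, 81, 120, f = 0.9, F = 0.1, e = TRUE)  # TRUE
```

With the defaults (f and F both 1e-09), any pair sharing at least one position passes, which corresponds to "any overlap".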

Details

As with all commands, there are three interfaces to the map command:

bedtools_map

Parses the bedtools command line and compiles it to the equivalent R code.

R_bedtools_map

Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.

do_bedtools_map

Evaluates the result of R_bedtools_map. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.

Computing overlaps with findOverlaps generates a Hits object, which we can pass directly to aggregate in order to summarize the subject features that overlap the same range in the query.
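To picture the grouping step without Bioconductor installed, here is a base R stand-in. The data frames a and b, their values, and the hits table are made up for illustration; the compiled code operates on GRanges objects with findOverlaps and aggregate rather than on data frames.

```r
# Made-up query ranges (a) and scored subject ranges (b), 1-based closed.
a <- data.frame(start = c(1, 101, 201), end = c(100, 200, 300))
b <- data.frame(start = c(10, 50, 150, 400), end = c(20, 60, 160, 410),
                score = c(1, 2, 5, 7))

# Stand-in for findOverlaps(): every (query, subject) pair that overlaps.
pairs <- expand.grid(queryHits = seq_len(nrow(a)),
                     subjectHits = seq_len(nrow(b)))
overlaps <- a$start[pairs$queryHits] <= b$end[pairs$subjectHits] &
            b$start[pairs$subjectHits] <= a$end[pairs$queryHits]
hits <- pairs[overlaps, ]

# Stand-in for aggregate(): group subject scores by the query they hit.
# A factor with all query indices keeps queries that have no hits.
groups <- split(b$score[hits$subjectHits],
                factor(hits$queryHits, levels = seq_len(nrow(a))))
summed <- vapply(groups, sum, numeric(1))
unname(summed)  # 3 5 0
```

The third query has no overlapping subject, so its group is empty and sum returns 0 for it.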

There are several commands in the bedtools suite that might be approximately implemented by passing multiple files to b and specifying the aggregate expression table(b). This expression counts how many ranges from each database/sample overlap a given query. The covered commands are: bedtools annotate -counts, bedtools multicov and bedtools tag.
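The per-sample counting idea can be illustrated with base R's table. This is only an analogy to the table(b) aggregate expression: the vectors query and sample are invented, and each element stands for one overlapping subject range.

```r
# Made-up hits: the query each overlapping subject range falls in,
# and the input file (sample) that subject range came from.
query  <- factor(c(1, 1, 2, 2, 2), levels = 1:3)
sample <- factor(c("s1", "s2", "s1", "s1", "s2"), levels = c("s1", "s2"))

# One row per query range, one column per sample.
counts <- table(query, sample)
counts["2", "s1"]  # 2
```

Declaring all query indices as factor levels keeps a row of zeros for queries that nothing overlaps.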

Value

A language object containing the compiled R code, evaluating to a DataFrame with a “grouping” column corresponding to as(hits, "List"), and a column for each summary.
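For readers unfamiliar with language objects, the following base R sketch shows the general pattern of holding code unevaluated and evaluating it as a separate step. The quoted expression is a hand-written stand-in, not output from HelloRanges.

```r
# A hand-written stand-in for the kind of language object the compiler returns.
code <- quote({
    scores <- c(1, 2, 5)
    sum(scores)
})

is.language(code)     # TRUE: this is unevaluated code, not a result
print(code)           # shows the code so it can be studied or pasted into a script
result <- eval(code)  # evaluation is a separate, explicit step
result                # 8
```

The same pattern applies to the compiled code: assign the return value of bedtools_map to a variable, inspect it, and call eval on it once the input files are reachable from the working directory.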

Note

We do not support the bedtools null argument, because it seems more sensible to just let R decide on the value of statistics when a group is empty.
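As a reminder of what R decides for empty groups, the base R calls below show the values of a few common summaries on zero-length input. The choice of functions is ours and does not enumerate the operations supported by o.

```r
empty <- numeric(0)

sum(empty)                            # 0
mean(empty)                           # NaN
suppressWarnings(max(empty))          # -Inf (R also issues a warning)
length(empty)                         # 0
paste(character(0), collapse = ",")   # "" (the empty string)
```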

Author(s)

Michael Lawrence

References

http://bedtools.readthedocs.io/en/latest/content/tools/map.html

See Also

findOverlaps-methods for finding hits, Hits-class for manipulating them, aggregate-methods for aggregating them.

Examples

## Not run: 
setwd(system.file("unitTests", "data", "map", package="HelloRanges"))

## End(Not run)

## default behavior
bedtools_map("-a ivls.bed -b values.bed")
## take the mode of the scores
bedtools_map("-a ivls.bed -b values.bed -o mode")
## collapse the chromosome names
bedtools_map("-a ivls.bed -b test.gff2 -c 1 -o collapse")
## collapse the names, restricted by fractional overlap
bedtools_map("-a ivls2.bed -b values5.bed -c 4 -o collapse -f 0.7")

lawremi/HelloRanges documentation built on Oct. 29, 2023, 4:08 p.m.