bedtools_coverage: bedtools_coverage

View source: R/coverage.R

bedtools_coverageR Documentation

bedtools_coverage

Description

Compute the coverage of one or more datasets over a set of query ranges.

Usage

bedtools_coverage(cmd = "--help")
R_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE,
                    f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                    s = FALSE, S = FALSE, split = FALSE, g = NA_character_,
                    header = FALSE, sortout = FALSE)
do_bedtools_coverage(a, b, hist = FALSE, d = FALSE, counts = FALSE,
                     f = 1e-09, F = 1e-09, r = FALSE, e = FALSE,
                     s = FALSE, S = FALSE, split = FALSE, g = NA_character_,
                     header = FALSE, sortout = FALSE)

Arguments

cmd

String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing.

a

Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or a ranged data structure, such as a GRanges. The coverage is computed over these ranges. Use "stdin" for input from another process (presumably while running via Rscript). For streaming from a subprocess, prefix the command string with “<”, e.g., "<grep foo file.bed". Any streamed data is assumed to be in BED format.

b

Like a, except supports multiple datasets, either as a vector/list or a comma-separated string. Also supports file glob patterns, i.e., strings containing the wildcard, “*”. The coverage is computed by counting how many of these ranges overlap positions in a.

hist

Report a histogram of coverage for each feature in a as well as a summary histogram for all features in a. See below for the structure of the returned table.

d

Report the depth at each position in each a feature. Positions reported are one based. Each position and depth follow the complete a feature.

counts

Only report the count of overlaps, not fraction, etc. Restricted by f and r.

f

Minimum overlap required as a fraction of A [default: any overlap].

F

Minimum overlap required as a fraction of B [default: any overlap].

r

Require that the fraction of overlap be reciprocal for a and b. In other words, if f is 0.90 and r is TRUE, this requires that b overlap at least 90% of a and that a also overlaps at least 90% of b.

e

Require that the minimum fraction be satisfied for a OR b. In other words, if e is TRUE with f=0.90 and F=0.10 this requires that either 90% of a is covered OR 10% of b is covered. If FALSE, both fractions would have to be satisfied.

s

Force strandedness. That is, only count ranges in b that overlap a on the same strand. By default, coverage is computed without respect to strand. Note that this is the exact opposite of Bioconductor behavior.

S

Require opposite strandedness. That is, count the features in b that overlap a on the opposite strand. By default, coverage is computed without respect to strand.

split

Treat split BAM (i.e., having an ‘N’ CIGAR operation) or BED12 entries as compound ranges with gaps, i.e., as GRangesList objects.

g

A genome file, identifier or Seqinfo object that defines the order and size of the sequences.

header

Ignored.

sortout

Sort the result by position.

Details

As with all commands, there are three interfaces to the coverage command:

bedtools_coverage

Parses the bedtools command line and compiles it to the equivalent R code.

R_bedtools_coverage

Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.

do_bedtools_coverage

Evaluates the result of R_bedtools_coverage. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.

Typically, we compute coverage with coverage, but features like fractional overlap restriction and histograms add (educational) complexity. One key trick is the [,List,GenomicRanges method, which lets us extract coverage vectors for specific regions (see the generated code).

Value

A language object containing the compiled R code, evaluating to a GRanges object with coverage information. The exact type of information depends on the mode:

default

Three metadata columns: “count” (the number of overlapping ranges), “covered” (the number of bases covered in the query), “fraction” (the fraction of bases covered).

d

Metadata column “coverage” is an RleList with position-level coverage (depth). This is what we typically refer to as coverage in Bioconductor.

hist

Metadata column “coverage” is a list of DataFrames. Each DataFrame contains a histogram of the coverage depth over that range, with columns “coverage” (the coverage value), “count” (the number of positions with that coverage), “len” (the length of the region, all the same) and “fraction” (the fraction of positions at that coverage). There is also a “coverage” component on metadata(ans) with the same histogram aggregated over all query ranges.

Author(s)

Michael Lawrence

References

http://bedtools.readthedocs.io/en/latest/content/tools/coverage.html

See Also

coverage-methods for ways to compute coverage.

Examples

## Not run: 
setwd(system.file("unitTests", "data", "coverage", package="HelloRanges"))

## End(Not run)

## default behavior
bedtools_coverage("-a a.bed -b b.bed")
## histogram
bedtools_coverage("-a a.bed -b b.bed -hist -g test.genome")
## per-position depth
bedtools_coverage("-a a.bed -b b.bed -d -g test.genome")

lawremi/HelloRanges documentation built on Oct. 29, 2023, 4:08 p.m.