bedtools_groupby | R Documentation |
Summarizes selected columns of a dataset over groups of rows defined by one or more grouping columns, in the manner of the bedtools groupby command.
bedtools_groupby(cmd = "--help")
R_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",")
do_bedtools_groupby(i, g = 1:3, c, o = "sum", delim=",")
cmd |
String of bedtools command line arguments, as they would be entered at the shell. There are a few incompatibilities between the docopt parser and the bedtools style. See argument parsing. |
i |
Path to a BAM/BED/GFF/VCF/etc file, a BED stream, a file object, or
a ranged data structure, such as a GRanges. Use |
g |
Column index(es) for grouping the input. Columns may be comma-separated. By default, the grouping is by range. |
c |
Specify columns (by integer index) from the input file to operate
upon (see |
o |
Specify the operations (by name) that should be applied to the
columns indicated in |
delim |
Delimiter character used to collapse strings. |
As with all commands, there are three interfaces to the groupby command:
bedtools_groupby
Parses the bedtools command line and compiles it to the equivalent R code.
R_bedtools_groupby
Accepts R arguments corresponding to the command line arguments and compiles the equivalent R code.
do_bedtools_groupby
Evaluates the result of R_bedtools_groupby. Recommended only for demonstration and testing. It is best to integrate the compiled code into an R script, after studying it.
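The split between compiling code and evaluating it can be illustrated with base R alone. The following is only a sketch of the general pattern, not HelloRanges code: compile_sum_sketch is a hypothetical helper, and the data frame df is made up for the illustration.

```r
# Hypothetical helper (not part of HelloRanges): "compile" a request into
# an unevaluated R call, loosely analogous to what R_bedtools_groupby() returns.
compile_sum_sketch <- function(column) {
  bquote(sum(df[[.(column)]]))
}

# Made-up input data for the illustration.
df <- data.frame(score = c(1, 2, 3))

code <- compile_sum_sketch("score")
print(code)            # inspect the generated code before running it

# Evaluating the call is the step that do_bedtools_groupby() performs
# on the real compiled code.
result <- eval(code)
```

Printing the call before evaluating it mirrors the recommended workflow: study the compiled code, then paste it into a script rather than relying on the do_ interface.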
The workhorse for aggregation in R is aggregate, and we have extended its interface to make it more convenient. See aggregate for details.
The following operations are supported (with R translation):
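sum | sum(X) |
min | min(X) |
max | max(X) |
absmin | min(abs(X)) |
absmax | max(abs(X)) |
mean | mean(X) |
median | median(X) |
mode | distmode(X) |
antimode | distmode(X, anti=TRUE) |
collapse | unstrsplit(X, delim) |
distinct | unstrsplit(unique(X), delim) |
count | lengths(X) |
count_distinct | lengths(unique(X)) |
sstdev | sd(X) |
freq | table(X) |
first | drop(heads(X, 1L)) |
last | drop(tails(X, 1L)) |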
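To make a few of these translations concrete, here is a base R sketch applied to a toy grouped column. It is an analogy only: functions such as unstrsplit, heads and tails are not base R, so the code below substitutes base R stand-ins (paste, indexing, vapply) and is not the code that the compiler emits. The data values are invented.

```r
# Toy data: the values of one column, split by a grouping key.
values <- c(10, -30, 20, 20, 5)
group  <- c("a", "a", "b", "b", "b")
X <- split(values, group)   # list(a = c(10, -30), b = c(20, 20, 5))

# Base R stand-ins, each labelled with the translation it imitates.
sums       <- vapply(X, sum, numeric(1))                        # sum(X)
abs_max    <- vapply(X, function(x) max(abs(x)), numeric(1))    # max(abs(X))
collapsed  <- vapply(X, paste, character(1), collapse = ",")    # unstrsplit(X, delim)
distinct   <- vapply(X, function(x) paste(unique(x), collapse = ","),
                     character(1))                              # unstrsplit(unique(X), delim)
n_distinct <- vapply(X, function(x) length(unique(x)),
                     integer(1))                                # lengths(unique(X))
firsts     <- vapply(X, function(x) x[1L], numeric(1))          # drop(heads(X, 1L))
lasts      <- vapply(X, function(x) x[length(x)], numeric(1))   # drop(tails(X, 1L))
```

Each result is a vector with one element per group, which is the shape an aggregation over groups produces for a summarized column.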
For the sake of simplicity, and because the use cases are not clear, we do not support aggregation of every column. Here are some of the restrictions:
No support for the last column of GFF (the ragged list of attributes).
No support for the INFO, FORMAT and GENO fields of VCF.
No support for the FLAG field of BAM (bedtools does not support this either).
A language object containing the compiled R code, generally evaluating to a DataFrame, with a column for each grouping variable and each summarized variable. As a special case, if there are no grouping variables specified, then the grouping is by range, and an aggregated GRanges is returned.
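The shape of the grouped result, one column per grouping variable plus one per summarized variable, can be seen in a base R analogy. The sketch below uses the aggregate method for data frames from the stats package rather than the extended interface mentioned above, and the variants table is invented for the illustration.

```r
# Invented data: variant qualities with a chromosome and a reference base.
variants <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr2"),
  ref   = c("A", "A", "G", "A"),
  qual  = c(10, 30, 50, 70)
)

# Group by chromosome and reference base, then average the quality.
# The result has one row per observed (chrom, ref) combination.
by_group <- aggregate(qual ~ chrom + ref, data = variants, FUN = mean)
print(by_group)
```

This corresponds in spirit to grouping on two columns and applying the mean operation to a third; the real compiled code typically returns a DataFrame rather than a base data.frame.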
We admit that using column subscripts for c makes code hard to read. All the more reason to just write R code.
Michael Lawrence
http://bedtools.readthedocs.io/en/latest/content/tools/groupby.html
aggregate-methods for general aggregation.
## Not run:
setwd(system.file("unitTests", "data", "groupby", package="HelloRanges"))
## End(Not run)
## aggregation by range
bedtools_groupby("-i values3.header.bed -c 5")
## average variant qualities by chromosome and reference base
## Not run:
indexTabix(bgzip("a_vcfSVtest.vcf", overwrite=TRUE), "vcf")
## End(Not run)
bedtools_groupby("-i a_vcfSVtest.vcf.bgz -g 1,4 -c 6 -o mean")