reduceByFile | R Documentation |
Computations are distributed in parallel by file. Data subsets are extracted and manipulated (MAP) and optionally combined (REDUCE) within a single file.
## S4 method for signature 'GRanges,ANY'
reduceByFile(ranges, files, MAP,
REDUCE, ..., summarize=FALSE, iterate=TRUE, init)
## S4 method for signature 'GRangesList,ANY'
reduceByFile(ranges, files, MAP,
REDUCE, ..., summarize=FALSE, iterate=TRUE, init)
## S4 method for signature 'GenomicFiles,missing'
reduceByFile(ranges, files, MAP,
REDUCE, ..., summarize=FALSE, iterate=TRUE, init)
reduceFiles(ranges, files, MAP, REDUCE, ..., init)
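ranges |
A GRanges, GRangesList or GenomicFiles object. When ranges is a GenomicFiles object, the files argument is missing. |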
files |
A character vector of file names, such as the BAM file paths used in the examples. |
MAP |
A function executed on each worker. The signature must contain a minimum of two arguments representing the ranges and files. There is no restriction on argument names and additional arguments can be provided.
|
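REDUCE |
An optional function that combines output from MAP. Reduction combines data from a single worker and is always performed as part of the distributed step. When iterate=TRUE, REDUCE is applied after each MAP call; when iterate=FALSE, it is applied once to the list of all MAP output. |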
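iterate |
A logical indicating if the REDUCE function is applied iteratively to the output of MAP. Collapsing results iteratively is useful when the number of records to be processed is large (possibly complete files) but the end result is a much reduced representation of all records. Iteratively applying REDUCE limits the amount of data held on each worker at any one time. |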
summarize |
A logical indicating if results should be returned as a SummarizedExperiment object. |
init |
An optional initial value for REDUCE. |
... |
Arguments passed to other methods. |
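The interplay of MAP, REDUCE, iterate and init described above can be modelled in base R. The following is a minimal sketch under stated assumptions, not the package's implementation: `records`, `ranges` and the helper `iterative()` are hypothetical stand-ins, and the running value is assumed to be passed to REDUCE as the first element of a two-element list.

```r
## Hypothetical stand-ins: 'records' plays the role of one file, and each
## element of 'ranges' selects a subset of it.
records <- c(3, 1, 4, 1, 5, 9, 2, 6)
ranges <- list(1:3, 4:6, 7:8)

## MAP takes at least two arguments, a range and a file, and returns a
## small summary of the selected records.
MAP <- function(range, file, ...) sum(file[range])

## REDUCE receives a list of MAP outputs and combines them.
REDUCE <- function(mapped, ...) Reduce(`+`, mapped)

## Non-iterative model: keep every MAP result, then reduce once.
all_at_once <- REDUCE(lapply(ranges, MAP, file = records))

## Iterative model (assumption): fold each MAP result into a running
## value that starts at 'init', so only one MAP result is held at a time.
iterative <- function(ranges, file, MAP, REDUCE, init) {
    result <- init
    for (rng in ranges)
        result <- REDUCE(list(result, MAP(rng, file)))
    result
}
folded <- iterative(ranges, records, MAP, REDUCE, init = 0)
```

For a reducer like addition both routes agree; the iterative route simply avoids holding all MAP results in memory at once.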
reduceByFile extracts, manipulates and combines multiple ranges within a single file. Each file is sent to a worker where MAP is invoked on each file / range combination. This approach allows multiple ranges extracted from a single file to be kept separate or combined with REDUCE.
In contrast, reduceFiles treats the output of all MAP calls as a group and reduces them together. REDUCE usually plays a minor role by concatenating or unlisting results.
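The difference in result shape between the two functions can be illustrated with plain lists. This is only a base R model of the behaviour described here; the data and the helpers `count_in()` and `count_in_any()` are hypothetical and do not come from the package.

```r
## Hypothetical data: two "files", each a numeric vector of record positions.
files <- list(a = c(5, 12, 18, 25), b = c(2, 14, 16, 29))
## Two "ranges", each a c(start, end) pair.
ranges <- list(r1 = c(1, 15), r2 = c(10, 30))

## Count the records that fall inside a single range.
count_in <- function(range, file) sum(file >= range[1] & file <= range[2])

## reduceByFile-like: one MAP call per file / range combination, so each
## file yields one result per range.
by_file <- lapply(files, function(file)
    lapply(ranges, count_in, file = file))

## reduceFiles-like: the ranges are treated as a single group, so each
## file yields one result (records falling in any of the ranges).
count_in_any <- function(ranges, file) {
    hit <- Reduce(`|`, lapply(ranges, function(r)
        file >= r[1] & file <= r[2]))
    sum(hit)
}
grouped <- lapply(files, count_in_any, ranges = ranges)
```

In this model `by_file` is a list with one element per file, each holding one count per range, while `grouped` holds a single count per file.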
Both MAP and REDUCE are applied in the distributed step (“on the worker”). Results are not combined across workers in the distributed step.
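Because nothing is combined across workers, any cross-file summary has to be computed afterwards in the calling session. A minimal sketch, assuming a hypothetical `per_file` list shaped like per-file junction count tables (the values are made up for illustration):

```r
## Hypothetical per-file results, shaped like the list returned once each
## worker has applied MAP (and optionally REDUCE) to its own file.
per_file <- list(fileA = c(`0` = 10, `1` = 3),
                 fileB = c(`0` = 7, `1` = 5))

## Combine across files after the distributed step has finished.
combined <- Reduce(`+`, per_file)
```

Element-wise addition only makes sense here because both vectors share the same names in the same order; real tables may need to be aligned by name first.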
reduceByFile:
When summarize=FALSE the return value is a list or the value from the final invocation of REDUCE. When summarize=TRUE output is a SummarizedExperiment. When ranges is a GenomicFiles object, data from rowRanges, colData and metadata are transferred to the SummarizedExperiment.
reduceFiles:
A list or the value returned by the final invocation of REDUCE.
Martin Morgan and Valerie Obenchain
reduceRanges
reduceByRange
GenomicFiles-class
if (requireNamespace("RNAseqData.HNRNPC.bam.chr14", quietly=TRUE)) {
## -----------------------------------------------------------------------
## Count junction reads in BAM files
## -----------------------------------------------------------------------
fls <- ## 8 bam files
    RNAseqData.HNRNPC.bam.chr14::RNAseqData.HNRNPC.bam.chr14_BAMFILES
## Ranges of interest.
gr <- GRanges("chr14", IRanges(c(19100000, 106000000), width=1e7))
## MAP outputs a table of junction counts per range.
MAP <- function(range, file, ...) {
    ## for readGAlignments(), Rsamtools::ScanBamParam()
    requireNamespace("GenomicAlignments", quietly=TRUE)
    param <- Rsamtools::ScanBamParam(which=range)
    gal <- GenomicAlignments::readGAlignments(file, param=param)
    table(GenomicAlignments::njunc(gal))
}
## -----------------------------------------------------------------------
## reduceByFile:
## With no REDUCE, counts are computed for each range / file combination.
counts1 <- reduceByFile(gr, fls, MAP)
length(counts1) ## 8 files
elementNROWS(counts1) ## 2 ranges each
## Tables of counts for each range:
counts1[[1]]
## With a REDUCE, results are combined on the fly. This reducer sums the
## number of records in each range with exactly 1 junction.
REDUCE <- function(mapped, ...)
    sum(sapply(mapped, "[", "1"))
reduceByFile(gr, fls, MAP, REDUCE)
## -----------------------------------------------------------------------
## reduceFiles:
## All ranges are treated as a single group:
counts2 <- reduceFiles(gr, fls, MAP)
## Counts are for all ranges grouped:
counts2[[1]]
## Contrast the above with that from reduceByFile() where counts
## are for each range separately:
counts1[[1]]
## -----------------------------------------------------------------------
## Methods for the GenomicFiles class:
## Both reduceByFile() and reduceFiles() can operate on a GenomicFiles
## object.
colData <- DataFrame(method=rep("RNASeq", length(fls)),
                     format=rep("bam", length(fls)))
gf <- GenomicFiles(files=fls, rowRanges=gr, colData=colData)
gf
## Subset on ranges or files for different experimental runs.
dim(gf)
gf_sub <- gf[2, 3:4]
dim(gf_sub)
## When summarize = TRUE and no REDUCE is given, the output is a
## SummarizedExperiment object.
se <- reduceByFile(gf, MAP=MAP, summarize=TRUE)
se
## Data from the rowRanges, colData and metadata slots in the
## GenomicFiles are transferred to the SummarizedExperiment.
colData(se)
## Results are in the assays slot named 'data'.
assays(se)
}