blockApply: blockApply() and family

View source: R/blockApply.R

blockApplyR Documentation

blockApply() and family

Description

A family of convenience functions to walk on the blocks of an array-like object and process them.

Usage

## Main looping functions:

blockApply(x, FUN, ..., grid=NULL, as.sparse=FALSE,
           BPPARAM=getAutoBPPARAM(), verbose=NA)

blockReduce(FUN, x, init, ..., BREAKIF=NULL, grid=NULL, as.sparse=FALSE,
            verbose=NA)

## Lower-level looping functions:
gridApply(grid, FUN, ..., BPPARAM=getAutoBPPARAM(), verbose=NA)
gridReduce(FUN, grid, init, ..., BREAKIF=NULL, verbose=NA)

## Retrieve grid context for the current block/viewport:
effectiveGrid(envir=parent.frame(2))
currentBlockId(envir=parent.frame(2))
currentViewport(envir=parent.frame(2))

## Get/set automatic parallel back-end:
getAutoBPPARAM()
setAutoBPPARAM(BPPARAM=NULL)

## For testing/debugging callback functions:
set_grid_context(effective_grid, current_block_id, current_viewport=NULL,
                 envir=parent.frame(1))

Arguments

x

An array-like object, typically a DelayedArray object or derivative.

FUN

For blockApply and blockReduce, FUN is the callback function to apply to each block of data in x. More precisely, FUN will be called on each block of data in x defined by the grid used to walk on x.

IMPORTANT: If as.sparse is set to FALSE, all blocks will be passed to FUN as ordinary arrays. If it's set to TRUE, they will be passed as SparseArraySeed objects. If it's set to NA, then is_sparse(x) determines how they will be passed to FUN.

For gridApply() and gridReduce(), FUN is the callback function to apply to each **viewport** in grid.

Beware that FUN must take at least **two** arguments for blockReduce() and gridReduce(). More precisely:

  • blockReduce() will perform init <- FUN(block, init, ...) on each block, so FUN must take at least arguments block and init.

  • gridReduce() will perform init <- FUN(viewport, init, ...) on each viewport, so FUN must take at least arguments viewport and init.

In both cases, the exact names of the two arguments doesn't really matter. Also FUN is expected to return a value of the same type as its 2nd argument (init).

...

Additional arguments passed to FUN.

grid

The grid used for the walk, that is, an ArrayGrid object that defines the blocks (or viewports) to walk on.

For blockApply() and blockReduce() the supplied grid must be compatible with the geometry of x. If not specified, an automatic grid is used. By default defaultAutoGrid(x) is called to create an automatic grid. The automatic grid maker can be changed with setAutoGridMaker(). See ?setAutoGridMaker for more information.

as.sparse

Passed to the internal calls to read_block. See ?read_block in the S4Arrays package for more information.

BPPARAM

A NULL, in which case blocks are processed sequentially, or a BiocParallelParam instance (from the BiocParallel package), in which case they are processed in parallel. The specific BiocParallelParam instance determines the parallel back-end to use. See ?BiocParallelParam in the BiocParallel package for more information about parallel back-ends.

verbose

Whether block processing progress should be displayed or not. If set to NA (the default), verbosity is controlled by DelayedArray:::get_verbose_block_processing(). Setting verbose to TRUE or FALSE overrides this.

init

The value to pass to the first call to FUN(block, init) (or FUN(viewport, init)) when blockReduce() (or gridReduce()) starts the walk. Note that blockReduce() and gridReduce() always operate sequentially.

BREAKIF

An optional callback function that detects a break condition. Must return TRUE or FALSE. At each iteration blockReduce() (and gridReduce()) will call it on the result of init <- FUN(block, init) (on the result of init <- FUN(viewport, init) for gridReduce()) and exit the walk if BREAKIF(init) returned TRUE.

envir

Do not use (unless you know what you are doing).

effective_grid, current_block_id, current_viewport

See Details below.

Details

effectiveGrid(), currentBlockId(), and currentViewport() return the "grid context" for the block/viewport being currently processed. By "grid context" we mean:

  • The effective grid, that is, the user-supplied grid or defaultAutoGrid(x) if the user didn't supply any grid.

  • The current block id (a.k.a. block rank).

  • The current viewport, that is, the ArrayViewport object describing the position of the current block w.r.t. the effective grid.

Note that effectiveGrid(), currentBlockId(), and currentViewport() can only be called (with no arguments) from **within** the callback functions FUN and/or BREAKIF passed to blockApply() and family.

If you need to be able to test/debug your callback function as a standalone function, set an arbitrary effective grid, current block id, and current_viewport, by calling

    set_grid_context(effective_grid, current_block_id, current_viewport)

**right before** calling the callback function.

Value

For blockApply() and gridApply(), a list with one list element per block/viewport visited.

For blockReduce() and gridReduce(), the result of the last call to FUN.

For effectiveGrid(), the grid (ArrayGrid object) being effectively used.

For currentBlockId(), the id (a.k.a. rank) of the current block.

For currentViewport(), the viewport (ArrayViewport object) of the current block.

See Also

  • defaultAutoGrid and family to create automatic grids to use for block processing of array-like objects.

  • ArrayGrid in the S4Arrays package for the formal representation of grids and viewports.

  • read_block and write_block in the S4Arrays package.

  • MulticoreParam, SnowParam, and bpparam, from the BiocParallel package.

  • DelayedArray objects.

Examples

m <- matrix(1:60, nrow=10)
m_grid <- defaultAutoGrid(m, block.length=16, block.shape="hypercube")

## ---------------------------------------------------------------------
## blockApply()
## ---------------------------------------------------------------------
blockApply(m, identity, grid=m_grid)
blockApply(m, sum, grid=m_grid)

blockApply(m, function(block) {block + currentBlockId()*1e3}, grid=m_grid)
blockApply(m, function(block) currentViewport(), grid=m_grid)
blockApply(m, dim, grid=m_grid)

## The grid does not need to be regularly spaced:
a <- array(runif(8000), dim=c(25, 40, 8))
a_tickmarks <- list(c(7L, 15L, 25L), c(14L, 22L, 40L), c(2L, 8L))
a_grid <- ArbitraryArrayGrid(a_tickmarks)
a_grid
blockApply(a, function(block) sum(log(block + 0.5)), grid=a_grid)

## See block processing in action:
blockApply(m, function(block) sum(log(block + 0.5)), grid=m_grid,
           verbose=TRUE)

## Use parallel evaluation:
library(BiocParallel)
if (.Platform$OS.type != "windows") {
    BPPARAM <- MulticoreParam(workers=4)
} else {
    ## MulticoreParam() is not supported on Windows so we use
    ## SnowParam() on this platform.
    BPPARAM <- SnowParam(4)
}
blockApply(m, function(block) sum(log(block + 0.5)), grid=m_grid,
           BPPARAM=BPPARAM, verbose=TRUE)
## Note that blocks can be visited in any order!

## ---------------------------------------------------------------------
## blockReduce()
## ---------------------------------------------------------------------
FUN <- function(block, init) anyNA(block) || init
blockReduce(FUN, m, init=FALSE, grid=m_grid, verbose=TRUE)

m[10, 1] <- NA
blockReduce(FUN, m, init=FALSE, grid=m_grid, verbose=TRUE)

## With early bailout:
blockReduce(FUN, m, init=FALSE, BREAKIF=identity, grid=m_grid,
            verbose=TRUE)

## Note that this is how the anyNA() method for DelayedArray objects is
## implemented.

Bioconductor/DelayedArray documentation built on March 4, 2024, 9:12 p.m.