blockGrid: Define grids to use in the context of block processing


View source: R/blockGrid.R

Description

blockGrid() is the primary utility function for defining a grid suitable for block processing of an array-like object.

rowGrid() and colGrid() are additional functions, specific to the 2-dimensional case. They can be used to define blocks of full rows or full columns.

A family of utilities is provided to control the automatic block size (or length) and shape.

Usage

## Define grids to use in the context of block processing:

blockGrid(x, block.length=NULL, chunk.grid=NULL, block.shape=NULL)

rowGrid(x, nrow=NULL, block.length=NULL)
colGrid(x, ncol=NULL, block.length=NULL)

## Control the automatic block size (or length) and shape:

getAutoBlockSize()
setAutoBlockSize(size=1e8)

getAutoBlockLength(type)

getAutoBlockShape()
setAutoBlockShape(shape=c("hypercube",
                          "scale",
                          "first-dim-grows-first",
                          "last-dim-grows-first"))

Arguments

x

An array-like or matrix-like object for blockGrid.

A matrix-like object for rowGrid and colGrid.

block.length

The length of the blocks, i.e., the number of array elements per block. By default, the automatic block length (returned by getAutoBlockLength(type(x))) is used. Depending on how much memory is available on your machine, you might want to increase (or decrease) the automatic block length by adjusting the automatic block size with setAutoBlockSize().

chunk.grid

The grid of physical chunks. By default chunkGrid(x) is used.

block.shape

A string specifying the shape of the blocks. See makeCappedVolumeBox for a description of the supported shapes. By default getAutoBlockShape() is used.

nrow

The number of rows of the blocks. The bottommost block might have fewer rows. See examples below.

ncol

The number of columns of the blocks. The rightmost block might have fewer columns. See examples below.
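The way nrow and ncol cut a dimension (full-size blocks, followed by at most one shorter block) can be sketched in base R. This is a minimal sketch, assuming n and k are positive whole numbers; block_extents() is a hypothetical helper and not how rowGrid() or colGrid() are implemented.

```r
## Hypothetical helper (not part of DelayedArray): the extents of the
## consecutive blocks obtained when 'n' rows (or columns) are cut into
## blocks of at most 'k' rows (or columns).
block_extents <- function(n, k) {
    nfull <- n %/% k          # number of full-size blocks
    rest  <- n %% k           # extent of the last, shorter block (0 if none)
    c(rep(k, nfull), if (rest > 0) rest)
}

block_extents(50, 10)  # five blocks of 10
block_extents(50, 15)  # 15 15 15 5
block_extents(12, 5)   # 5 5 2
```

For a 50 x 12 matrix, these are the block heights for nrow=10 and nrow=15 and the block widths for ncol=5.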

size

The automatic block size in bytes. Note that, except when the type of the array data is "character" or "list", the size of a block is its length multiplied by the size of an array element. For example, a block of 500 x 1000 x 500 doubles has a length of 250 million elements and a size of 2 GB (each double occupies 8 bytes of memory).

The automatic block size is set to 100 MB at package startup and can be reset to this value at any time by calling setAutoBlockSize() with no argument.
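Read in reverse, the size/length relationship tells how many elements fit in a block of a given size. The following is a minimal sketch, assuming the block length is the block size divided by the number of bytes per element, rounded down so that the block never exceeds the size. auto_block_length() is a hypothetical helper, not the code behind getAutoBlockLength(); the per-type byte counts are how R stores these atomic types.

```r
## Hypothetical helper (not part of DelayedArray): number of elements of
## a given type that fit in a block of 'size' bytes, rounded down.
auto_block_length <- function(size, type) {
    ## Bytes per element as stored by R (logicals are stored as 32-bit ints).
    bytes_per_elt <- c(double = 8, integer = 4, logical = 4,
                       complex = 16, raw = 1)
    if (!(type %in% names(bytes_per_elt)))
        stop("no fixed element size for type '", type, "'")
    as.integer(size %/% bytes_per_elt[[type]])
}

auto_block_length(1e8, "double")   # 12500000 elements in 100 MB
auto_block_length(1e8, "integer")  # 25000000
auto_block_length(140, "double")   # 17
```

The "character" and "list" types are rejected here because, as noted above, their element size is not fixed.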

type

A string specifying the type of the array data.

shape

A string specifying the automatic block shape. See makeCappedVolumeBox for a description of the supported shapes.

The automatic block shape is set to "hypercube" at package startup and can be reset to this value at any time by calling setAutoBlockShape() with no argument.
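To give an idea of what a block shape controls, here is a base-R sketch of one of them, "first-dim-grows-first". It assumes the box grows along dimension 1 up to the array's full extent, then along dimension 2, and so on, while keeping its volume at most the cap. capped_box_first_dim() is hypothetical; see makeCappedVolumeBox for the package's actual definitions of all supported shapes.

```r
## Hypothetical helper (not part of DelayedArray) illustrating the
## "first-dim-grows-first" idea: grow the box along dim 1, then dim 2, ...,
## keeping prod(box) <= maxvol and box[k] <= maxdim[k].
capped_box_first_dim <- function(maxvol, maxdim) {
    box <- numeric(length(maxdim))
    vol <- 1
    for (k in seq_along(maxdim)) {
        ## Largest extent along dim k that keeps the volume under the cap,
        ## never beyond the array's own extent, and at least 1.
        box[k] <- max(1, min(maxdim[k], maxvol %/% vol))
        vol <- vol * box[k]
    }
    box
}

capped_box_first_dim(120, c(50, 12))  # 50 2: two full columns per block
capped_box_first_dim(20,  c(50, 12))  # 20 1: part of one column
```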

Details

By default, the primary block processing functions blockApply() and blockReduce() use the grid returned by blockGrid(x) to process array-like object x block by block. This can be changed with setAutoGridMaker(). See ?setAutoGridMaker for more information.
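The block-by-block pattern can be imitated on an ordinary matrix in base R. This is a minimal sketch, assuming blocks made of full columns; apply_by_col_blocks() is a hypothetical helper that mimics the idea behind blockApply(), not its implementation.

```r
## Hypothetical helper (not part of DelayedArray): visit blocks of full
## columns of 'x', apply FUN to each block, and collect the results in a list.
apply_by_col_blocks <- function(x, block_ncol, FUN) {
    starts <- seq(1L, ncol(x), by = block_ncol)
    lapply(starts, function(s) {
        e <- min(s + block_ncol - 1L, ncol(x))  # last column of this block
        FUN(x[, s:e, drop = FALSE])             # FUN only sees this block
    })
}

set.seed(1)
m <- matrix(runif(600), ncol = 12)
partial <- apply_by_col_blocks(m, block_ncol = 5, FUN = colSums)
lengths(partial)                    # 5 5 2
col_sums <- unlist(partial)         # blocks are visited left to right
all.equal(col_sums, colSums(m))     # TRUE
```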

Value

blockGrid: An ArrayGrid object on reference array x. The grid elements define the blocks that will be used to process x by block. The grid is optimal in the sense that:

  1. It's compatible with the grid of physical chunks (a.k.a. the chunk grid). This means that, when the chunk grid is known (i.e., when chunkGrid(x) is not NULL or chunk.grid is supplied), every block in the grid contains one or more full chunks. In other words, chunks never cross block boundaries.

  2. Its resolution is such that the blocks have a length that is as close as possible to, without exceeding, block.length. An exception is made when some chunks already have a length that is >= block.length, in which case the returned grid is the same as the chunk grid.
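The interplay between the two points can be illustrated with a deliberately simplified model in which all chunks have the same length and a block is just a run of whole chunks, so chunk geometry is ignored. chunks_per_block() is a hypothetical helper, not the algorithm blockGrid() uses.

```r
## Hypothetical helper (not part of DelayedArray): how many whole chunks of
## length 'chunk_len' go in a block whose length must not exceed
## 'block_len'. If one chunk is already at least 'block_len' long, each
## block is exactly one chunk.
chunks_per_block <- function(chunk_len, block_len) {
    max(1, block_len %/% chunk_len)
}

chunks_per_block(chunk_len = 100,  block_len = 1000)  # 10
chunks_per_block(chunk_len = 300,  block_len = 1000)  # 3 (block length 900)
chunks_per_block(chunk_len = 2000, block_len = 1000)  # 1 (block = chunk)
```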

Note that the returned grid is regular (i.e., a RegularArrayGrid object) unless the chunk grid is not regular (i.e., an ArbitraryArrayGrid object).

rowGrid: A RegularArrayGrid object on reference array x where the grid elements define blocks made of full rows of x.

colGrid: A RegularArrayGrid object on reference array x where the grid elements define blocks made of full columns of x.

getAutoBlockSize: The current automatic block size in bytes as a single numeric value.

setAutoBlockSize: The new automatic block size in bytes as an invisible single numeric value.

getAutoBlockLength: The automatic block length as a single integer value.

getAutoBlockShape: The current automatic block shape as a single string.

setAutoBlockShape: The new automatic block shape as an invisible single string.
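The getter/setter behavior described above (the setter returns the new value invisibly, and calling it with no argument restores the startup value) follows a common R pattern. The sketch below assumes settings kept in a private environment; .block_settings, get_block_size() and set_block_size() are hypothetical names, not the package's internals.

```r
## Hypothetical sketch (not DelayedArray code) of a get/set pair for a
## package-level setting stored in a private environment.
.block_settings <- new.env()
assign("size", 1e8, envir = .block_settings)   # startup value: 100 MB

get_block_size <- function() get("size", envir = .block_settings)

set_block_size <- function(size = 1e8) {
    ## Calling with no argument restores the startup value.
    assign("size", size, envir = .block_settings)
    invisible(size)   # returned, but not auto-printed at the console
}

set_block_size(140)   # prints nothing
get_block_size()      # 140
set_block_size()      # back to the startup value
get_block_size()      # 1e+08
```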

See Also

Examples

## ---------------------------------------------------------------------
## A VERSION OF sum() THAT USES BLOCK PROCESSING
## ---------------------------------------------------------------------

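library(DelayedArray)  # blockGrid(), rowGrid(), colGrid(), read_block()

block_sum <- function(a, grid)
{
    ## Load each block of 'a' into memory, sum it, then combine the
    ## partial sums.
    sums <- lapply(grid, function(viewport) sum(read_block(a, viewport)))
    sum(unlist(sums))
}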

## On an ordinary matrix:
m <- matrix(runif(600), ncol=12)
m_grid <- blockGrid(m, block.length=120)
sum1 <- block_sum(m, m_grid)
sum1

## On a DelayedArray object:
library(HDF5Array)
M <- as(m, "HDF5Array")
sum2 <- block_sum(M, m_grid)
sum2

sum3 <- block_sum(M, colGrid(M, block.length=120))
sum3

sum4 <- block_sum(M, rowGrid(M, block.length=80))
sum4

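## Sanity checks (block-wise sums add the values in a different order
## than sum(m), so compare with a small tolerance rather than exactly):
sum0 <- sum(m)
stopifnot(isTRUE(all.equal(sum1, sum0)))
stopifnot(isTRUE(all.equal(sum2, sum0)))
stopifnot(isTRUE(all.equal(sum3, sum0)))
stopifnot(isTRUE(all.equal(sum4, sum0)))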

## ---------------------------------------------------------------------
## blockGrid()
## ---------------------------------------------------------------------
grid <- blockGrid(m, block.length=120)
grid
as.list(grid)  # turn the grid into a list of ArrayViewport objects
table(lengths(grid))
stopifnot(maxlength(grid) <= 120)

grid <- blockGrid(m, block.length=120,
                     block.shape="first-dim-grows-first")
grid
table(lengths(grid))
stopifnot(maxlength(grid) <= 120)

grid <- blockGrid(m, block.length=120,
                     block.shape="last-dim-grows-first")
grid
table(lengths(grid))
stopifnot(maxlength(grid) <= 120)

blockGrid(m, block.length=100)
blockGrid(m, block.length=75)
blockGrid(m, block.length=25)
blockGrid(m, block.length=20)
blockGrid(m, block.length=10)

## ---------------------------------------------------------------------
## rowGrid() AND colGrid()
## ---------------------------------------------------------------------
rowGrid(m, nrow=10)  # 5 blocks of 10 rows each
rowGrid(m, nrow=15)  # 3 blocks of 15 rows each plus 1 block of 5 rows
colGrid(m, ncol=5)   # 2 blocks of 5 cols each plus 1 block of 2 cols

## See ?RealizationSink for an advanced example of user-implemented
## block processing using colGrid() and a realization sink.

## ---------------------------------------------------------------------
## CONTROL THE DEFAULT BLOCK SIZE (OR LENGTH) AND SHAPE
## ---------------------------------------------------------------------
getAutoBlockSize()

getAutoBlockLength("double")
getAutoBlockLength("integer")
getAutoBlockLength("logical")
getAutoBlockLength("raw")

setAutoBlockSize(140)
getAutoBlockLength(type(m))
blockGrid(m)
lengths(blockGrid(m))
dims(blockGrid(m))

getAutoBlockShape()
setAutoBlockShape("scale")
blockGrid(m)
lengths(blockGrid(m))
dims(blockGrid(m))

## Reset automatic block size and shape to factory settings:
setAutoBlockSize()
setAutoBlockShape()

DelayedArray documentation built on Nov. 1, 2018, 2:27 a.m.