Row and column extremes (sparse matrices)

Description

Compute maxima and minima for all rows or columns of sparse matrices. Optionally also return which elements are the maxima/minima per row/column.

Usage

rowMax(X, which = FALSE, ignore.zero = TRUE)
colMax(X, which = FALSE, ignore.zero = TRUE)

rowMin(X, which = FALSE, ignore.zero = TRUE)
colMin(X, which = FALSE, ignore.zero = TRUE)

Arguments

X

a sparse matrix in a format of the Matrix package, typically dgCMatrix. The maxima or minima will be calculated for each row or column of this matrix.

which

optionally return a sparse matrix of the same dimensions as X marking the positions of the column- or row-wise maxima or minima.

ignore.zero

By default, only the non-zero elements are included in the computations. When ignore.zero = FALSE, zeros are also considered: any maximum below zero is replaced by zero and, likewise, any minimum above zero is replaced by zero.
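
The effect of ignore.zero can be illustrated on a small dense matrix with base R only. This is a sketch of the semantics described above, not the package's implementation: the helper row_max_nonzero is hypothetical, and returning 0 for a row without non-zero entries is an assumption made for this illustration.

```r
# Small dense stand-in for a sparse matrix; zeros play the role of empty cells.
X <- matrix(c(-2, 0, -5,
               0, 3,  1,
               0, 0,  0), nrow = 3, byrow = TRUE)

# Hypothetical helper (not part of the package): row maximum over the
# non-zero entries only. Rows without non-zero entries yield 0 (assumption).
row_max_nonzero <- function(m) {
  apply(m, 1, function(r) {
    nz <- r[r != 0]
    if (length(nz) == 0) 0 else max(nz)
  })
}

max_ignoring_zero <- row_max_nonzero(X)  # analogue of ignore.zero = TRUE
max_with_zero <- apply(X, 1, max)        # analogue of ignore.zero = FALSE

# Considering zeros only changes rows whose non-zero maximum is negative:
# those maxima are lifted to zero.
print(rbind(max_ignoring_zero, max_with_zero))
```

In this sketch only the first row differs between the two variants, because its non-zero entries are all negative.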

Details

The basic workhorse of these functions is the function rollup from the package slam.

Value

By default, these functions return a sparseVector with the non-zero maxima or minima. Use as.vector to turn this into a regular vector.

When which = TRUE, the result is a list of two items:

max/min

the same sparse vector as described above.

which

a sparse pattern matrix of the kind ngCMatrix indicating the position of the extrema. Note that an extreme might occur more than once per row/column. In that case multiple entries in the row/column are indicated.
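
How ties show up in such a position matrix can be sketched with base R. The example below builds a dense logical matrix as an analogue of the sparse pattern matrix; it is only an illustration under that assumption and does not call the package functions.

```r
# The first row contains its maximum (4) twice; the second row has a unique maximum.
X <- matrix(c(4, 0, 4,
              0, 2, 1), nrow = 2, byrow = TRUE)

# Row-wise maxima (zeros are included here, which makes no difference
# because every row maximum in this example is positive).
row_max <- apply(X, 1, max)

# Compare every entry with the maximum of its row. The vector row_max is
# recycled down the columns, so entry (i, j) is compared with row_max[i].
is_max <- X == row_max

# Positions of the extremes as (row, col) pairs; row 1 appears twice.
print(which(is_max, arr.ind = TRUE))
```

Both occurrences of the tied maximum in the first row are marked, which mirrors the behaviour described for the which element.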

Author(s)

Michael Cysouw

Examples

# rowMax(X, ignore.zero = FALSE) is the same as apply(X, 1, max)
# however, with large sparse matrices, the 'apply' approach will start eating away at memory
# and things become slower.
X <- rSparseMatrix(1e3, 1e3, 1e2)
system.time(m1 <- rowMax(X, ignore.zero = FALSE))
system.time(m2 <- apply(X, 1, max)) # slower
all.equal(as.vector(m1), m2) # but same result

# to see the effect even stronger, try something larger
# depending on the amount of available memory, the 'apply' approach will give an error
# "problem too large"
## Not run: 
X <- rSparseMatrix(1e6, 1e6, 1e6)
system.time(m1 <- rowMax(X, ignore.zero = FALSE))
system.time(m2 <- apply(X, 1, max))

## End(Not run)

# speed depends most strongly on the number of entries in the matrix
# also some performance loss with size of matrix
# up to 1e5 entries is still reasonably fast

X <- rSparseMatrix(1e7, 1e7, 1e5)
system.time(m <- rowMax(X))

## Not run: 
X <- rSparseMatrix(1e7, 1e7, 1e7)
system.time(M <- rowMax(X)) # about ten times as slow

## End(Not run)

# apply is not feasible on such large matrices
# Error: problem too large...
## Not run: 
m <- apply(X, 1, max) 

## End(Not run)
