randomSparseArray: Random SparseArray object

Description

randomSparseArray() and poissonSparseArray() can be used to generate a random SparseArray object efficiently.

Usage

randomSparseArray(dim, density=0.05, dimnames=NULL)
poissonSparseArray(dim, lambda=-log(0.95), density=NA, dimnames=NULL)

## Convenience wrappers for the 2D case:
randomSparseMatrix(nrow, ncol, density=0.05, dimnames=NULL)
poissonSparseMatrix(nrow, ncol, lambda=-log(0.95), density=NA,
                    dimnames=NULL)

Arguments

dim

The dimensions (specified as an integer vector) of the SparseArray object to generate.

density

The desired density (specified as a number >= 0 and <= 1) of the SparseArray object to generate, that is, the ratio between its number of nonzero elements and its total number of elements. For the returned object x, this is nzcount(x)/length(x), or equivalently 1 - sparsity(x).

Note that for poissonSparseArray() and poissonSparseMatrix(), density must be < 1. Also, because the number of nonzero elements is random in that case, the actual density of the returned object won't exactly match the requested one, but will typically be very close to it.
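As a quick illustration of the definition, the density of an ordinary R array can be computed in base R as the proportion of its nonzero elements. dense_density() is a hypothetical helper written for this sketch, not a SparseArray function; for a SparseArray object the documented equivalents are nzcount(x)/length(x) and 1 - sparsity(x).

```r
## Hypothetical helper (not part of SparseArray): proportion of nonzero
## elements in an ordinary R array.
dense_density <- function(a) sum(a != 0) / length(a)

## 2 x 2 x 2 array with 2 nonzero elements out of 8.
a <- array(c(0, 3, 0, 0, 1.5, 0, 0, 0), dim = c(2, 2, 2))
dense_density(a)   # 0.25
1 - mean(a == 0)   # 0.25 as well: density = 1 - sparsity
```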

dimnames

The dimnames to put on the object to generate. Must be NULL or a list whose length equals the number of dimensions. Each list element must be either NULL or a character vector along the corresponding dimension.
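The dimnames rule can be sketched as a base R check. check_dimnames() is hypothetical (it is not exported by SparseArray), and it assumes that "a character vector along the corresponding dimension" means one name per position along that dimension, as with the dimnames of ordinary R arrays.

```r
## Hypothetical helper (not part of SparseArray): does 'dimnames' follow
## the rule above for an array of dimensions 'dim'?
check_dimnames <- function(dimnames, dim) {
    if (is.null(dimnames))
        return(TRUE)
    ## Must be a list with one element per dimension.
    if (!is.list(dimnames) || length(dimnames) != length(dim))
        return(FALSE)
    for (i in seq_along(dim)) {
        dn <- dimnames[[i]]
        ## Each element: NULL, or a character vector with one name per
        ## position along dimension i (assumed interpretation).
        if (!is.null(dn) && !(is.character(dn) && length(dn) == dim[[i]]))
            return(FALSE)
    }
    TRUE
}

check_dimnames(list(c("a", "b"), NULL, paste0("z", 1:4)), c(2, 3, 4))  # TRUE
check_dimnames(list(c("a", "b")), c(2, 3, 4))                           # FALSE
```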

lambda

The mean of the Poisson distribution, passed internally to rpois().

Only one of lambda and density can be specified.

When density is specified, rpois() is called internally with lambda set to -log(1 - density). Since a Poisson variable with mean lambda is zero with probability exp(-lambda), this generates Poisson data whose expected density is the requested one.

Note that the default value of lambda, -log(0.95), corresponds to a requested density of 0.05.
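The density/lambda relationship can be checked with base R alone. The two conversion functions below are hypothetical helpers for this illustration; only dpois() and rpois() come from the stats package.

```r
## P(X = 0) = exp(-lambda) for X ~ Poisson(lambda), so the expected
## density is 1 - exp(-lambda), and a requested density d maps to
## lambda = -log(1 - d).
density_to_lambda <- function(d) -log(1 - d)            # hypothetical helper
lambda_to_density <- function(lambda) 1 - dpois(0, lambda)  # hypothetical helper

lambda <- density_to_lambda(0.05)
all.equal(lambda, -log(0.95))               # TRUE: the documented default
all.equal(lambda_to_density(lambda), 0.05)  # TRUE

## density = 1 would need an infinite lambda, hence density must be < 1.
density_to_lambda(1)                        # Inf

## Empirical check: the fraction of nonzero draws is close to 0.05.
set.seed(123)
x <- rpois(1e6, lambda)
mean(x != 0)
```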

nrow, ncol

Number of rows and columns of the SparseMatrix object to generate.

Details

randomSparseArray() mimics the rsparsematrix() function from the Matrix package but returns a SparseArray object instead of a dgCMatrix object.
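Generating random sparse data with an exact target density can be sketched in base R: compute the number of nonzero elements from the density, pick that many distinct positions, and draw one value per position. random_coo() is hypothetical and returns linear indices and values rather than a SparseArray or dgCMatrix object. Drawing the values with rnorm() is an assumption, and the sketch does not reproduce the random numbers generated by rsparsematrix() or randomSparseArray().

```r
## Hypothetical sketch (not the SparseArray or Matrix implementation):
## random sparse data with a fixed number of nonzero elements, stored in
## coordinate form (linear indices + values).
random_coo <- function(dim, density = 0.05) {
    n <- prod(dim)                      # total number of elements
    nnz <- round(density * n)           # number of nonzeros, fixed upfront
    index <- sort(sample.int(n, nnz))   # distinct positions (column-major)
    value <- rnorm(nnz)                 # one value per position (assumed normal)
    list(dim = dim, index = index, value = value)
}

set.seed(123)
coo <- random_coo(c(2500, 950), density = 0.1)
length(coo$index) / prod(coo$dim)   # 0.1: exact, up to rounding of nnz

## Expanding a small result into an ordinary array (for checking only).
small <- random_coo(c(4, 5), density = 0.25)
a <- array(0, small$dim)
a[small$index] <- small$value
sum(a != 0)   # 5
```

Fixing the count upfront is what makes the density exact here, in contrast with the Poisson case, where the number of nonzero elements is itself random.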

poissonSparseArray() populates a SparseArray object with Poisson data, i.e., it is equivalent to:

    a <- array(rpois(prod(dim), lambda), dim)
    as(a, "SparseArray")

but is faster and more memory efficient because the intermediate dense array a is never created.
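One way to obtain Poisson data without the dense intermediate can be sketched in base R: draw the number of nonzero elements from a binomial distribution, choose their positions at random, and draw their values from a zero-truncated Poisson distribution by inverse CDF. poisson_coo() is hypothetical and is not claimed to be how poissonSparseArray() works; it returns linear indices and integer values rather than a SparseArray object.

```r
## Hypothetical sketch: Poisson data in coordinate form, without ever
## allocating the dense array. Same distribution as
## array(rpois(prod(dim), lambda), dim), but not the same random stream.
poisson_coo <- function(dim, lambda) {
    n <- prod(dim)
    p0 <- dpois(0, lambda)               # P(X = 0) = exp(-lambda)
    ## Each element is nonzero independently with probability 1 - p0,
    ## so the number of nonzeros is Binomial(n, 1 - p0).
    nnz <- rbinom(1, n, 1 - p0)
    index <- sort(sample.int(n, nnz))    # positions of the nonzeros
    ## Zero-truncated Poisson values by inverse CDF: u is drawn above
    ## P(X <= 0), so qpois() returns values >= 1.
    u <- runif(nnz, min = p0, max = 1)
    value <- as.integer(qpois(u, lambda))
    ## Guard against floating-point rounding for u extremely close to p0.
    value <- pmax(value, 1L)
    list(dim = dim, index = index, value = value)
}

set.seed(123)
coo <- poisson_coo(c(600, 1700, 80), lambda = 0.01)
length(coo$value) / prod(coo$dim)   # close to 1 - exp(-0.01)
table(coo$value)                    # only values >= 1
```

Because the random numbers are consumed differently, this sketch does not reproduce the element-for-element match with the dense approach that the Examples demonstrate for poissonSparseArray() under a common seed.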

Value

A SparseArray derivative (of class SVT_SparseArray or SVT_SparseMatrix) with the requested dimensions and density.

The type of the returned object is "double" for randomSparseArray() and randomSparseMatrix(), and "integer" for poissonSparseArray() and poissonSparseMatrix().

Note

Unlike Matrix::rsparsematrix(), there is no limit on the number of nonzero elements that the returned SparseArray object can contain.

For example, Matrix::rsparsematrix(3e5, 2e4, density=0.5) will fail with an error, but randomSparseMatrix(3e5, 2e4, density=0.5) should work (even though it will take some time and the memory footprint of the resulting object will be about 18 GB).
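The arithmetic behind this example: at density 0.5, the requested matrix has 3e9 nonzero elements, more than the largest value an R integer can hold. A dgCMatrix stores its column pointers (the p slot) as integers, so its number of nonzero elements cannot exceed .Machine$integer.max; this is presumably why the rsparsematrix() call fails.

```r
## Number of nonzero elements requested by density=0.5 on a 3e5 x 2e4 matrix.
nnz <- 3e5 * 2e4 * 0.5
nnz                          # 3e+09
.Machine$integer.max         # 2147483647, i.e. 2^31 - 1
nnz > .Machine$integer.max   # TRUE
```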

See Also

  • The Matrix::rsparsematrix function in the Matrix package.

  • The stats::rpois function in the stats package.

  • SVT_SparseArray objects.

Examples

## ---------------------------------------------------------------------
## randomSparseArray() / randomSparseMatrix()
## ---------------------------------------------------------------------
library(SparseArray)
library(Matrix)  # for rsparsematrix()
set.seed(123)
dgcm1 <- rsparsematrix(2500, 950, density=0.1)
set.seed(123)
svt1 <- randomSparseMatrix(2500, 950, density=0.1)
svt1
type(svt1)  # "double"

stopifnot(identical(as(svt1, "dgCMatrix"), dgcm1))

## ---------------------------------------------------------------------
## poissonSparseArray() / poissonSparseMatrix()
## ---------------------------------------------------------------------
svt2 <- poissonSparseMatrix(2500, 950, density=0.1)
svt2
type(svt2)  # "integer"
1 - sparsity(svt2)  # very close to the requested density

set.seed(123)
svt3 <- poissonSparseArray(c(600, 1700, 80), lambda=0.01)
set.seed(123)
a3 <- array(rpois(length(svt3), lambda=0.01), dim(svt3))
stopifnot(identical(svt3, SparseArray(a3)))

## The memory footprint of 'svt3' is roughly 10x smaller than that of 'a3':
object.size(svt3)
object.size(a3)
as.double(object.size(a3) / object.size(svt3))

Bioconductor/SparseArray documentation built on July 17, 2024, 6:06 a.m.