SparseAssays-class: SparseAssays objects

Description

The SparseAssays virtual class and its methods provide a formal abstraction of the sparseAssays slot of SparseSummarizedExperiment and RangedSparseSummarizedExperiment objects.

SimpleListSparseAssays and SimpleListJointSparseAssays (not yet implemented) are concrete subclasses of SparseAssays with the former being currently the default implementation of SparseAssays objects. Other implementations (e.g. disk-based, environment-based) could easily be added.

Note that these classes are not meant to be used directly by the end-user and the material in this man page is aimed at package developers.

Usage

## Constructor

SparseAssays(sparse_assays = SimpleList(), subclass)

## Accessors

## S4 method for signature 'SparseAssays'
length(x)

## S4 method for signature 'SparseAssays'
NROW(x)

## S4 method for signature 'SparseAssays'
names(x)

## S4 replacement method for signature 'SparseAssays'
names(x) <- value

## S4 method for signature 'SparseAssays,ANY,ANY'
x[[i, j, ...]]

## S4 replacement method for signature 'SparseAssays,ANY,ANY'
x[[i, j, ...]] <- value

## Densify a SparseAssays object

densify(x, i, j, ..., withRownames = TRUE)

## Apply a function to a SparseAssays object

SAapply(X, FUN, densify = TRUE, sparsify = !densify,
        withRownames = TRUE, ...,
        BPREDO = list(), BPPARAM = bpparam())

Arguments

sparse_assays

A SimpleList or list that can be used to construct a SparseAssays instance; see ‘Examples’.

subclass

The concrete subclass to be instantiated. The default is SimpleListSparseAssays.

x

A SparseAssays object.

i, j

For [[, [[<-, i is a numeric or character vector of length 1 indicating which sparse assay to select. j is not used.

For densify, i and j are numeric or character vectors indicating which sparse assays (i) and samples (j) to extract and densify. At least one of i or j must be provided; see ‘Densify’.

value

An object of a class specified in the S4 method signature or as outlined in ‘Details’.

withRownames

A logical(1), indicating whether rownames should be applied to densified sparse assay elements. Setting withRownames = FALSE increases the speed and memory efficiency with which sparse assays are extracted. Note that colnames are always added.

For SAapply, withRownames has no effect if densify = FALSE.

X

A SparseAssays object.

FUN

The function to be applied to each element of X: see ‘Applying a function to a SparseAssays object (SAapply)’. In the case of functions like +, %*%, the function name must be backquoted or quoted.

densify

A logical(1), indicating whether the sparse data need to be densified prior to applying FUN.

sparsify

A logical(1), indicating whether the result should be sparsified following the application of FUN. By default, sparsify = !densify, that is, sparse data will remain sparse and densified data will remain densified.

...

Optional arguments to FUN or additional arguments, for use in specific methods.

BPREDO, BPPARAM

See ?bplapply.

Details

SparseAssays objects have list-like semantics, with each element containing key and value elements.

The SparseAssays API consists of:

A SparseAssays concrete subclass needs to implement (b) (required) plus the methods in (d) (required). The methods in (c) are inherited from the SimpleList class. Each element of a SparseAssays object is referred to as a "sparse assay" (lowercase).

IMPORTANT: Methods that return a modified SparseAssays object (a.k.a. endomorphisms), that is, [ as well as the replacement methods names<-, [[<-, and [<-, must respect the copy-on-change contract. With objects that don't make use of references internally, the developer doesn't need to take any special action because R takes care of this automatically. However, for objects that do make use of references internally (e.g. environments, external pointers, a pointer to a file on disk, etc.), the developer needs to be careful to implement endomorphisms with copy-on-change semantics. This can easily be achieved by performing a full (deep) copy of the object before modifying it instead of trying to modify it in place. Note that a full (deep) copy is not always necessary to achieve copy-on-change semantics: it is enough (and often preferable for performance reasons) to copy only the parts of the object that need to be modified.
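The copy-on-change contract can be illustrated with a minimal base R sketch. It is not taken from the package: the environment-backed store and the functions new_store(), set_names_in_place() and set_names_copy() are hypothetical names used only to show why a reference-based implementation must copy before modifying.

```r
# Hypothetical environment-backed store; base R only, not package code.
new_store <- function(assays) {
  env <- new.env(parent = emptyenv())
  env$assays <- assays
  structure(list(env = env), class = "EnvStore")
}

# Violates the contract: the shared environment is modified in place, so
# every variable that refers to the same store sees the change.
set_names_in_place <- function(x, value) {
  names(x$env$assays) <- value
  x
}

# Respects the contract: copy the environment first, then modify the copy.
set_names_copy <- function(x, value) {
  env <- list2env(as.list(x$env, all.names = TRUE), parent = emptyenv())
  names(env$assays) <- value
  x$env <- env
  x
}

a <- new_store(list(sa1 = 1:3, sa2 = 4:6))

b <- set_names_copy(a, c("first", "second"))
names(a$env$assays)  # "sa1" "sa2": the original is untouched
names(b$env$assays)  # "first" "second"

d <- set_names_in_place(a, c("oops1", "oops2"))
names(a$env$assays)  # "oops1" "oops2": the original was modified as well
```

Here the whole environment is copied for simplicity; as noted above, copying only the part that changes is enough.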

SparseAssays currently has one implementation, formalized by the concrete subclass SimpleListSparseAssays. There are written specs for a second implementation, SimpleListJointSparseAssays, although this is not yet implemented.

The sparseAssays slot of a SparseSummarizedExperiment object contains an instance of SimpleListSparseAssays.

NOTE: SparseAssays only pay off compared to SummarizedExperiment::Assays when you have more than one measurement per feature, per sample. The payoff is greater when there are many features with the same measurement (normally within a sample, although SimpleListJointSparseAssays should allow this constraint to be removed) and/or many NAs per sample.
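The following base R sketch illustrates where the payoff comes from. It assumes, for illustration only, that one sample of a sparse assay is stored as an integer key (one entry per feature) that indexes the rows of a value matrix holding each distinct measurement once. The helpers sparsify_sample() and densify_sample() are hypothetical and are not the package's implementation.

```r
# Hypothetical helpers; base R only, not package code.
# Assumption: a sample is list(key = <integer>, value = <matrix>), where
# key[f] is the row of `value` that holds the measurements of feature f.
sparsify_sample <- function(dense) {
  # Collapse each row to a string so that identical rows (including
  # all-NA rows) can be mapped to a single stored copy.
  row_id <- apply(dense, 1, paste, collapse = "\r")
  keep <- !duplicated(row_id)
  list(key = match(row_id, row_id[keep]),
       value = dense[keep, , drop = FALSE])
}

densify_sample <- function(smp) {
  smp$value[smp$key, , drop = FALSE]
}

# 1000 features with 3 measurements each, drawn from only 4 distinct rows.
distinct_rows <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(NA, NA, NA))
set.seed(1)
dense <- distinct_rows[sample(4, 1000, replace = TRUE), ]

sparse <- sparsify_sample(dense)
nrow(sparse$value)                         # distinct rows actually stored
length(sparse$key)                         # one key per feature
identical(densify_sample(sparse), dense)   # the round trip is lossless
object.size(sparse) < object.size(dense)   # the sparse form is smaller here
```

With a single measurement per feature the key would be as large as the data it replaces, which is why more than one measurement per feature is needed for any saving.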

Dimensions

The dimensions of a SparseAssays object are defined by nrow = length of features (usually the length of the key), and ncol = number of samples.

Subsetting

Subsetting with [ uses i to subset rows/features in each sparse assay and j to subset samples in each sparse assay. NOTE: Use [[ with i to select the i-th sparse assay.
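The difference between [ and [[ can be sketched in base R. The nested-list layout below (sparse assay, then sample, then key and value) is an assumption made for illustration, and subset_sketch() is a hypothetical stand-in for the [ method, not the package's implementation.

```r
# Hypothetical stand-in for `[` on a SparseAssays object; base R only.
# Assumption: x is a named list of sparse assays, each a named list of
# samples, each sample a list(key = <integer>, value = <matrix>).
subset_sketch <- function(x, i, j) {
  lapply(x, function(sparse_assay) {
    lapply(sparse_assay[j], function(smp) {
      # Subsetting features only needs to subset the key; rows of `value`
      # that are no longer referenced could be dropped afterwards.
      list(key = smp$key[i], value = smp$value)
    })
  })
}

x <- list(
  sa1 = list(
    s1 = list(key = c(1L, NA, 2L, 1L), value = matrix(1:4, ncol = 2)),
    s2 = list(key = c(2L, 2L, 1L, NA), value = matrix(5:8, ncol = 2))
  ),
  sa2 = list(
    s1 = list(key = c(1L, 1L, 1L, 1L), value = matrix(9:10, ncol = 2)),
    s2 = list(key = c(NA, 1L, NA, 1L), value = matrix(11:12, ncol = 2))
  )
)

# Features 3 and 4 of sample "s2", in every sparse assay.
sub <- subset_sketch(x, i = 3:4, j = "s2")
names(sub)          # both sparse assays are kept
names(sub$sa1)      # only sample "s2" remains
sub$sa1$s2$key      # keys of features 3 and 4

# By contrast, [[ selects a single sparse assay with all samples and features.
names(x[["sa2"]])
```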

Combining

SparseAssays objects can be combined in three different ways.

  1. rbind: suitable when each object has the same samples.

  2. cbind: suitable when each object has unique samples.

  3. combine: suitable in either case, but requires that dimnames are set on each object and that all objects have an identical number of sparse assays with identical names.
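As a rough illustration of the first two cases, the base R sketch below combines two objects under an assumed nested-list layout (sparse assay, then sample, then key and value). The functions rbind_sketch() and cbind_sketch() are hypothetical and do not reproduce the package's methods; in particular, they make no attempt to merge duplicated rows of value.

```r
# Hypothetical stand-ins for rbind() and cbind(); base R only.
# Assumption: each object is a named list of sparse assays, each a named
# list of samples, each sample a list(key = <integer>, value = <matrix>).

# rbind: same samples, more features. The keys of the second object are
# shifted so that they point into the stacked value matrix.
rbind_samples <- function(a, b) {
  list(key = c(a$key, b$key + nrow(a$value)),
       value = rbind(a$value, b$value))
}
rbind_sketch <- function(x, y) {
  Map(function(sa_x, sa_y) Map(rbind_samples, sa_x, sa_y), x, y)
}

# cbind: same features, new samples. The per-sample lists are concatenated.
cbind_sketch <- function(x, y) {
  Map(c, x, y)
}

x <- list(sa1 = list(s1 = list(key = c(1L, 2L), value = matrix(1:4, ncol = 2))))
y <- list(sa1 = list(s1 = list(key = c(1L, NA), value = matrix(5:6, ncol = 2))))
z <- list(sa1 = list(s2 = list(key = c(NA, 1L), value = matrix(7:8, ncol = 2))))

r <- rbind_sketch(x, y)
r$sa1$s1$key        # 1 2 3 NA: four features in the one shared sample

cb <- cbind_sketch(x, z)
names(cb$sa1)       # "s1" "s2": two samples with two features each
```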

Densify

SparseAssays objects can be densified (expanded) using the densify() method. For each sample, the densified data for a single sparse assay is returned as a matrix. Therefore, the densify generic returns a SimpleList of length = length(i), each element of which is a SimpleList of length = length(j), each element of which is a matrix of the densified data for that sample in that sparse assay.

WARNING: It is generally advisable not to simultaneously densify all sparse assays in all samples, since the entire point of using SparseAssays is to store the data in a more memory-efficient form. Therefore, users must provide at least one of i (to select sparse assays) and j (to select samples). If you really wish to simultaneously densify all sparse assays and samples, then use densify(x, seq_along(x), seq_len(ncol(x))). If i (resp. j) is missing then effectively i = seq_along(x) (resp. j = seq_len(ncol(x))).
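The shape of the densified result can be sketched in base R. The nested-list layout and the use of NA keys for features without data are assumptions made for illustration, and densify_sketch() is a hypothetical stand-in that returns plain lists rather than SimpleList objects.

```r
# Hypothetical stand-in for densify(); base R only, not package code.
# Assumption: x is a named list of sparse assays, each a named list of
# samples, each sample a list(key = <integer>, value = <matrix>).
densify_sketch <- function(x, i, j) {
  lapply(x[i], function(sparse_assay) {
    lapply(sparse_assay[j], function(smp) {
      # Expand: feature f receives row key[f] of the value matrix;
      # an NA key expands to a row of NAs.
      smp$value[smp$key, , drop = FALSE]
    })
  })
}

x <- list(
  sa1 = list(
    s1 = list(key = c(1L, NA, 2L, 1L), value = matrix(1:4, ncol = 2)),
    s2 = list(key = c(2L, 2L, 1L, NA), value = matrix(5:8, ncol = 2))
  )
)

# Densify only sample "s1" of sparse assay "sa1".
dense <- densify_sketch(x, i = "sa1", j = "s1")
length(dense)        # one element per selected sparse assay
length(dense$sa1)    # one element per selected sample
dense$sa1$s1         # a 4 x 2 matrix; the second row is all NA
```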

Coercion

SparseAssays objects can be coerced into a ShallowSimpleListAssays object (from the SummarizedExperiment package); this will also densify the object. This can be done using as(x, "ShallowSimpleListAssays"), where x is a SparseAssays object. WARNING: The resulting ShallowSimpleListAssays object will typically require much more memory than the equivalent SparseAssays object.

Applying a function to a SparseAssays object (SAapply)

A common use case is to apply a function to a SparseAssays object. For example, we might wish to compute the column-wise mean(s) for each sample in a sparse assay. SAapply is designed to do this in an efficient manner with an interface that is modelled on the lapply functional in base R.

SAapply takes a SparseAssays object (X) and applies a single function (FUN) to each sample in each sparse assay. It is worth emphasising that this means that the same function is applied to all samples and sparse assays in X (use sparseAssay() with the i argument to extract specific sparse assays).

While it is desirable to apply FUN to the data in its sparse form, this is not always possible and the data may need to be densified prior to FUN being applied. The SAapply method simplifies this process in two ways:

  1. SAapply allows the user to pass a function, FUN, that works on sparse or dense data. The densify argument specifies whether the data need to be densified prior to FUN being applied.

  2. If the data need to be densified, then SAapply does this in a memory-efficient manner. For example, it will serially densify each sample in each sparse assay and apply FUN before moving on to the next sample's data (this is appropriately generalised if the user specifies a non-serial BiocParallelParam backend via the BPPARAM argument).

Parallelisation is implemented via the BiocParallel package. Please consult its documentation for further details on parallelisation options, in particular the ?BiocParallelParam help page.

Finally, the sparsify argument determines the class of the return value of SAapply(). If sparsify = FALSE, the return value is a nested list where the first level is the sparse assay and the second level is the sample-level data as a dense matrix. If sparsify = TRUE, the return value is a SparseAssays object with the same concrete subclass as X. By default, sparsify = !densify, that is, sparse data will remain sparse and densified data will remain densified. The use of sparsify = TRUE allows the output of SAapply() to be used as the value in a call to sparseAssays(x) <- value; see ‘Examples’.
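The per-sample evaluation strategy can be sketched in base R. The function sa_apply_sketch() below is a hypothetical, serial stand-in that assumes a nested-list layout (sparse assay, then sample, then key and value); it always returns a nested list and does not implement the sparsify, BPREDO or BPPARAM arguments.

```r
# Hypothetical, serial stand-in for SAapply(); base R only, not package code.
# Assumption: X is a named list of sparse assays, each a named list of
# samples, each sample a list(key = <integer>, value = <matrix>).
sa_apply_sketch <- function(X, FUN, densify = TRUE, ...) {
  lapply(X, function(sparse_assay) {
    lapply(sparse_assay, function(smp) {
      if (densify) {
        # Densify one sample at a time, so that only a single dense
        # matrix exists while FUN runs.
        FUN(smp$value[smp$key, , drop = FALSE], ...)
      } else {
        # FUN receives the sparse representation unchanged.
        FUN(smp, ...)
      }
    })
  })
}

X <- list(sa1 = list(
  s1 = list(key = c(1L, NA, 2L, 1L), value = matrix(c(1, 2, 3, 4), ncol = 2)),
  s2 = list(key = c(2L, 2L, 1L, NA), value = matrix(c(5, 6, 7, 8), ncol = 2))
))

# Column-wise means of each sample, computed on the densified data.
col_means <- sa_apply_sketch(X, colMeans, na.rm = TRUE)
col_means$sa1$s1

# A function that works directly on the sparse form: count features with data.
n_observed <- sa_apply_sketch(X, function(smp) sum(!is.na(smp$key)),
                              densify = FALSE)
n_observed$sa1$s2
```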

NOTE: The generic is called SAapply rather than saapply to reduce the confusion/typo-rate with sapply.

Author(s)

Peter Hickey, peter.hickey@gmail.com

See Also

Examples

# See ?SimpleListSparseAssays

PeteHaitch/SparseSummarizedExperiment documentation built on May 8, 2019, 1:31 a.m.