suppressPackageStartupMessages(library(TENxPBMCData))
require(scry)

We illustrate the application of scry methods to disk-based data from the TENxPBMCData package. Each dataset in this package is stored in an HDF5 file that is accessed through a DelayedArray interface. This avoids the need to load the entire dataset into memory for analysis.

sce <- TENxPBMCData(dataset = "pbmc3k")
h5counts <- counts(sce)
seed(h5counts) # print information about object
h5counts <- h5counts[rowSums(h5counts) > 0, ]
system.time(h5devs <- devianceFeatureSelection(h5counts)) # 26 sec

We now compare the computation speed when the same data are converted to an ordinary in-memory matrix. Note that this would not be possible with HDF5Array objects that are too large to fit in memory.

denseCounts <- as.matrix(h5counts)
system.time(denseDevs <- devianceFeatureSelection(denseCounts)) # 5 sec
max(abs(denseDevs - h5devs)) # should be close to zero

Finally, we compare the speed when the count data are stored in a sparse in-memory Matrix format.

mean(denseCounts > 0) # shows that the data are mostly zeros, so sparsity is useful
sparseCounts <- Matrix::Matrix(denseCounts, sparse = TRUE)
system.time(sparseDevs <- devianceFeatureSelection(sparseCounts)) # 1.6 sec
max(abs(sparseDevs - h5devs)) # should be close to zero

Using disk-based data saves memory but slows computation. When the data contain mostly zeros and are not too large, the sparse in-memory Matrix object achieves the fastest computation times. The resulting deviance statistics agree, up to floating-point error, across all of the data formats.
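The memory side of this trade-off can be checked directly. The following is a minimal sketch on simulated counts rather than the PBMC data; the matrix dimensions, the Poisson rate, and the `*_sim` variable names are assumptions chosen only for illustration.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Simulated counts with a low Poisson rate, so most entries are zero
dense_sim <- matrix(rpois(n_genes * n_cells, lambda = 0.1), nrow = n_genes)
sparse_sim <- Matrix::Matrix(dense_sim, sparse = TRUE)

mean(dense_sim > 0) # fraction of nonzero entries

# Memory footprint of each in-memory representation
print(object.size(dense_sim), units = "Mb")
print(object.size(sparse_sim), units = "Mb")

# Both formats hold the same values, so per-gene totals agree
max(abs(Matrix::rowSums(sparse_sim) - rowSums(dense_sim)))
```

The sparse format stores only the nonzero entries and their positions, so its advantage shrinks as the fraction of nonzero entries grows.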

One can run `nullResiduals` on `HDF5Matrix`, `DelayedArray` matrices, and sparse matrices from the `Matrix` package with the same syntax used for the base matrix case.

We illustrate this with the same dataset from the `TENxPBMCData` package.

sce <- nullResiduals(sce, assay = "counts", type = "deviance")
str(sce)
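The same function can also be called on a bare count matrix instead of a `SingleCellExperiment`. The following is a hedged sketch: it assumes the matrix methods of `nullResiduals` accept `fam` and `type` arguments as described in the scry documentation, and it uses a small simulated matrix whose dimensions and variable names are illustrative only.

```r
library(scry)
library(Matrix)

set.seed(1)
# Simulated sparse counts: 200 genes by 50 cells (illustrative only)
sim_counts <- Matrix::Matrix(
  matrix(rpois(200 * 50, lambda = 0.5), nrow = 200),
  sparse = TRUE
)
# Drop genes with zero total counts, as in the deviance example
sim_counts <- sim_counts[Matrix::rowSums(sim_counts) > 0, ]

# Same call for the sparse Matrix and for the base matrix
res_sparse <- nullResiduals(sim_counts, fam = "binomial", type = "deviance")
res_dense <- nullResiduals(as.matrix(sim_counts), fam = "binomial", type = "deviance")

dim(res_sparse)
max(abs(as.matrix(res_sparse) - res_dense)) # expected to be close to zero
```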
