| scapply | R Documentation |
Workhorse function designed to handle large scRNA-Seq gene expression
matrices, such as embedded Seurat matrices, and apply a function to columns of
the matrix split as a ragged array by an index factor, similar to tapply(),
by() or aggregate(). Note that here the index is applied to columns, as
these represent cells in the single-cell format, rather than to rows as in
aggregate(). Very large matrices are handled by slicing rows into blocks to
avoid excessive memory requirements.
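Leaving aside the row slicing, the core operation is a split of the columns by an index factor followed by a call to FUN on each group of columns. The sketch below shows that idea in base R. `split_apply_cols()` is a hypothetical helper written for illustration. It is not part of the package and omits the slicing and parallelisation that scapply() adds.

```r
# Hypothetical helper: split the columns (cells) of x by INDEX and apply FUN
# to each group. No row slicing and no parallelisation.
split_apply_cols <- function(x, INDEX, FUN, ...) {
  INDEX <- as.factor(INDEX)
  stopifnot(length(INDEX) == ncol(x))
  # one list element per level of INDEX, holding FUN applied to that level's columns
  out <- lapply(levels(INDEX), function(lev) {
    FUN(x[, INDEX == lev, drop = FALSE], ...)
  })
  names(out) <- levels(INDEX)
  out
}

m <- matrix(1:12, nrow = 2)                 # 2 genes x 6 cells
idx <- c("a", "b", "a", "b", "a", "b")      # cell grouping
res <- split_apply_cols(m, idx, rowSums)
res$a                                       # per-gene sums over cells 1, 3 and 5
```

As with tapply(), the groups are ragged, so each level can contain a different number of cells.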
scapply(
x,
INDEX,
FUN,
combine = NULL,
combine2 = "c",
progress = TRUE,
sliceMem = 16,
cores = 1L,
...
)
| Argument | Description |
|---|---|
| x | Matrix, sparse matrix or DelayedMatrix of raw counts with genes in rows and cells in columns. |
| INDEX | A factor whose length matches the number of columns in x. |
| FUN | Function to be applied to each subblock of the matrix. |
| combine | A function, or the name of a function, to apply to the list output to bind the final results together, e.g. 'cbind' or 'rbind' to return a matrix, or 'unlist' to return a vector. |
| combine2 | A function, or the name of a function, to combine results after slicing. As the function is usually applied to blocks of around 30000 genes, the result is usually a vector with one element per gene, so 'c' is the default for joining the per-slice vectors into a single longer vector. However, if each gene returns several results (e.g. a vector or data frame), then a different combining function, such as 'rbind', is needed. |
| progress | Logical, whether to show progress. |
| sliceMem | Maximum amount of memory in GB to allow for each subsetted count matrix object. When x has a larger memory footprint than this, it is sliced into blocks of rows (genes). |
| cores | Integer, number of cores to use for parallelisation. |
| ... | Optional arguments passed to FUN. |
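As a rough illustration of how a sliceMem budget might translate into the number of rows per block, the sketch below assumes dense double storage (8 bytes per element), treats 1 GB as 1e9 bytes, and also caps each block at 2^31 - 1 elements. `rows_per_slice()` is a hypothetical helper. The package's own calculation may differ, for example for sparse matrices.

```r
# Hypothetical helper: estimate how many rows (genes) fit in one slice.
# Assumes dense doubles (8 bytes each) and 1 GB = 1e9 bytes.
rows_per_slice <- function(n_cols, sliceMem = 16, bytes_per_element = 8) {
  budget_elements <- sliceMem * 1e9 / bytes_per_element  # elements allowed by the memory budget
  long_vec_limit <- 2^31 - 1                              # max length of a standard R vector
  max_elements <- min(budget_elements, long_vec_limit)
  max(1, floor(max_elements / n_cols))                    # at least one row per slice
}

n_genes <- 30000
n_cells <- 200000
rps <- rows_per_slice(n_cells)           # 2e9 elements / 200000 cells = 10000 rows
n_slices <- ceiling(n_genes / rps)       # 3 slices
rps_big <- rows_per_slice(n_cells, sliceMem = 32)  # element cap binds instead of memory
```

With sliceMem = 32 the memory budget alone would allow 4e9 elements, so under these assumptions the 2^31 - 1 element cap becomes the binding limit.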
The limit on sliceMem is that the number of elements manipulated in each
block must be kept below the long vector limit of 2^31 (around 2.1e9).
Increasing cores requires substantial amounts of spare RAM. combine works
in a similar way to .combine in foreach(): it works across the levels of
INDEX. combine2 is nested and works across slices of genes (an inner
loop), so it is only invoked if slicing occurs, which happens when a matrix
has a larger memory footprint than sliceMem.
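The nesting of combine and combine2 can be sketched in base R as follows. `sliced_scapply()` and its `rows_per_slice` argument are hypothetical, and the special handling of 'unlist' is an assumption about how such a combine function could be applied. This is not the package's implementation. Within each level of INDEX, FUN is applied to each gene slice, the slices are joined with combine2 only when there is more than one, and the per-level results are then joined with combine.

```r
# Hypothetical sketch of row slicing with nested combining; not the package code.
sliced_scapply <- function(x, INDEX, FUN, combine = NULL, combine2 = "c",
                           rows_per_slice = nrow(x), ...) {
  INDEX <- as.factor(INDEX)
  stopifnot(length(INDEX) == ncol(x))
  # split the rows (genes) into consecutive blocks of at most rows_per_slice
  starts <- seq(1, nrow(x), by = rows_per_slice)
  slices <- lapply(starts, function(s) s:min(s + rows_per_slice - 1, nrow(x)))
  out <- lapply(levels(INDEX), function(lev) {
    cols <- which(INDEX == lev)
    # inner loop over gene slices; combine2 only runs if there is more than one slice
    pieces <- lapply(slices, function(r) FUN(x[r, cols, drop = FALSE], ...))
    if (length(pieces) == 1) pieces[[1]] else do.call(combine2, pieces)
  })
  names(out) <- levels(INDEX)
  # outer step across levels of INDEX: return the list unless combine is given
  if (is.null(combine)) return(out)
  combine_fun <- match.fun(combine)
  # unlist() takes the whole list; binding functions take its elements as arguments
  if (identical(combine_fun, unlist)) unlist(out) else do.call(combine_fun, out)
}

m <- matrix(1:12, nrow = 4)            # 4 genes x 3 cells
idx <- c("a", "b", "a")
r_sliced <- sliced_scapply(m, idx, rowSums, combine = "cbind", rows_per_slice = 3)
r_whole  <- sliced_scapply(m, idx, rowSums, combine = "cbind")
identical(r_sliced, r_whole)           # TRUE
r_vec <- sliced_scapply(m, idx, rowSums, combine = "unlist")
```

Because rowSums() returns one value per gene, joining the slices with 'c' gives the same matrix as processing all genes at once.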
By default returns a list, unless combine is specified, in which case
the returned data type depends on the functions specified by FUN and
combine.
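To illustrate how the return type follows from combine, the snippet below takes a hypothetical list of per-level results, shaped like the default list output, and binds it in two ways. Applying 'cbind' through do.call() and applying 'unlist' to the whole list is an assumption made for illustration.

```r
# Hypothetical per-level output: one named vector (one value per gene) per level of INDEX
res <- list(a = c(g1 = 1.5, g2 = 2.0),
            b = c(g1 = 0.5, g2 = 3.0))

# binding functions take the list elements as separate arguments
mat <- do.call("cbind", res)   # matrix: genes in rows, levels in columns
# unlist() takes the whole list as a single argument
vec <- unlist(res)             # named vector with names such as "a.g1"
```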
Myles Lewis
scmean(), which applies a fixed function logmean() in a similar
manner, and slapply(), which applies a function to a big matrix with
slicing but without splitting by an index factor.
# 10 genes x 100 cells of random counts, cells assigned to 5 groups
m <- matrix(sample(0:100, 1000, replace = TRUE), nrow = 10)
cell_index <- sample(letters[1:5], 100, replace = TRUE)
o <- scmean(m, cell_index)
# equivalent result using scapply()
o2 <- scapply(m, cell_index, function(x) rowMeans(log2(x + 1)),
              combine = "cbind")
identical(o, o2)