Generate a UMI count matrix after downsampling reads from the molecule information file produced by CellRanger for 10X Genomics data.
downsampleReads(
  sample,
  prop,
  barcode.length = NULL,
  bycol = FALSE,
  features = NULL,
  use.library = NULL
)
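sample: A string containing the path to the molecule information HDF5 file.
prop: A numeric scalar or, if bycol=TRUE, a numeric vector with one value per cell, specifying the downsampling proportion in [0, 1].
barcode.length: An integer scalar specifying the length of the cell barcode, see read10xMolInfo.
bycol: A logical scalar indicating whether downsampling should be performed on a column-by-column basis.
features: A character vector containing the names of the features on which to perform downsampling.
use.library: An integer vector specifying the library indices for which to extract molecules from sample, or a character vector naming the desired feature types.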
This function downsamples the reads for each molecule by the specified prop, using the information in sample. It then constructs a UMI count matrix based on the molecules with non-zero read counts. The aim is to eliminate differences in technical noise that can drive clustering by batch, as described in downsampleMatrix.
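
The two steps described above can be sketched in base R on a toy molecule table. This is only an illustration of the idea, not the implementation used by downsampleReads: the `mol` table, its column names, and the rounding of the target read count are assumptions made for this example.

```r
set.seed(100)

# Hypothetical molecule table: one row per molecule, with its read count.
mol <- data.frame(
    cell  = c("A", "A", "A", "B", "B", "B"),
    gene  = c("g1", "g1", "g2", "g1", "g2", "g2"),
    reads = c(5L, 1L, 3L, 2L, 1L, 4L)
)
prop <- 0.5

# Expand to one entry per read, labelled by the molecule it came from.
read.origin <- rep(seq_len(nrow(mol)), mol$reads)

# Sample reads without replacement across the whole toy dataset.
n.keep <- round(prop * length(read.origin))
kept <- read.origin[sample.int(length(read.origin), n.keep)]

# Count the surviving reads for each molecule.
mol$down <- tabulate(kept, nbins = nrow(mol))

# Helper: count molecules per gene (row) and cell (column).
count.umis <- function(keep) {
    table(
        gene = factor(mol$gene[keep], levels = unique(mol$gene)),
        cell = factor(mol$cell[keep], levels = unique(mol$cell))
    )
}

full <- count.umis(rep(TRUE, nrow(mol)))  # UMI counts before downsampling
umi  <- count.umis(mol$down > 0)          # molecules with non-zero reads only
print(umi)
```

A molecule with many reads is likely to keep at least one of them, so UMI counts fall more slowly than read counts as prop decreases.
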
Subsampling the reads with downsampleReads recapitulates the effect of differences in sequencing depth per cell. This provides an alternative to downsampling with the CellRanger aggr function or subsampling with the 10X Genomics R kit. Note that this differs from directly subsampling the UMI count matrix with downsampleMatrix.
If bycol=FALSE, downsampling without replacement is performed on all reads from the entire dataset. The total number of reads for each cell after downsampling may not be exactly equal to prop times the original value. Note that this is the more natural approach and is the default, which differs from the default used in downsampleMatrix.
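
To see why the per-cell totals are only approximately prop times the original under dataset-wide sampling, consider the following base R sketch. The cell names and read totals are made up for illustration, and the rounding of the overall target is an assumption of this sketch rather than documented behaviour.

```r
set.seed(42)

# Hypothetical total reads per cell.
reads.per.cell <- c(cellA = 1000L, cellB = 400L, cellC = 50L)
prop <- 0.3

# Label every read in the dataset by its cell of origin.
cell.of.read <- rep(seq_along(reads.per.cell), reads.per.cell)

# Draw reads without replacement from the pooled dataset.
n.keep <- round(prop * length(cell.of.read))
kept <- cell.of.read[sample.int(length(cell.of.read), n.keep)]

# Per-cell totals after downsampling, next to prop times the original.
observed <- tabulate(kept, nbins = length(reads.per.cell))
names(observed) <- names(reads.per.cell)
comparison <- rbind(observed = observed, target = prop * reads.per.cell)
print(comparison)
```

Only the dataset-wide total is fixed here; each cell's share fluctuates around its target from one random draw to the next.
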
If bycol=TRUE, sampling without replacement is performed on the reads for each cell. The total number of reads for each cell after downsampling is guaranteed to be prop times the original total (rounded to the nearest integer). Different proportions can be specified for different cells by setting prop to a vector,
where each proportion corresponds to a cell/GEM combination in the order returned by get10xMolInfoStats.
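
The per-cell mode can be sketched in the same way. In this base R illustration the `mol` table and the named `prop` vector are hypothetical; in particular, matching proportions to cells by name is a convenience of the sketch and not how downsampleReads pairs them.

```r
set.seed(42)

# Hypothetical molecule table: one row per molecule, with its read count.
mol <- data.frame(
    cell  = c("A", "A", "A", "B", "B", "C"),
    reads = c(6L, 3L, 1L, 4L, 4L, 5L)
)

# One downsampling proportion per cell.
prop <- c(A = 0.5, B = 0.25, C = 1)

mol$down <- 0L
for (cell in names(prop)) {
    idx <- which(mol$cell == cell)

    # One entry per read in this cell, labelled by its molecule (row index).
    origin <- rep(idx, mol$reads[idx])

    # Keep exactly round(prop * total) reads, sampled without replacement.
    n.keep <- round(prop[[cell]] * length(origin))
    kept <- origin[sample.int(length(origin), n.keep)]
    mol$down[idx] <- tabulate(kept, nbins = nrow(mol))[idx]
}

# Per-cell read totals after downsampling.
totals <- tapply(mol$down, mol$cell, sum)
print(totals)
```

Unlike dataset-wide sampling, the per-cell totals are fixed in advance (5, 2 and 5 reads here) and only the allocation of reads among each cell's molecules is random.
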
The use.library argument is intended for studies with multiple feature types, e.g., antibody capture or CRISPR tags. As the reads for each feature type are generated in a separate sequencing library, it is generally most appropriate to downsample reads for each feature type separately. This can be achieved by setting use.library to the name or index of the desired feature set. The features of interest can also be directly specified with features. (This will be intersected with any use.library choice if both are specified.)
A numeric sparse matrix containing the downsampled UMI counts for each gene (row) and barcode (column). If features is set, only the rows with names in features are returned.
downsampleMatrix, for more general downsampling of the count matrix.
read10xMolInfo, to read the contents of the molecule information file.
# Load the package that provides downsampleReads.
library(DropletUtils)

# Mocking up some 10X HDF5-formatted data.
out <- DropletUtils:::simBasicMolInfo(tempfile())

# Downsampling by the reads.
downsampleReads(out, barcode.length=4, prop=0.5)