View source: R/downsampleReads.R
downsampleReads | R Documentation |
Generate a UMI count matrix after downsampling reads from the molecule information file produced by CellRanger for 10X Genomics data.
downsampleReads(
  sample,
  prop,
  barcode.length = NULL,
  bycol = FALSE,
  features = NULL,
  use.library = NULL
)
| Argument | Description |
|---|---|
| sample | A string containing the path to the molecule information HDF5 file. |
| prop | A numeric scalar or, if bycol=TRUE, a numeric vector with one proportion per cell, specifying the proportion of reads to retain. |
| barcode.length | An integer scalar specifying the length of the cell barcode, see |
| bycol | A logical scalar indicating whether downsampling should be performed on a column-by-column basis. |
| features | A character vector containing the names of the features on which to perform downsampling. |
| use.library | An integer vector specifying the library indices for which to extract molecules from sample. |
This function downsamples the reads for each molecule by the specified proportion prop, using the information in sample.
It then constructs a UMI count matrix based on the molecules with non-zero read counts.
The aim is to eliminate differences in technical noise that can drive clustering by batch, as described in downsampleMatrix.
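The following base R sketch illustrates the idea on a toy molecule table. It is not the DropletUtils implementation: the table mol, its columns (cell, gene, reads) and the helper downsample_reads_sketch() are hypothetical, and the number of retained reads is assumed to be round(prop * total).

```r
# Hypothetical molecule table: one row per UMI-tagged molecule,
# with the number of reads observed for that molecule.
mol <- data.frame(
  cell  = c("AAAA", "AAAA", "AAAA", "CCCC", "CCCC", "GGGG"),
  gene  = c("g1",   "g2",   "g2",   "g1",   "g3",   "g3"),
  reads = c(5L,     1L,     3L,     2L,     4L,     1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of DropletUtils.
downsample_reads_sketch <- function(mol, prop) {
  # Expand to one entry per read, labelled by the molecule it came from.
  read_owner <- rep(seq_len(nrow(mol)), mol$reads)
  # Sample reads without replacement from the whole dataset
  # (assumption: round(prop * total) reads are kept).
  n_keep <- round(prop * length(read_owner))
  kept <- read_owner[sample.int(length(read_owner), n_keep)]
  # Count the surviving reads for each molecule.
  new_reads <- tabulate(kept, nbins = nrow(mol))
  # A molecule contributes one UMI only if at least one of its reads survives.
  survivors <- mol[new_reads > 0, , drop = FALSE]
  # Tabulate surviving molecules into a gene (row) x cell (column) UMI count matrix.
  counts <- table(
    factor(survivors$gene, levels = sort(unique(mol$gene))),
    factor(survivors$cell, levels = sort(unique(mol$cell)))
  )
  unclass(counts)
}

set.seed(42)
downsample_reads_sketch(mol, prop = 0.5)
```

With prop = 1 every molecule keeps at least one read, so the full UMI counts are recovered; smaller values of prop remove molecules that lose all of their reads.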
Subsampling the reads with downsampleReads recapitulates the effect of differences in sequencing depth per cell.
This provides an alternative to downsampling with the CellRanger aggr command or subsampling with the 10X Genomics R kit.
Note that this differs from directly subsampling the UMI count matrix with downsampleMatrix: here, a molecule is only lost if all of its reads are discarded, so molecules supported by many reads are more likely to be retained.
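The difference from UMI-level subsampling can be quantified with a short calculation. Assuming that round(prop * N) reads are drawn without replacement from N reads in total, a molecule with r reads is lost only if none of them are drawn, which is a hypergeometric event; subsampling UMIs directly retains each molecule with a probability of roughly prop regardless of r. The totals below are illustrative values, not taken from real data.

```r
# Illustrative totals (assumed, not from any real dataset).
N    <- 1000              # total reads in the dataset
prop <- 0.5
k    <- round(prop * N)   # reads retained
r    <- 1:5               # reads supporting a given molecule

# dhyper(0, r, N - r, k) is the probability that none of the molecule's
# r reads are among the k drawn, so its complement is the survival probability.
read_level <- 1 - dhyper(0, m = r, n = N - r, k = k)

# Direct UMI-level subsampling keeps each molecule with probability ~prop.
umi_level <- rep(prop, length(r))

round(cbind(reads = r, read_level, umi_level), 3)
```

A molecule with a single read survives with probability exactly prop here, and the survival probability increases with every additional read.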
If bycol=FALSE, downsampling without replacement is performed on all reads from the entire dataset.
As a result, the total number of reads for each cell after downsampling may not be exactly equal to prop times the original value.
This is the default and the more natural approach for reads; note that it differs from the default used in downsampleMatrix.
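The per-cell deviations under bycol=FALSE arise because reads from all cells are pooled into a single draw. This base R sketch uses hypothetical per-cell read totals and, as an assumption, keeps round(prop * total) reads from the pool; only the overall total is fixed, not the total for each cell.

```r
set.seed(1)
# Hypothetical per-cell read totals.
orig <- c(cellA = 120L, cellB = 80L, cellC = 300L)
prop <- 0.25

# Pool all reads, labelling each read with its cell of origin.
pool <- rep(names(orig), orig)
# One dataset-wide draw without replacement (assumed size: round(prop * total)).
kept <- sample(pool, round(prop * length(pool)))

# Per-cell totals after the draw, next to the naive per-cell expectation.
after <- table(factor(kept, levels = names(orig)))
rbind(expected = prop * orig, observed = as.integer(after))
```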
If bycol=TRUE, sampling without replacement is performed on the reads for each cell.
The total number of reads for each cell after downsampling is guaranteed to be prop times the original total (rounded to the nearest integer).
Different proportions can be specified for different cells by setting prop to a vector, where each proportion corresponds to a cell/GEM combination in the order returned by get10xMolInfoStats.
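The bycol=TRUE behaviour can be sketched by sampling each cell's reads separately. The molecule table and proportions below are hypothetical, and proportions are matched to cells by name for readability, whereas downsampleReads matches them to cells in the order returned by get10xMolInfoStats.

```r
set.seed(2)
# Hypothetical molecule table: one row per molecule, with its cell and read count.
mol <- data.frame(
  cell  = c("AAAA", "AAAA", "AAAA", "CCCC", "CCCC", "GGGG"),
  reads = c(5L, 1L, 4L, 2L, 6L, 10L),
  stringsAsFactors = FALSE
)
# Hypothetical per-cell proportions, matched by cell name in this sketch.
prop <- c(AAAA = 0.5, CCCC = 0.25, GGGG = 0.1)

new_reads <- integer(nrow(mol))
for (cl in unique(mol$cell)) {
  idx <- which(mol$cell == cl)
  # Expand this cell's molecules into one entry per read.
  owner <- rep(idx, mol$reads[idx])
  # Sample exactly round(prop * total) of this cell's reads, without replacement.
  kept <- owner[sample.int(length(owner), round(prop[[cl]] * length(owner)))]
  # Add the surviving reads back onto their molecules.
  new_reads <- new_reads + tabulate(kept, nbins = nrow(mol))
}

# Per-cell read totals after downsampling.
tapply(new_reads, mol$cell, sum)
```

Because each cell is sampled on its own, the per-cell totals here are 5, 2 and 1 reads, i.e., exactly round(prop * original) for original totals of 10, 8 and 10 reads.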
The use.library argument is intended for studies with multiple feature types, e.g., antibody capture or CRISPR tags.
As the reads for each feature type are generated in a separate sequencing library, it is generally most appropriate to downsample reads for each feature type separately.
This can be achieved by setting use.library to the name or index of the desired feature set.
The features of interest can also be specified directly with features; if both features and use.library are specified, the two selections are intersected.
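The interaction between use.library and features amounts to intersecting two row filters on the molecules before downsampling. The sketch below assumes a hypothetical molecule table with an integer library column; the helper select_molecules() is illustrative only and accepts library indices but not library names.

```r
# Hypothetical molecule table with the library each molecule was sequenced in.
mol <- data.frame(
  gene    = c("g1", "g2", "ADT1", "ADT2", "g3"),
  library = c(1L,   1L,   2L,     2L,     1L),
  reads   = c(3L,   1L,   7L,     2L,     4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of DropletUtils.
select_molecules <- function(mol, use.library = NULL, features = NULL) {
  keep <- rep(TRUE, nrow(mol))
  # Restrict to the requested library (feature type), if any.
  if (!is.null(use.library)) keep <- keep & mol$library %in% use.library
  # Restrict to the requested features, if any; both filters are combined with AND.
  if (!is.null(features)) keep <- keep & mol$gene %in% features
  mol[keep, , drop = FALSE]
}

# Only library-1 molecules whose feature is g1 or ADT1 are kept, leaving just g1.
select_molecules(mol, use.library = 1L, features = c("g1", "ADT1"))
```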
The function returns a numeric sparse matrix containing the downsampled UMI counts for each gene (row) and barcode (column).
If features is set, only the rows with names in features are returned.
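As a sketch of how the returned object relates to the surviving molecules, the code below builds a sparse gene-by-barcode count matrix with the Matrix package's sparseMatrix(), counting one UMI per surviving molecule, and then restricts the rows to a set of features. The molecule table, gene and barcode names are hypothetical, and the exact sparse matrix class returned by downsampleReads is not assumed here.

```r
library(Matrix)

# Hypothetical surviving molecules (those with non-zero reads after downsampling).
survivors <- data.frame(
  gene = c("g1", "g2", "g2", "g1", "g3"),
  cell = c("AAAA", "AAAA", "AAAA", "CCCC", "CCCC"),
  stringsAsFactors = FALSE
)
genes <- c("g1", "g2", "g3")
cells <- c("AAAA", "CCCC", "GGGG")

# One UMI per surviving molecule; duplicated (row, column) pairs are summed.
counts <- sparseMatrix(
  i = match(survivors$gene, genes),
  j = match(survivors$cell, cells),
  x = 1,
  dims = c(length(genes), length(cells)),
  dimnames = list(genes, cells)
)

# If a feature subset is requested, keep only those rows.
features <- c("g2", "g3")
sub <- counts[rownames(counts) %in% features, , drop = FALSE]
sub
```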
Aaron Lun
downsampleMatrix, for more general downsampling of the count matrix.
read10xMolInfo, to read the contents of the molecule information file.
library(DropletUtils)

# Mocking up some 10X HDF5-formatted data.
out <- DropletUtils:::simBasicMolInfo(tempfile())

# Downsampling by the reads.
downsampleReads(out, barcode.length=4, prop=0.5)