Creates a SingleCellExperiment from the CellRanger output directories for 10X Genomics data.
samples: A character vector containing one or more directory names, each corresponding to a 10X sample. Each directory should contain a matrix file, a gene/feature annotation file, and a barcode annotation file.
Alternatively, each string may contain a path to an HDF5 file in the sparse matrix format generated by 10X.
These can be mixed with directory names when type="auto".
Alternatively, each string may contain a prefix of names for the three-file system described above, where the rest of the name of each file follows the standard 10X output.
sample.names: A character vector of length equal to samples, containing the sample names to store in the column metadata of the output object. If not supplied, the paths in samples are used.
col.names: A logical scalar indicating whether the columns of the output object should be named with the cell barcodes.
type: String specifying the type of 10X format to read data from.
version: String specifying the version of the 10X format to read data from.
genome: String specifying the genome if type="HDF5" and version="2".
compressed: Logical scalar indicating whether the text files are compressed for type="sparse" or "prefix".
BPPARAM: A BiocParallelParam object specifying how loading should be parallelized for multiple samples.
This function has a long and storied past.
It was originally developed as the read10xResults function in scater, inspired by the Read10X function from the Seurat package.
It was then migrated to this package in an effort to consolidate some 10X-related functionality across various packages.
If type="auto", the format of the input file is automatically detected for each entry of samples based on whether it ends with ".h5".
If so, type is set to "HDF5"; otherwise it is set to "sparse".
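The detection rule can be sketched in a few lines of base R. detect_type() is a hypothetical helper written for illustration, not the package's internal code, and it assumes a case-sensitive match on the ".h5" suffix.

```r
# Hypothetical helper (not part of DropletUtils): classify each entry of
# 'samples' by its suffix. Assumes a case-sensitive match on ".h5";
# anything else is treated as a sparse 10X directory.
detect_type <- function(samples) {
  ifelse(grepl("\\.h5$", samples), "HDF5", "sparse")
}

detect_type(c("run1/outs/filtered_feature_bc_matrix", "run2/filtered.h5"))
```

Because the check is applied per entry, directories and HDF5 files can be mixed in one call.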
If type="sparse", count data are loaded as a dgCMatrix object, i.e., a compressed sparse column matrix.
The input is the sparse output produced by the CellRanger pipeline,
consisting of a (possibly Gzipped) MatrixMarket text file ("matrix.mtx"),
with additional tab-delimited files for barcodes ("barcodes.tsv")
and gene annotation ("features.tsv" for version 3 or "genes.tsv" for version 2).
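A minimal sketch of reading one such directory with the Matrix package, assuming uncompressed version 3 file names and a three-column features.tsv (identifier, symbol, feature type). read_sparse_dir() is hypothetical and is not how the package itself is implemented; the toy files are written to a temporary directory so the example runs on its own.

```r
library(Matrix)

# Hypothetical helper (not the package's implementation): read one
# 10X-style directory holding uncompressed version 3 files.
read_sparse_dir <- function(dir) {
  # MatrixMarket coordinate file -> triplet matrix -> compressed sparse column
  counts <- as(readMM(file.path(dir, "matrix.mtx")), "CsparseMatrix")
  barcodes <- readLines(file.path(dir, "barcodes.tsv"))
  features <- read.delim(file.path(dir, "features.tsv"), header = FALSE,
                         colClasses = "character",
                         col.names = c("ID", "Symbol", "Type"))
  rownames(counts) <- features$ID  # rows named by gene identifier
  list(counts = counts, features = features, barcodes = barcodes)
}

# Write a tiny toy directory (3 genes x 2 cells) so the sketch is runnable.
toy_dir <- file.path(tempdir(), "toy10x")
dir.create(toy_dir, showWarnings = FALSE)
toy <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
writeMM(toy, file.path(toy_dir, "matrix.mtx"))
writeLines(c("AAACCTGA-1", "TTTGGTTC-1"), file.path(toy_dir, "barcodes.tsv"))
write.table(
  data.frame(c("ENSG01", "ENSG02", "ENSG03"), c("GeneA", "GeneB", "GeneC"),
             "Gene Expression"),
  file.path(toy_dir, "features.tsv"),
  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
)

res <- read_sparse_dir(toy_dir)
res$counts
```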
If type="prefix", count data are also loaded as a dgCMatrix object.
This assumes the same three-file structure for each sample as described for type="sparse",
but each sample is defined here by a prefix in the file names rather than by being a separate directory.
For example, if the samples entry is "xyz_",
the files are expected to be "xyz_matrix.mtx", "xyz_barcodes.tsv" and "xyz_features.tsv".
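The prefix convention amounts to pasting the prefix onto the standard file names. prefix_files() is a hypothetical helper for illustration; it assumes the version 3 names and ignores the optional ".gz" suffix.

```r
# Hypothetical helper: expand a prefix into the three file names expected
# for one sample, using the version 3 names ("features.tsv" rather than
# "genes.tsv") and no ".gz" suffix.
prefix_files <- function(prefix) {
  c(matrix   = paste0(prefix, "matrix.mtx"),
    barcodes = paste0(prefix, "barcodes.tsv"),
    features = paste0(prefix, "features.tsv"))
}

prefix_files("data/xyz_")
```

Several samples can then live in one directory, distinguished only by their prefixes.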
If type="HDF5", count data are assumed to follow the 10X sparse HDF5 format for large data sets.
It is loaded as a TENxMatrix object, which is a stub object that refers back to the file in samples.
Users may need to set genome if it cannot be automatically determined when version="2".
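The 10X HDF5 layout stores each count matrix as compressed sparse column vectors (the "data", "indices", "indptr" and "shape" datasets). The sketch below uses toy in-memory vectors as stand-ins for those datasets and assembles them into a dgCMatrix. Unlike the TENxMatrix stub described above, this pulls everything into memory, and reading the datasets from a real file (e.g. with rhdf5::h5read()) is left out.

```r
library(Matrix)

# Toy stand-ins for the datasets stored in a 10X HDF5 matrix group.
# Reading them from a real file is omitted so the sketch needs only Matrix.
h5 <- list(
  data    = c(5L, 1L, 2L),  # non-zero UMI counts, column by column
  indices = c(0L, 2L, 1L),  # 0-based feature (row) index of each count
  indptr  = c(0L, 2L, 3L),  # start of each barcode's entries; length = ncol + 1
  shape   = c(3L, 2L)       # number of features, number of barcodes
)

# Assemble the compressed sparse column vectors into an in-memory dgCMatrix.
counts <- sparseMatrix(
  i = h5$indices, p = h5$indptr, x = as.numeric(h5$data),
  dims = h5$shape, index1 = FALSE
)
counts
```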
If compressed=NULL, the function will automatically search for both the unzipped and Gzipped versions of the files.
This assumes that the compressed files have an additional ".gz" suffix appended to their names.
We can restrict to only compressed or uncompressed files by setting compressed=TRUE or FALSE, respectively.
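The search rule can be sketched as follows: with compressed=NULL both the plain and the ".gz" file names are tried, while TRUE or FALSE restricts the search to one form. find_10x_file() is hypothetical, and trying the plain file first when both exist is an assumption made for this sketch.

```r
# Hypothetical helper mirroring the compressed= rule.
# NULL: look for both forms (the plain file is tried first, an assumption);
# TRUE: only "<name>.gz"; FALSE: only the plain file.
find_10x_file <- function(dir, name, compressed = NULL) {
  plain <- file.path(dir, name)
  gz <- paste0(plain, ".gz")
  candidates <- if (is.null(compressed)) c(plain, gz) else if (compressed) gz else plain
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0L) stop("cannot find '", name, "' in ", dir)
  hits[1]
}

# Toy directory holding only a Gzipped barcode file.
gz_dir <- file.path(tempdir(), "gzdemo")
dir.create(gz_dir, showWarnings = FALSE)
con <- gzfile(file.path(gz_dir, "barcodes.tsv.gz"), "w")
writeLines(c("AAACCTGA-1", "TTTGGTTC-1"), con)
close(con)

path <- find_10x_file(gz_dir, "barcodes.tsv")
basename(path)
```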
CellRanger 3.0 introduced a major change in the format of the output files for both the sparse and HDF5 types.
If version="auto", the version of the format is automatically detected from the supplied paths.
If type="sparse", this is based on whether there is a "features.tsv.gz" file in the directory.
If type="HDF5", this is based on whether there is a top-level "matrix" group with a "matrix/features" subgroup in the file.
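For sparse directories the version check reduces to a file-existence test. detect_version_sparse() is a hypothetical helper covering only that case; the HDF5 check would need an HDF5 reader and is not sketched here.

```r
# Hypothetical helper: guess the format version of a sparse 10X directory
# from the presence of "features.tsv.gz".
detect_version_sparse <- function(dir) {
  if (file.exists(file.path(dir, "features.tsv.gz"))) "3" else "2"
}

v3 <- file.path(tempdir(), "v3demo")
v2 <- file.path(tempdir(), "v2demo")
dir.create(v3, showWarnings = FALSE)
dir.create(v2, showWarnings = FALSE)
invisible(file.create(file.path(v3, "features.tsv.gz")))  # empty placeholder
invisible(file.create(file.path(v2, "genes.tsv")))

c(detect_version_sparse(v3), detect_version_sparse(v2))
```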
Matrices are combined by column if multiple samples were specified.
This will throw an error if the gene information is not consistent across samples.
If col.names=TRUE and length(samples)==1, each column is named by the cell barcode.
For multiple samples, the index of each sample in samples is concatenated to the cell barcode to form the column name.
This avoids problems with multiple instances of the same cell barcodes in different samples.
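The combining and naming steps can be sketched with Matrix. combine_samples() is a hypothetical helper; in particular, the "<index>_<barcode>" pattern used for multiple samples is an assumption for illustration, not a documented guarantee of the exact separator.

```r
library(Matrix)

# Hypothetical helper: column-bind per-sample count matrices, stopping if the
# gene identifiers (row names) differ, then name columns by barcode.
# The "<index>_<barcode>" pattern for multiple samples is an assumption.
combine_samples <- function(mats, barcodes) {
  ref <- rownames(mats[[1]])
  for (m in mats) {
    if (!identical(rownames(m), ref)) {
      stop("gene information is not consistent across samples")
    }
  }
  if (length(mats) == 1L) {
    combined <- mats[[1]]
    colnames(combined) <- barcodes[[1]]
  } else {
    combined <- do.call(cbind, mats)
    idx <- rep(seq_along(mats), vapply(mats, ncol, integer(1)))
    colnames(combined) <- paste0(idx, "_", unlist(barcodes, use.names = FALSE))
  }
  combined
}

genes <- c("ENSG01", "ENSG02")
a <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(3, 1), dims = c(2, 2),
                  dimnames = list(genes, NULL))
b <- sparseMatrix(i = c(2, 1), j = c(1, 2), x = c(4, 2), dims = c(2, 2),
                  dimnames = list(genes, NULL))

# "AAACCTGA-1" occurs in both samples but the column names stay unique.
out <- combine_samples(list(a, b), list(c("AAACCTGA-1", "TTTGGTTC-1"),
                                        c("AAACCTGA-1", "GGGATCCA-1")))
colnames(out)
```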
Note that user-level manipulation of sparse matrices requires loading of the Matrix package.
Otherwise, calculation of colSums, etc. will result in errors.
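For example, once Matrix is attached, the sparse-aware summary methods work directly on a dgCMatrix. Without it, base colSums() rejects the object because it is not an ordinary array.

```r
library(Matrix)  # attaches the sparse-aware colSums()/rowSums() methods

counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 1, 2), dims = c(3, 2))
colSums(counts)  # total counts per cell: 6 2
rowSums(counts)  # total counts per gene: 5 2 1
```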
A SingleCellExperiment object containing count data for each gene (row) and cell (column) across all samples.
Row metadata will contain the fields "ID" and "Symbol".
The former is the gene identifier (usually Ensembl), while the latter is the gene name.
If version="3", it will also contain the "Type" field specifying the type of feature (e.g., gene or antibody).
Column metadata will contain the fields "Sample" and "Barcode".
The former contains the name of the sample (or, if not supplied, the path in samples) from which each column was obtained.
The latter contains the cell barcode sequence and GEM group for each cell library.
Rows are named with the gene identifier. Columns are named with the cell barcode only if col.names=TRUE, as described above.
The assays will contain a single "counts" matrix, containing UMI counts for each gene in each cell.
Note that the matrix representation will depend on the format of the samples, as described above.
The metadata contains a "Samples" field, containing the input samples character vector.
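The column metadata layout, with one "Sample" and one "Barcode" entry per cell, can be mimicked with a plain data frame. make_col_metadata() is a hypothetical helper; the real object stores this information in the colData of a SingleCellExperiment rather than in a data.frame.

```r
# Hypothetical helper: build per-cell column metadata with "Sample" and
# "Barcode" fields. When sample.names is not given, the input paths are
# used as the sample names.
make_col_metadata <- function(samples, barcodes, sample.names = names(samples)) {
  if (is.null(sample.names)) sample.names <- samples
  data.frame(
    Sample  = rep(unname(sample.names), lengths(barcodes)),
    Barcode = unlist(barcodes, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

cd <- make_col_metadata(c("run1/outs", "run2/outs"),
                        list(c("AAACCTGA-1", "TTTGGTTC-1"), "GGGATCCA-1"))
cd
```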
Davis McCarthy, with modifications from Aaron Lun
splitAltExps, to split alternative feature sets (e.g., antibody tags) into their own alternative Experiments.
write10xCounts, to create 10X-formatted file(s) from a count matrix.