read10xCounts: Load data from a 10X Genomics experiment

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/read10xCounts.R

Description

Creates a SingleCellExperiment from the CellRanger output directories for 10X Genomics data.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
read10xCounts(
  samples,
  sample.names = names(samples),
  col.names = FALSE,
  type = c("auto", "sparse", "HDF5", "prefix"),
  version = c("auto", "2", "3"),
  genome = NULL,
  compressed = NULL,
  BPPARAM = SerialParam()
)

Arguments

samples

A character vector containing one or more directory names, each corresponding to a 10X sample. Each directory should contain a matrix file, a gene/feature annotation file, and a barcode annotation file.

Alternatively, each string may contain a path to a HDF5 file in the sparse matrix format generated by 10X. These can be mixed with directory names when type="auto".

Alternatively, each string may contain a prefix of names for the three-file system described above, where the rest of the name of each file follows the standard 10X output.

sample.names

A character vector of length equal to samples, containing the sample names to store in the column metadata of the output object. If NULL, the file paths in samples are used directly.

col.names

A logical scalar indicating whether the columns of the output object should be named with the cell barcodes.

type

String specifying the type of 10X format to read data from.

version

String specifying the version of the 10X format to read data from.

genome

String specifying the genome if type="HDF5" and version='2'.

compressed

Logical scalar indicating whether the text files are compressed for type="sparse" or "prefix".

BPPARAM

A BiocParallelParam object specifying how loading should be parallelized for multiple samples.

Details

This function has a long and storied past. It was originally developed as the read10xResults function in scater, inspired by the Read10X function from the Seurat package. It was then migrated to this package in an effort to consolidate some 10X-related functionality across various packages.

If type="auto", the format of the input file is automatically detected for each samples based on whether it ends with ".h5". If so, type is set to "HDF5"; otherwise it is set to "sparse".

When type="sparse" or "prefix" and compressed=NULL, the function will automatically search for both the unzipped and Gzipped versions of the files. This assumes that the compressed files have an additional ".gz" suffix. We can restrict to only compressed or uncompressed files by setting compressed=TRUE or FALSE, respectively.

CellRanger 3.0 introduced a major change in the format of the output files for both types. If version="auto", the version of the format is automatically detected from the supplied paths. For type="sparse", this is based on whether there is a "features.tsv.gz" file in the directory. For type="HDF5", this is based on whether there is a top-level "matrix" group with a "matrix/features" subgroup in the file.

Matrices are combined by column if multiple samples were specified. This will throw an error if the gene information is not consistent across samples.

If col.names=TRUE and length(sample)==1, each column is named by the cell barcode. For multiple samples, the index of each sample in samples is concatenated to the cell barcode to form the column name. This avoids problems with multiple instances of the same cell barcodes in different samples.

Note that user-level manipulation of sparse matrices requires loading of the Matrix package. Otherwise, calculation of rowSums, colSums, etc. will result in errors.

Value

A SingleCellExperiment object containing count data for each gene (row) and cell (column) across all samples.

Author(s)

Davis McCarthy, with modifications from Aaron Lun

References

Zheng GX, Terry JM, Belgrader P, and others (2017). Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049.

10X Genomics (2017). Gene-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/output/matrices

10X Genomics (2018). Feature-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices

10X Genomics (2018). HDF5 Gene-Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/advanced/h5_matrices

10X Genomics (2018). HDF5 Feature Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices

See Also

splitAltExps, to split alternative feature sets (e.g., antibody tags) into their own Experiments.

write10xCounts, to create 10X-formatted file(s) from a count matrix.

Examples

1
2
3
4
5
6
7
8
# Mocking up some 10X genomics output.
example(write10xCounts)

# Reading it in.
sce10x <- read10xCounts(tmpdir)

# Column names are dropped with multiple 'samples'.
sce10x2 <- read10xCounts(c(tmpdir, tmpdir))

DropletUtils documentation built on Feb. 4, 2021, 2:01 a.m.