read10xMolInfo: Read the 10X molecule information file
In MarioniLab/DropletUtils: Utilities for Handling Single-Cell Droplet Data

read10xMolInfo

R Documentation

Read the 10X molecule information file

Description

Extract relevant fields from the molecule information HDF5 file produced by CellRanger for 10X Genomics data.

Usage

read10xMolInfo(
  sample,
  barcode.length = NULL,
  keep.unmapped = FALSE,
  get.cell = TRUE,
  get.umi = TRUE,
  get.gem = TRUE,
  get.gene = TRUE,
  get.reads = TRUE,
  get.library = TRUE,
  extract.library.info = FALSE,
  version = c("auto", "2", "3")
)

Arguments

`sample`	A string containing the path to the molecule information HDF5 file.
`barcode.length`	An integer scalar specifying the length of the cell barcode. Only relevant when `version="2"`.
`keep.unmapped`	A logical scalar indicating whether unmapped molecules should be reported.
`get.cell`, `get.umi`, `get.gem`, `get.gene`, `get.reads`, `get.library`	Logical scalar indicating whether the corresponding field should be extracted for each molecule.
`extract.library.info`	Logical scalar indicating whether the library information should be extracted. Only relevant when `version="3"`.
`version`	String specifying the version of the 10X molecule information format to read data from.

Details

Molecules that were not assigned to any gene have gene set to length(genes)+1. These molecules are removed by default, i.e., when keep.unmapped=FALSE.
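The unmapped-gene convention can be illustrated with a small base-R sketch. It does not read a real file: the genes vector and the data table below are hypothetical mock-ups, with a plain data.frame standing in for the DataFrame. An equivalent filter could be applied by hand to output obtained with keep.unmapped=TRUE.

```r
# Hypothetical mock-ups; real output stores `data` as a DataFrame.
genes <- c("GeneA", "GeneB", "GeneC")
data <- data.frame(
  cell = c("AAAC", "AAAC", "TTTG", "TTTG"),
  gene = c(1L, 4L, 2L, 4L)  # 4 is length(genes) + 1, i.e., unmapped
)

# Flag molecules whose gene index points one past the end of `genes`.
unmapped <- data$gene == length(genes) + 1L
mapped.data <- data[!unmapped, , drop = FALSE]
mapped.data
```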

CellRanger 3.0 introduced a major change in the format of the molecule information files. When version="auto", the function attempts to determine the format version from the file itself. Alternatively, the version can be specified explicitly with the version argument.

For files produced by CellRanger 2.2, the length of the cell barcode is not stored in the file. Instead, the barcode length is inferred automatically if barcode.length=NULL and version="2". For reference, version 1 of the 10X chemistry uses 14-nt barcodes, while version 2 uses 16-nt barcodes.
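Why the barcode length matters can be sketched in base R, assuming that version 2 files store each barcode as a 2-bit-encoded integer with A=0, C=1, G=2, T=3 and the first base in the most significant bits (an assumption, not stated on this page). Because leading A's are encoded as zero, the same integer decodes to different sequences for different lengths. The helper decode2bit is hypothetical and is not how read10xMolInfo performs its inference.

```r
# Hypothetical helper: decode a 2-bit-encoded sequence, ASSUMING
# A=0, C=1, G=2, T=3 with the first base in the most significant bits.
# Doubles are used so that 16-nt (32-bit) values stay exact.
decode2bit <- function(value, length) {
  bases <- c("A", "C", "G", "T")
  out <- character(length)
  for (i in seq_len(length)) {
    # value %% 4 extracts the lowest two bits; fill positions from the right.
    out[length - i + 1L] <- bases[value %% 4 + 1]
    value <- value %/% 4
  }
  paste(out, collapse = "")
}

# The same integer decodes differently depending on the assumed length,
# because leading A's (encoded as 0) leave no trace in the value.
v <- 27  # base-4 digits 1, 2, 3
decode2bit(v, 3)
decode2bit(v, 5)
```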

Setting any of the get.* arguments to FALSE will (generally) skip extraction of the corresponding field. This can improve efficiency if that field is not needed for further analysis. Aside from the missing field, the results are guaranteed to be identical, i.e., the same rows in the same order.
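The effect of the get.* arguments can be checked with the simulated file from the Examples section. This sketch assumes DropletUtils is installed; it compares a full read with one that skips the umi and reads fields, using only the arguments documented above.

```r
library(DropletUtils)

# Simulated file, as in the Examples section (simBasicMolInfo is an
# internal helper, hence the ::: accessor).
path <- DropletUtils:::simBasicMolInfo(tempfile())

# Full extraction versus one that skips the UMI and read-count fields.
full <- read10xMolInfo(path)
slim <- read10xMolInfo(path, get.umi = FALSE, get.reads = FALSE)

# The skipped fields are absent; the remaining columns should match row for row.
setdiff(colnames(full$data), colnames(slim$data))
all.equal(as.data.frame(full$data)[, colnames(slim$data), drop = FALSE],
          as.data.frame(slim$data))
```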

Value

A named list is returned. Its first element is data, a DataFrame in which each row corresponds to a single transcript molecule, containing the following fields:

cell:: Character, the cell barcode for each molecule.
umi:: Integer, the processed UMI barcode in 2-bit encoding.
gem_group:: Integer, the GEM group.
gene:: Integer, the index of the gene to which the molecule was assigned. This refers to an entry in the genes vector, see below.
reads:: Integer, the number of reads mapped to this molecule.
library:: Integer, the library index in cases where multiple libraries are present in the same file. Only reported when version="3".

A field is omitted from the DataFrame if the corresponding get.* argument is FALSE.
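Since each row of data is one molecule, per-cell summaries reduce to grouping by the cell barcode. A minimal sketch on a hypothetical mock table (a plain data.frame with made-up values); the cell barcode column is called cell here, matching the get.cell argument.

```r
# Hypothetical mock of `data`; values are made up.
data <- data.frame(
  cell = c("AAAC", "AAAC", "AAAC", "TTTG"),
  umi = c(12L, 57L, 12L, 3L),
  gem_group = 1L,
  gene = c(1L, 1L, 2L, 2L),
  reads = c(5L, 2L, 7L, 1L)
)

# One row per molecule, so counting rows per cell gives the UMI count.
umis.per.cell <- table(data$cell)

# Summing the reads column gives the number of reads per cell.
reads.per.cell <- tapply(data$reads, data$cell, sum)
```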

The second element of the list is genes, a character vector containing the names of all genes in the annotation. This is indexed by the gene field in the data DataFrame.
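Mapping the integer gene field back to names is plain vector indexing. A sketch with a hypothetical genes vector and made-up indices; note that the unmapped index length(genes)+1 yields NA.

```r
# Hypothetical gene annotation and indices standing in for data$gene.
genes <- c("GeneA", "GeneB", "GeneC")
gene.idx <- c(3L, 1L, 1L, 2L, 4L)  # the last one is unmapped

# Indexing `genes` recovers names; the out-of-range index gives NA.
gene.names <- genes[gene.idx]

# Per-gene molecule counts, keeping genes with zero molecules (NA is dropped).
gene.counts <- table(factor(gene.names, levels = genes))
gene.counts
```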

If version="3", a feature.type entry is added to the list. This is a character vector of the same length as genes, containing the feature type for each gene.
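Because feature.type runs parallel to genes, it can be indexed with the same gene field, e.g., to keep only molecules assigned to gene-expression features. The feature-type strings ("Gene Expression", "Antibody Capture") and all other values below are assumed for illustration.

```r
# Hypothetical annotation with a parallel feature.type vector.
genes <- c("GeneA", "GeneB", "CD3_antibody")
feature.type <- c("Gene Expression", "Gene Expression", "Antibody Capture")
gene.idx <- c(1L, 3L, 2L, 3L)  # stands in for data$gene

# Look up each molecule's feature type via its gene index.
is.gex <- feature.type[gene.idx] == "Gene Expression"
gex.idx <- gene.idx[is.gex]
```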

If extract.library.info=TRUE, an additional element named library.info is returned. This is a list of lists containing per-library information such as the "library_type". The library field in the data DataFrame indexes this list.
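A sketch of looking up the library type for each molecule, assuming the library field holds 1-based indices into library.info. The mock list and indices are hypothetical.

```r
# Hypothetical per-library information: a list of lists.
library.info <- list(
  list(library_type = "Gene Expression"),
  list(library_type = "Antibody Capture")
)
library.idx <- c(1L, 1L, 2L)  # stands in for data$library (assumed 1-based)

# Pull out one library type per library, then index by molecule.
lib.types <- vapply(library.info, function(x) x$library_type, character(1))
mol.lib.type <- lib.types[library.idx]
```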

Author(s)

Aaron Lun, based on code by Jonathan Griffiths

References

Zheng GX, Terry JM, Belgrader P, and others (2017). Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049.

10X Genomics (2017). Molecule info. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/output/molecule_info

10X Genomics (2018). Molecule info. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/molecule_info

Examples

library(DropletUtils)

# Mocking up some 10X HDF5-formatted data.
out <- DropletUtils:::simBasicMolInfo(tempfile())

# Reading the resulting file.
read10xMolInfo(out)

MarioniLab/DropletUtils documentation built on July 16, 2025, 1:57 p.m.
