write10xCounts: Write count data in the 10x format

Description

Create a directory containing the count matrix and cell/gene annotation from a sparse matrix of UMI counts, in the format produced by the CellRanger software suite.

Usage

write10xCounts(
  path,
  x,
  barcodes = colnames(x),
  gene.id = rownames(x),
  gene.symbol = gene.id,
  gene.type = "Gene Expression",
  overwrite = FALSE,
  type = c("auto", "sparse", "HDF5"),
  genome = "unknown",
  version = c("2", "3"),
  chemistry = "Single Cell 3' v3",
  original.gem.groups = 1L,
  library.ids = "custom"
)

Arguments

path

A string containing the path to the output directory (for type="sparse") or file (for type="HDF5").

x

A sparse numeric matrix of UMI counts.

barcodes

A character vector of cell barcodes, one per column of x.

gene.id

A character vector of gene identifiers, one per row of x.

gene.symbol

A character vector of gene symbols, one per row of x.

gene.type

A character vector of gene types, expanded to one per row of x. Only used when version="3".

overwrite

A logical scalar specifying whether path should be overwritten if it already exists.

type

String specifying the 10X format in which to save x. This is either a directory containing a sparse matrix with row/column annotation ("sparse") or an HDF5 file containing the same information ("HDF5"). The default, "auto", chooses between the two based on path (see Details).

genome

String specifying the genome for storage when type="HDF5". This can be a character vector with one genome per feature if version="3".

version

String specifying the version of the CellRanger format to produce.

chemistry, original.gem.groups, library.ids

Metadata attributes to be added to the HDF5 file for type="HDF5". Their interpretation is not formally documented and is left to the user's imagination.

Details

With type="auto" (the default), this function tries to detect the desired format from path. If path ends with ".h5", it is assumed to specify an HDF5 file and type is set to "HDF5". Otherwise, type is set to "sparse" under the assumption that path specifies a directory.
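The detection rule described above can be sketched in base R as follows. This is only an illustration: guess_10x_type is a hypothetical helper name that is not part of DropletUtils, and the actual implementation may differ in detail.

```r
# Hypothetical helper (not part of DropletUtils) mirroring the rule in Details:
# a path ending in ".h5" implies HDF5 output, anything else a sparse directory.
guess_10x_type <- function(path, type = c("auto", "sparse", "HDF5")) {
    type <- match.arg(type)
    if (type == "auto") {
        type <- if (grepl("\\.h5$", path)) "HDF5" else "sparse"
    }
    type
}

guess_10x_type("counts.h5")                   # detected from the suffix
guess_10x_type("counts_dir")                  # no ".h5" suffix
guess_10x_type("counts.h5", type = "sparse")  # an explicit type is kept
```

Under this sketch, an explicit type always takes precedence over the file suffix, which is why a directory whose name happens to end in ".h5" would need type="sparse" to be set by hand.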

Note that there were major changes in the output format for CellRanger version 3.0 to account for non-gene features such as antibody or CRISPR tags. Users can switch to this new format using version="3". For the new format, see the 10X Genomics documentation for the “latest” pipeline version listed in References; for the old format, see the documentation for version “2.2” or earlier.

The primary purpose of this function is to create files to use for testing read10xCounts. In principle, it is possible to re-use the HDF5 matrices in cellranger reanalyze. We recommend against doing so routinely due to CellRanger's dependence on undocumented metadata attributes that may change without notice.
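Since the stated purpose is to create test files for read10xCounts, a round trip might look like the sketch below. The mock matrix and the roundtrip.ok variable are made up for illustration, and the DropletUtils calls are guarded so that the snippet still runs where the package is not installed.

```r
library(Matrix)

# Small mock count matrix: 100 genes by 10 cells (illustrative data only).
set.seed(1)
counts <- as(matrix(rpois(1000, lambda = 5), nrow = 100, ncol = 10), "CsparseMatrix")
rownames(counts) <- paste0("GENE", seq_len(nrow(counts)))
colnames(counts) <- paste0("BARCODE-", seq_len(ncol(counts)))

roundtrip.ok <- NA  # stays NA if DropletUtils is not installed
if (requireNamespace("DropletUtils", quietly = TRUE)) {
    out <- tempfile()
    DropletUtils::write10xCounts(out, counts)  # sparse directory, default version
    sce <- DropletUtils::read10xCounts(out)    # re-import as a SingleCellExperiment
    # Compare dimensions and values only; dimnames may be stored differently.
    reread <- as.matrix(SummarizedExperiment::assay(sce, "counts"))
    roundtrip.ok <- all(dim(sce) == dim(counts)) && all(reread == as.matrix(counts))
}
roundtrip.ok
```

Only the counts are compared here. Checking the barcodes and gene annotation would need the column and row metadata of the re-imported object as well.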

Value

For type="sparse", a directory is produced at path. If version="2", this will contain the files "matrix.mtx", "barcodes.tsv" and "genes.tsv". If version="3", it will instead contain "matrix.mtx.gz", "barcodes.tsv.gz" and "features.tsv.gz".

For type="HDF5", an HDF5 file is produced at path containing data in column-sparse format. If version="2", data are stored in an HDF5 group named after the value of genome. If version="3", data are stored in the group "matrix".
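To illustrate what column-sparse storage means, the sketch below inspects the slots of a small dgCMatrix from the Matrix package. The toy matrix is made up for illustration. As far as we understand, the HDF5 file holds analogous vectors (datasets named data, indices and indptr in the 10X layout), but the format references remain the authoritative description.

```r
library(Matrix)

# Toy 3 x 3 matrix with three non-zero entries (illustrative values only).
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7), dims = c(3, 3))

m@x  # non-zero values, ordered column by column
m@i  # zero-based row index of each non-zero value
m@p  # column pointers: column k spans positions p[k] to p[k + 1] - 1 (zero-based)
```

The second column is empty, so its two pointers are equal. Storing one pointer per column rather than one column index per value is what makes this layout compact for count matrices with many cells.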

A TRUE value is invisibly returned.

Author(s)

Aaron Lun

References

10X Genomics (2017). Gene-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/output/matrices

10X Genomics (2018). Feature-Barcode Matrices. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices

10X Genomics (2018). HDF5 Gene-Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/2.2/advanced/h5_matrices

10X Genomics (2018). HDF5 Feature Barcode Matrix Format. https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/h5_matrices

See Also

read10xCounts, to read CellRanger matrices into R.

Examples

# Mocking up some count data.
library(DropletUtils)
library(Matrix)
my.counts <- matrix(rpois(1000, lambda=5), ncol=10, nrow=100)
my.counts <- as(my.counts, "CsparseMatrix")
cell.ids <- paste0("BARCODE-", seq_len(ncol(my.counts)))

ngenes <- nrow(my.counts)
gene.ids <- paste0("ENSG0000", seq_len(ngenes))
gene.symb <- paste0("GENE", seq_len(ngenes))

# Writing this to file:
tmpdir <- tempfile()
write10xCounts(tmpdir, my.counts, gene.id=gene.ids, 
    gene.symbol=gene.symb, barcodes=cell.ids)
list.files(tmpdir)

# Creating a version 3 HDF5 file:
tmph5 <- tempfile(fileext=".h5")
write10xCounts(tmph5, my.counts, gene.id=gene.ids,
    gene.symbol=gene.symb, barcodes=cell.ids, version="3")


MarioniLab/DropletUtils documentation built on Oct. 12, 2024, 5:40 p.m.