View source: R/sparse_matrix.R
make_sparse_matrix | R Documentation |
Convert the Output of kallisto bus into Gene by Cell Matrix

This function takes the output file of kallisto bus, after it has been sorted and converted into text with bustools. See the vignettes on the website of this package for a tutorial. The bustools output has 4 columns: barcode, UMI, equivalence class, and counts. This function converts that file into a sparse matrix that can be used in downstream analyses.
make_sparse_matrix(
  bus_path,
  tr2g,
  est_ncells,
  est_ngenes,
  whitelist = NULL,
  gene_count = TRUE,
  TCC = TRUE,
  single_gene = TRUE,
  verbose = TRUE,
  progress_unit = 5e+06
)
bus_path | Path to the sorted text |
tr2g | A data frame with columns |
est_ncells | Estimated number of cells. Providing this argument speeds up computation by minimizing memory reallocation as vectors grow. |
est_ngenes | Estimated number of genes or equivalence classes. |
whitelist | A character vector of valid cell barcodes. This argument is optional and defaults to NULL. |
gene_count | Logical, whether the gene count matrix should be returned. |
TCC | Logical, whether the TCC matrix should be returned. |
single_gene | Logical, whether to use single gene mode. In single gene mode, only UMIs that can be uniquely mapped to one gene are kept. Without single gene mode, a UMI mapped to multiple genes is distributed evenly among those genes. |
verbose | Logical, whether to display progress. |
progress_unit | Number of iterations between progress updates when reading in the |
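The difference between the two settings of single_gene can be illustrated with a small sketch. This is not the package's implementation; the function count_genes and the toy input umi_genes are hypothetical names introduced here only to show the counting rule described above for one barcode.

```r
# Hypothetical toy input: the genes compatible with each UMI of one barcode.
umi_genes <- list(
  umi1 = "geneA",
  umi2 = c("geneA", "geneB"),  # this UMI maps to two genes
  umi3 = "geneB"
)
genes <- c("geneA", "geneB")

# Illustrative helper (not part of BUSpaRse): count UMIs per gene.
count_genes <- function(umi_genes, genes, single_gene = TRUE) {
  counts <- setNames(numeric(length(genes)), genes)
  for (g in umi_genes) {
    if (length(g) == 1L) {
      # Uniquely mapped UMI: counts once for its gene.
      counts[g] <- counts[g] + 1
    } else if (!single_gene) {
      # Multi-mapped UMI: split evenly among its compatible genes.
      counts[g] <- counts[g] + 1 / length(g)
    }
    # In single gene mode, multi-mapped UMIs are dropped.
  }
  counts
}

single <- count_genes(umi_genes, genes, single_gene = TRUE)
shared <- count_genes(umi_genes, genes, single_gene = FALSE)
```

In this toy case, single gene mode discards umi2 entirely, while the alternative gives half of it to each of the two genes.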
This function can generate both the gene count matrix and the transcript compatibility count (TCC) matrix. The TCC matrix has barcodes in the columns and equivalence classes in the rows. See Ntranos et al. 2016 for more information about the TCC matrix.
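To make the layout of a TCC matrix concrete, here is a minimal sketch that builds one from toy records using the Matrix package. The data frame bus and its column names are made up for illustration, and the records are assumed to be already deduplicated; this is not how the function itself processes the file.

```r
library(Matrix)

# Hypothetical toy records: one row per (barcode, UMI, equivalence class).
bus <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "CCGT", "CCGT"),
  umi     = c("TTA", "GGC", "ACG", "TTA", "CAT"),
  ec      = c(0L, 0L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

barcodes <- unique(bus$barcode)
ecs <- sort(unique(bus$ec))

# Rows are equivalence classes, columns are barcodes. sparseMatrix() adds up
# the x values of repeated (i, j) pairs, so each entry is a UMI count.
tcc <- sparseMatrix(
  i = match(bus$ec, ecs),
  j = match(bus$barcode, barcodes),
  x = rep(1, nrow(bus)),
  dims = c(length(ecs), length(barcodes)),
  dimnames = list(as.character(ecs), barcodes)
)
```

Here barcode AAAC has two UMIs in equivalence class 0, so that entry of tcc holds a count of 2.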
For 10x data sets, a barcode whitelist file comes with the
CellRanger installation. You don't need to run CellRanger to get it. An
example path to the whitelist file is
cellranger-2.1.0/cellranger-cs/2.1.0/lib/python/cellranger/barcodes/737K-august-2016.txt
for v2 chemistry.
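A whitelist file can be read into the character vector expected by the whitelist argument along these lines. This is a sketch that assumes the file holds one barcode per line; read_whitelist is a hypothetical helper, and a temporary file with made-up barcodes stands in for the real whitelist so the snippet is self-contained.

```r
# Hypothetical helper: read barcodes, one per line, dropping blanks and repeats.
read_whitelist <- function(path) {
  barcodes <- trimws(readLines(path))
  unique(barcodes[nzchar(barcodes)])
}

# Stand-in file for demonstration; replace tmp with the path to your own
# whitelist file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("AAACCTGAGAAACCAT", "AAACCTGAGAAACCGC", ""), tmp)

whitelist <- read_whitelist(tmp)
```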
If both the gene count and TCC matrices are returned, this function
returns a list with two matrices, each with genes/equivalence classes in the
rows and barcodes in the columns. If only one of the two matrices is
requested, a single dgCMatrix is returned, with genes/equivalence classes in
the rows and barcodes in the columns. These matrices are unfiltered. Please
filter out the empty droplets before downstream analysis.
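As one simple illustration of filtering, the sketch below keeps only barcodes whose total UMI count reaches a threshold. The toy matrix m and the cutoff min_umis are arbitrary values chosen for the example, not recommendations; in practice the cutoff is usually chosen from the data or with a dedicated method.

```r
library(Matrix)

# Hypothetical toy gene count matrix: 3 genes by 4 barcodes.
m <- sparseMatrix(
  i = c(1, 2, 3, 1, 2),
  j = c(1, 1, 1, 2, 3),
  x = c(5, 3, 2, 1, 1),
  dims = c(3, 4),
  dimnames = list(paste0("gene", 1:3), paste0("bc", 1:4))
)

# Total UMI count per barcode (column).
tot_counts <- Matrix::colSums(m)

# Arbitrary illustrative threshold.
min_umis <- 2
m_filtered <- m[, tot_counts >= min_umis, drop = FALSE]
```

With these toy values only bc1 passes the threshold, so m_filtered has a single column.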
See also: EC2gene
# Load toy example for testing
toy_path <- system.file("testdata", package = "BUSpaRse")
load(paste(toy_path, "toy_example.RData", sep = "/"))
out_fn <- paste0(toy_path, "/output.sorted.txt")
# With whitelist
m <- make_sparse_matrix(out_fn, tr2g_toy, 10, 3, whitelist = whitelist,
                        gene_count = TRUE, TCC = FALSE, single_gene = TRUE,
                        verbose = FALSE)