readSparseCounts: Read sparse count matrix from file


View source: R/readSparseCounts.R

Description

Reads a count table stored in a dense tabular format from file and returns it as a sparse matrix.

Usage

readSparseCounts(
  file,
  sep = "\t",
  quote = NULL,
  comment.char = "",
  row.names = TRUE,
  col.names = TRUE,
  ignore.row = 0L,
  skip.row = 0L,
  ignore.col = 0L,
  skip.col = 0L,
  chunk = 1000L
)

Arguments

file

A string containing a file path to a count table, or a connection object opened in read-only text mode.

sep

A string specifying the delimiter between fields in file.

quote

A string specifying the quote character, e.g., in column or row names.

comment.char

A string specifying the comment character after which values are ignored.

row.names

A logical scalar specifying whether row names are present.

col.names

A logical scalar specifying whether column names are present.

ignore.row

An integer scalar specifying the number of rows to ignore at the start of the file, before the column names.

skip.row

An integer scalar specifying the number of rows to ignore at the start of the file, after the column names.

ignore.col

An integer scalar specifying the number of columns to ignore at the start of each line, before the row names.

skip.col

An integer scalar specifying the number of columns to ignore at the start of each line, after the row names.

chunk

An integer scalar indicating the chunk size to use, i.e., number of rows to read at any one time.

Details

This function provides a convenient method for reading dense arrays from flat files into a sparse matrix in memory. The file is read chunk rows at a time, so peak memory usage can be reduced further by setting chunk to a smaller positive value.
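
As a rough illustration of the chunk argument, the sketch below reads the same small table with a chunk size of two rows and with the default, then checks that both give the same values. The table itself is made up for illustration; the chunk size is only expected to affect memory usage, not the result.

```r
library(scuttle)
library(Matrix)

# Illustrative 5 x 3 count table written to a temporary file.
outfile <- tempfile()
write.table(data.frame(A=1:5, B=0, C=0:4, row.names=letters[1:5]),
    file=outfile, col.names=NA, sep="\t", quote=FALSE)

# Read two rows at a time, then with the default chunk size.
small.chunks <- readSparseCounts(outfile, chunk=2L)
default.chunks <- readSparseCounts(outfile)

# Compare the dense equivalents of the two results.
same <- identical(as.matrix(small.chunks), as.matrix(default.chunks))
```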

The ignore.* and skip.* parameters allow irrelevant rows or columns to be skipped. Note that the distinction between the two parameters is only relevant when row.names=TRUE (for skipping/ignoring columns) or col.names=TRUE (for rows); without the corresponding names, ignoring and skipping have the same effect.
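
A minimal sketch of how these parameters might be combined, assuming a hypothetical file layout: one free-text line above the header, and an annotation column (here called symbol) between the row names and the counts. The file contents and names are invented for illustration.

```r
library(scuttle)

# Hypothetical layout: a free-text line, then the header, then the data.
# Each data line holds: row name, annotation column, three counts.
outfile <- tempfile()
writeLines(c(
    "generated by a hypothetical pipeline",
    "\tsymbol\tA\tB\tC",
    "g1\tX1\t1\t0\t0",
    "g2\tX2\t0\t0\t3"
), outfile)

# ignore.row=1 drops the line before the column names;
# skip.col=1 drops the annotation column after the row names.
counts <- readSparseCounts(outfile, ignore.row=1L, skip.col=1L)
```

Here ignore.row is used because the unwanted line comes before the header; a line between the header and the first data row would call for skip.row instead.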

Value

A dgCMatrix containing double-precision values (usually counts) for each row (gene) and column (cell).
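
The returned object can be handled like any other sparse matrix from the Matrix package. The sketch below, using an illustrative table, checks the class, counts the non-zero entries with nnzero, and converts to an ordinary dense matrix where that is affordable.

```r
library(scuttle)
library(Matrix)

# Illustrative 5 x 3 count table written to a temporary file.
outfile <- tempfile()
write.table(data.frame(A=1:5, B=0, C=0:4, row.names=letters[1:5]),
    file=outfile, col.names=NA, sep="\t", quote=FALSE)

counts <- readSparseCounts(outfile)

# Check the class of the result.
is.sparse <- is(counts, "dgCMatrix")

# Number of non-zero values in the matrix.
n.nonzero <- nnzero(counts)

# Dense copy; only sensible for small matrices.
dense <- as.matrix(counts)
```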

Author(s)

Aaron Lun

See Also

read.table, readMM

Examples

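library(scuttle)

# Write a small dense count table (5 genes x 3 cells) to a temporary file.
# col.names=NA adds an empty header field above the row names.
outfile <- tempfile()
write.table(data.frame(A=1:5, B=0, C=0:4, row.names=letters[1:5]),
    file=outfile, col.names=NA, sep="\t", quote=FALSE)

# Read it back as a sparse matrix.
readSparseCounts(outfile)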

scuttle documentation built on Dec. 19, 2020, 2 a.m.