expr: Functions for manipulating expression data
In bioDS/phyloRNA: Tools for phylogenetic analyses of scRNAseq data


A group of functions, often lifted and modified from the Seurat package, for manipulating 10X scRNAseq data.

expr_read10x(
  dir,
  gene_column = 2,
  unique_features = TRUE,
  strip_suffix = FALSE
)

expr_read10xh5(input, use_names = TRUE, unique_features = TRUE)

expr_normalize(data, scale_factor = 10000)

expr_scale(data)

expr_zero_to_na(data)

expr_quality_filter(data, minUMI = 500, minGene = 250, trim = TRUE)

expr_merge(datasets, names = NULL)

expr_discretize(data, intervals, unknown = "N")

`dir`	a directory with barcodes, features and sparse matrix
`gene_column`	optional; the position of the column with gene/feature names
`unique_features`	optional; make gene/feature names unique to prevent possible name conflicts
`strip_suffix`	optional; strip the `-1` suffix that is common in 10X barcodes
`input`	input data in the `.h5` format
`use_names`	optional; use gene names instead of gene IDs
`data`	an expression matrix
`scale_factor`	optional; a scaling factor
`minUMI`	minimum number of UMIs (unique molecules) per cell
`minGene`	minimum number of represented genes/features per cell
`trim`	optional; trim empty genes after filtering
`datasets`	list of datasets to be merged
`names`	optional; list of suffixes used to distinguish individual datasets
`intervals`	a vector of interval borders, e.g., c(-1, 1) describes the half-open intervals [-Inf, -1), [-1, 1) and [1, Inf).
`unknown`	optional; a character that represents an unknown value
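
The half-open interval convention described for `intervals` can be illustrated with base R's `findInterval()`, which bins values into left-closed, right-open intervals by default. This is only a sketch: `discretize_sketch()` is a hypothetical helper, not the package's implementation of `expr_discretize()`, and the category labels (`"0"`, `"1"`, `"2"`) as well as the treatment of `NA` as the unknown value are assumptions made for the example.

```r
# Hypothetical helper; NOT the implementation of expr_discretize().
discretize_sketch <- function(x, intervals, unknown = "N") {
  # findInterval() returns 0 for x < intervals[1],
  # 1 for intervals[1] <= x < intervals[2], and so on.
  bins <- findInterval(x, intervals)
  out <- as.character(bins)
  # Assumption: missing values are reported as the `unknown` character.
  out[is.na(x)] <- unknown
  # findInterval() drops matrix attributes, so restore them.
  dim(out) <- dim(x)
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix with genes in rows and cells in columns.
expr <- matrix(
  c(-2.5, -1, 0.3, 1, NA, 4),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

res <- discretize_sketch(expr, intervals = c(-1, 1))
print(res)
```

Note that the border values `-1` and `1` fall into the interval that starts at them, which is what "half-open" means here.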

The Seurat package is a great tool for manipulating 10X scRNAseq expression data. However, it has two major issues. The first is that it assumes that zero expression is a true zero. While this is a reasonable assumption at high coverage, low-coverage scRNAseq can suffer from dropout caused by the small amount of starting material and by randomness inherent in the methodology. A measured zero expression level is therefore more accurately described as missing data. Unfortunately, the sparse matrix implementation used by Seurat does not allow this change of interpretation.
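
To make the limitation concrete: a sparse matrix stores only its non-zero entries and treats every entry that is not stored as zero, so "not stored" cannot also mean "missing". The sketch below, which assumes the `Matrix` package is installed and uses made-up gene and cell names, shows one way to move to a dense matrix in which zeros become `NA`. It illustrates the idea behind `expr_zero_to_na()` and is not its actual implementation.

```r
library(Matrix)

# Toy sparse count matrix: 3 genes (rows) x 2 cells (columns).
counts <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 2, 7),
  dims = c(3, 2),
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Convert to a dense base R matrix; implicit zeros become explicit zeros.
dense <- as.matrix(counts)

# Reinterpret zero expression as missing data.
dense[dense == 0] <- NA

print(dense)
```

The dense matrix needs memory for every gene/cell pair, so this conversion is considerably more expensive than keeping the sparse representation.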

The second issue is the large number of dependencies that Seurat brings. Because Seurat functionality is used only in a limited scope, and the functionality that is used already had to be rewritten for the reason above, it seems more convenient to lift the remaining Seurat functionality as well.

sparse matrix

a list of sparse matrices

log-normalized matrix

rescaled and centered data

a dense matrix with NA instead of zeros

filtered matrix

merged datasets

discretized matrix

expr_read10x: Read 10X data.
expr_read10xh5: Read 10X data in the .h5 format.
expr_normalize: Log-normalize data. Feature counts for each cell are divided by the total count for that cell and then multiplied by the scale factor. The result is natural-log transformed using log1p.
expr_scale: Scale and center genes/features.
expr_zero_to_na: Transform a sparse matrix into a dense matrix where zeros are represented as NA.
expr_quality_filter: Filter the expression matrix according to quality metrics.
expr_merge: Merge multiple datasets.
expr_discretize: Discretize the expression matrix according to an interval vector.
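
The quality filtering and log-normalization steps described above can be worked through on a toy matrix with base R. The sketch makes several assumptions that the text does not state: genes are in rows and cells in columns, the thresholds are inclusive (`>=`), and an "empty" gene is one with a total count of zero. The variable names are made up for the example, and this is not the package's own code.

```r
# Toy count matrix: 3 genes (rows) x 3 cells (columns), filled column-wise.
counts <- matrix(
  c(3, 0, 1,   # cell1
    0, 1, 0,   # cell2
    6, 0, 2),  # cell3
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2", "cell3"))
)

# Small thresholds so the toy data show an effect (assumed inclusive).
min_umi <- 2
min_gene <- 2

# Quality filter: keep cells with enough UMIs and enough detected genes.
umi_per_cell <- colSums(counts)
genes_per_cell <- colSums(counts > 0)
keep <- umi_per_cell >= min_umi & genes_per_cell >= min_gene
filtered <- counts[, keep, drop = FALSE]

# Trim genes that are empty after the cells were removed.
filtered <- filtered[rowSums(filtered) > 0, , drop = FALSE]

# Log-normalization: divide each cell by its total count,
# multiply by the scale factor, then apply log1p.
scale_factor <- 10000
relative <- sweep(filtered, 2, colSums(filtered), "/")
normalized <- log1p(relative * scale_factor)

print(filtered)
print(normalized)
```

Here `cell2` is dropped because it has a single UMI, and `geneB` is trimmed because it was expressed only in the dropped cell.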