expr: Functions for manipulating expression data

Description

A group of functions, many of them lifted and modified from the Seurat package, for manipulating 10X scRNAseq data.

Usage

expr_read10x(
  dir,
  gene_column = 2,
  unique_features = TRUE,
  strip_suffix = FALSE
)

expr_read10xh5(input, use_names = TRUE, unique_features = TRUE)

expr_normalize(data, scale_factor = 10000)

expr_scale(data)

expr_zero_to_na(data)

expr_quality_filter(data, minUMI = 500, minGene = 250, trim = TRUE)

expr_merge(datasets, names = NULL)

expr_discretize(data, intervals, unknown = "N")

Arguments

dir

a directory with the barcodes, features and sparse matrix files

gene_column

optional; the position of the column with gene/feature names

unique_features

optional; make gene/feature names unique to prevent possible name conflicts

strip_suffix

optional; strip the -1 suffix, which is common for 10X barcodes

input

input data in the .h5 format

use_names

optional; use gene names instead of gene IDs

data

an expression matrix

scale_factor

optional; a scaling factor

minUMI

minimum number of UMIs (unique molecules) per cell

minGene

minimum number of represented genes/features per cell

trim

optional; trim empty genes after filtering

datasets

a list of datasets to be merged

names

optional; a list of suffixes used to distinguish individual datasets

intervals

a vector of interval borders, e.g., c(-1, 1) describes the half-open intervals [-Inf, -1), [-1, 1) and [1, Inf).

unknown

optional; the character used to represent unknown values
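
The half-open interval convention described for the intervals argument matches the behaviour of base R's findInterval(). The following is a minimal sketch of that idea, not the package's implementation: the helper name sketch_discretize, the use of interval indices as category labels, and the mapping of NA to the unknown character are all assumptions made for illustration.

```r
# Hypothetical helper (not part of phyloRNA): assign every value to a
# half-open interval and label it with the index of that interval.
sketch_discretize <- function(data, intervals, unknown = "N") {
    # findInterval() returns 0 for values below the first border,
    # 1 for values in [intervals[1], intervals[2]), and so on.
    bins <- findInterval(data, intervals)
    result <- matrix(as.character(bins), nrow = nrow(data),
                     dimnames = dimnames(data))
    # Assumption: missing values are encoded with the `unknown` character.
    result[is.na(data)] <- unknown
    result
}

# Toy matrix, filled column by column: genes in rows, cells in columns.
x <- matrix(c(-2, -1, 0, 1, NA, 3), nrow = 2,
            dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
d <- sketch_discretize(x, c(-1, 1))
print(d)
```

Note that a value lying exactly on a border, such as -1 or 1 here, falls into the interval that starts at that border.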

Details

The Seurat package is a great tool for manipulating 10X scRNAseq expression data. However, it has two major issues. The first is that it assumes that zero expression is a true zero. While this is a reasonable assumption at high coverage, low-coverage scRNAseq can suffer from dropout, caused by the small amount of starting material and a certain randomness inherent in the methodology. This means that a measured zero level of expression is more accurately described as missing data. Unfortunately, the sparse matrix implementation used by Seurat does not allow this change of context.
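
To illustrate why the distinction between a true zero and missing data matters, here is a small base R sketch. It is not the package's code: the helper name sketch_zero_to_na and the toy matrix are hypothetical, and genes are assumed to be in rows with cells in columns.

```r
# Toy count matrix, filled column by column (one line per cell).
counts <- matrix(c(0, 3,
                   1, 0,
                   0, 2), nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("cell1", "cell2", "cell3")))

# Hypothetical helper: treat unobserved expression as missing, not as zero.
sketch_zero_to_na <- function(data) {
    data <- as.matrix(data)  # convert to a dense base R matrix
    data[data == 0] <- NA
    data
}

na_counts <- sketch_zero_to_na(counts)

# With zeros, dropouts pull the per-gene mean down;
# with NA, they are simply ignored.
mean_with_zeros <- rowMeans(counts)
mean_without_dropouts <- rowMeans(na_counts, na.rm = TRUE)
print(mean_with_zeros)
print(mean_without_dropouts)
```

The price of this change is memory: a sparse representation stores only non-zero entries, whereas the dense matrix stores every entry, including the NA values.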

The second issue is the large number of dependencies that Seurat brings. Because only a limited part of Seurat's functionality is used, and because that functionality already had to be rewritten for the reason above, it seems more convenient to lift the remaining Seurat functionality as well.

Value

sparse matrix

a list of sparse matrices

log-normalized matrix

rescaled and centered data

a dense matrix with NA instead of zeros

filtered matrix

merged datasets

discretized matrix
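
As a rough illustration of what a filtered and a log-normalized matrix might look like, the sketch below applies both steps to a toy matrix in base R. Everything in it is an assumption rather than the package's implementation: the helper names are hypothetical, the thresholds are taken to be inclusive, cells are taken to be columns, and the normalization follows the Seurat-style formula log(1 + count / total per cell * scale_factor).

```r
# Hypothetical helpers (not part of phyloRNA), for illustration only.
sketch_quality_filter <- function(data, minUMI = 500, minGene = 250,
                                  trim = TRUE) {
    umi_per_cell <- colSums(data)        # total molecules per cell
    genes_per_cell <- colSums(data > 0)  # detected genes per cell
    keep <- umi_per_cell >= minUMI & genes_per_cell >= minGene
    data <- data[, keep, drop = FALSE]
    if (trim) {
        # Drop genes that are not detected in any remaining cell.
        data <- data[rowSums(data) > 0, , drop = FALSE]
    }
    data
}

sketch_log_normalize <- function(data, scale_factor = 10000) {
    # Divide each column (cell) by its total, rescale, then take log(1 + x).
    log1p(sweep(data, 2, colSums(data), "/") * scale_factor)
}

# Toy counts, filled column by column (one line per cell).
counts <- matrix(c(4, 0, 6,
                   1, 0, 0,
                   0, 0, 5), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3")))

# Small thresholds so that the toy data are not filtered out entirely.
filtered <- sketch_quality_filter(counts, minUMI = 5, minGene = 1)
normalized <- sketch_log_normalize(filtered)
print(filtered)
print(normalized)
```

In this toy example, cell c2 is removed because it has too few molecules, and gene g2 is then trimmed because no remaining cell expresses it.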

Functions


bioDS/phyloRNA documentation built on Feb. 21, 2022, 3:28 p.m.