preprocess_matrix: Matrix preprocessing


View source: R/preprocess.R

Description

Performs preprocessing of a matrix object, including breaking up the dataset into objects representing the specific samples, filtering out low-quality cells and lowly expressed genes, and log-transforming the data.

Usage

preprocess_matrix(
  x,
  complexity_cutoff,
  expression_cutoff,
  housekeeping_cutoff,
  log_base,
  scaling_factor,
  pseudo_count,
  sample_id = NULL,
  cell_ids = NULL,
  gene_ids = NULL,
  forced_genes_set = NULL,
  use_housekeeping_filter = FALSE,
  verbose = FALSE
)

Arguments

x

a numeric matrix.

complexity_cutoff

a numeric vector of length 2 representing the lower and upper bounds of complexity (i.e. the number of detected genes per cell).

expression_cutoff

a numeric representing the minimal log2 mean expression per gene below which a gene is considered lowly expressed.

housekeeping_cutoff

a numeric representing the log2 mean expression of housekeeping genes (i.e. genes that are highly expressed in all cells) per cell, below which a cell is considered low quality.

log_base

a numeric representing the logarithm base for performing log transformation on the data.

scaling_factor

a numeric representing a scaling factor by which to divide each data point before log transformation.

pseudo_count

a numeric representing the pseudo count added when performing log transformation to avoid taking the log of zero.

sample_id

an ID of the sample being processed. Used only for printing and hence is not a mandatory parameter. Default is NULL.

cell_ids

a character vector containing IDs of cells that already passed QC. Enables bypassing the low-quality cells filtering step.

gene_ids

a character vector containing IDs of genes that already passed QC. Enables bypassing the lowly expressed genes filtering step.

forced_genes_set

a vector of genes that should be included in the final processed object even if their expression is low. The exception is forced genes whose absolute count equals zero, which are still filtered out. Default is NULL.

use_housekeeping_filter

a logical indicating whether cells with low expression of housekeeping genes should be filtered out. Default is FALSE.

verbose

a logical controlling whether messages from this function are printed or suppressed. Default is FALSE.
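As a rough illustration of how log_base, scaling_factor and pseudo_count might interact, the sketch below applies one plausible form of the transformation in base R. The helper name log_transform_sketch, the default values and the order of operations (divide by the scaling factor, add the pseudo count, then take the logarithm) are assumptions made for illustration and are not taken from the package source.

```r
# Hypothetical helper; not part of the scandal package API.
# Assumed form: log_base-logarithm of (x / scaling_factor + pseudo_count).
log_transform_sketch <- function(x, log_base = 2, scaling_factor = 10, pseudo_count = 1) {
  log(x / scaling_factor + pseudo_count, base = log_base)
}

# Toy matrix: rows are genes, columns are cells.
counts <- matrix(c(0, 10, 30, 70), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

transformed <- log_transform_sketch(counts)
# A count of zero maps to log2(0 / 10 + 1) = 0, so the pseudo count
# keeps the result finite.
transformed["geneA", "cell1"]
```

With a pseudo count of 1, zero counts stay at zero after the transformation, which is one common reason for choosing that value.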

Details

This function performs the matrix preprocessing steps and is used for preprocessing ScandalDataSet objects to allow downstream analysis.

The main steps of preprocessing are as follows:

  1. Filtering out low-quality cells (cells with low complexity): for each cell (column), the number of genes with a count greater than zero is summed, and cells outside the complexity cutoff range configured for the specific sample in complexity_cutoff are removed.

  2. Optionally, filtering out cells with low expression of housekeeping genes, i.e. genes that are normally highly expressed in most cells (for example, genes that encode ribosomal proteins). Cells with mean expression of housekeeping genes less than the cutoff configured for the specific sample in housekeeping_cutoff are removed.

  3. Filtering out lowly expressed genes, i.e. genes with log2 mean expression less than the expression cutoff configured for the specific sample in expression_cutoff.

  4. Log-transforming the expression data.
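The four steps above can be sketched in base R as follows. This is a minimal sketch, not the package implementation: the function name preprocess_sketch, the housekeeping_genes argument, the inclusive cutoff bounds, the use of log2(count + 1) when computing means, and the exact form of the final transformation are all assumptions made for illustration.

```r
# Hypothetical sketch of the four steps; names and details are assumptions.
preprocess_sketch <- function(x, complexity_cutoff, expression_cutoff,
                              log_base = 2, scaling_factor = 10, pseudo_count = 1,
                              housekeeping_genes = NULL, housekeeping_cutoff = NULL) {
  # Step 1: complexity = number of genes with count > 0 in each cell (column).
  complexity <- colSums(x > 0)
  keep_cells <- complexity >= complexity_cutoff[1] & complexity <= complexity_cutoff[2]
  x <- x[, keep_cells, drop = FALSE]

  # Step 2 (optional): drop cells whose mean log2 expression of
  # housekeeping genes falls below the cutoff.
  if (!is.null(housekeeping_genes) && !is.null(housekeeping_cutoff)) {
    hk <- intersect(housekeeping_genes, rownames(x))
    hk_mean <- colMeans(log2(x[hk, , drop = FALSE] + 1))
    x <- x[, hk_mean >= housekeeping_cutoff, drop = FALSE]
  }

  # Step 3: keep genes whose log2 mean expression reaches the cutoff.
  gene_score <- log2(rowMeans(x) + 1)
  x <- x[gene_score >= expression_cutoff, , drop = FALSE]

  # Step 4: log-transform the remaining counts.
  log(x / scaling_factor + pseudo_count, base = log_base)
}

# Toy data: 3 genes (rows) by 4 cells (columns).
counts <- matrix(c(10, 0, 0,  20, 10, 0,  0, 0, 0,  30, 10, 1), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4)))

# Keep cells with 2 to 3 detected genes and genes with log2 mean expression >= 1.
result <- preprocess_sketch(counts, complexity_cutoff = c(2, 3), expression_cutoff = 1)
dim(result)
```

In this toy run, cell1 and cell3 fall outside the complexity range and geneC falls below the expression cutoff, so the result has two genes and two cells.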

Value

A processed matrix ready for downstream analysis.

Note

The function assumes that each column represents a cell and each row represents a gene.
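If the input is stored the other way round (cells as rows, genes as columns), it can be transposed before preprocessing. The matrix below is made-up example data, and its object names are hypothetical.

```r
# Made-up data stored as cells x genes, the opposite of the expected layout.
cells_by_genes <- matrix(1:6, nrow = 2,
                         dimnames = list(c("cell1", "cell2"),
                                         c("geneA", "geneB", "geneC")))

# Transpose so that each row is a gene and each column is a cell.
genes_by_cells <- t(cells_by_genes)
dim(genes_by_cells)  # 3 genes by 2 cells
```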

Author(s)

Avishay Spitzer

See Also

scandal_preprocess


dravishays/scandal documentation built on Jan. 8, 2020, 1:30 p.m.