scandal_cna_infer: CNA inference


View source: R/cnv_inference.R

Description

This function infers CNAs (chromosomal copy-number aberrations) from single-cell expression data. CNA inference is the main method of the scandal framework for classifying malignant and non-malignant cells.

Usage

scandal_cna_infer(
  object,
  reference_cells,
  genome = "hg19",
  max_genes = 5000,
  expression_limits = c(-3, 3),
  window = 100,
  scaling_factor = 0.2,
  initial_centering = "col",
  base_metric = "median",
  verbose = FALSE
)

Arguments

object

a ScandalDataSet object.

reference_cells

a named vector of the cluster assignments of the reference cells. The names should correspond to the cell IDs of the reference (non-malignant) cells. The CNA matrix can be computed without a reference (with reference_cells = NULL), but this is not recommended because downstream computations using the inferred CNA matrix will be less reliable.
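
A minimal sketch of how such a named vector might be built with base R. The cell IDs and cluster labels below are made up for illustration and do not come from the package.

```r
# Hypothetical cell IDs and cluster labels, for illustration only
cell_ids <- c("cell_01", "cell_02", "cell_03", "cell_04")
clusters <- c("macrophage", "macrophage", "oligodendrocyte", "t_cell")

# Names are the cell IDs, values are the cluster assignments
reference_cells <- setNames(clusters, cell_ids)

# Look up the cluster of one reference cell by its ID
reference_cells[["cell_03"]]
```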

genome

a string indicating the genome to be used for CNA inference. Must be one of the available genomes in the infercna package. Default is hg19.

max_genes

maximal number of genes to use for computing the CNA matrix. Default is 5000.

expression_limits

a numeric vector with two elements giving the lower and upper values with which to bound the centered expression matrix prior to calculating the CNA matrix. This blunts the effect of noisy genes. Default is c(-3, 3).

window

number of genes to consider when calculating the running mean. Default is a window of 100 genes.
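
To illustrate what a running mean over a window of genes does, here is a small base R sketch. The helper `running_mean` is hypothetical and is not the package's implementation; in particular, centering the window on each gene and truncating it at the ends of the vector are assumptions, since the handling of edges and chromosome boundaries is not described here.

```r
# Running mean over a numeric vector that is already ordered by genomic position.
# Hypothetical helper: each window covers `window` values around position i and
# is truncated where it would run past either end of the vector.
running_mean <- function(x, window) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    lo <- i - (window - 1) %/% 2
    hi <- lo + window - 1
    mean(x[max(1, lo):min(n, hi)])
  }, numeric(1))
}

# A single spike is spread over its neighbours
x <- c(0, 0, 3, 0, 0, 0)
smoothed <- running_mean(x, window = 3)
```

A larger window smooths more aggressively, which suppresses gene-level noise at the cost of resolution along the chromosome.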

scaling_factor

a small constant by which to increase the calculated (-BM, +BM) interval to compensate for possible noise. Default is 0.2.

initial_centering

direction of centering the expression matrix (row-wise or col-wise) prior to computing the CNA matrix. Accepts either strings "row" or "col", default is "col".

base_metric

a metric to use for calculating the (-BM, +BM) interval. Accepts either strings "mean" or "median", default is "median".

verbose

logical; if FALSE, suppresses all messages from this function. Default is FALSE.

Details

The CNA algorithm is as follows:
Preprocessing steps:

  1. Compute mean expression for each gene (log2[mean(TPM) + 1])

  2. Keep the max_genes highest expressed genes

  3. Order the rows (genes) of the expression matrix according to chromosomal position

  4. Log-transform the expression matrix

  5. Mean-center the expression matrix in the initial_centering direction

  6. Bound the expression matrix according to the expression_limits


Value

Returns the ScandalDataSet object with the CNA matrix stored in the "cna" element of the reducedDim slot (accessible by reducedDim(object, "cna")). Note that the matrix is stored with cell IDs as row names and gene IDs as column names.

Author(s)

Avishay Spitzer

See Also

The CNA inference method was defined and developed by **Dr. Itay Tirosh** during his time at the *Broad Institute* and published in several high-impact papers including the following paper from *Cell*: https://www.cell.com/cell/fulltext/S0092-8674(17)31270-9.


dravishays/scandal documentation built on Jan. 8, 2020, 1:30 p.m.