scandal_cna_infer: CNA inference
In dravishays/scandal: A framework for single-cell analysis


This function infers CNAs (chromosomal copy-number aberrations) from single-cell expression data. CNA inference is the main method the scandal framework uses to classify cells as malignant or non-malignant.

scandal_cna_infer(
  object,
  reference_cells,
  genome = "hg19",
  max_genes = 5000,
  expression_limits = c(-3, 3),
  window = 100,
  scaling_factor = 0.2,
  initial_centering = "col",
  base_metric = "median",
  verbose = FALSE
)

`object`	a ScandalDataSet object.
`reference_cells`	a named vector of the cluster assignments of the reference cells. The names should correspond to the cell IDs of the reference (non-malignant) cells. The CNA matrix can be computed without a reference (with `reference_cells = NULL`), but this is not recommended because downstream computations that use the inferred CNA matrix will be less reliable.
`genome`	a string indicating the genome to be used for CNA inference. Must be one of the available genomes in the infercna package. Default is hg19.
`max_genes`	maximal number of genes to use for computing the CNA matrix. Default is 5000.
`expression_limits`	a numeric vector with two elements giving the lower and upper values used to bound the centered expression matrix before the CNA matrix is calculated. This blunts the effect of noisy genes. Default is c(-3, 3).
`window`	number of genes to consider when calculating the running mean. Default is a window of 100 genes.
`scaling_factor`	a small constant by which to increase the calculated (-BM, +BM) interval to compensate for possible noise. Default is 0.2.
`initial_centering`	direction in which to center the expression matrix (row-wise or column-wise) prior to computing the CNA matrix. Accepts either "row" or "col". Default is "col".
`base_metric`	the metric to use for calculating the (-BM, +BM) interval. Accepts either "mean" or "median". Default is "median".
`verbose`	whether to print messages from this function. When FALSE, all messages are suppressed. Default is FALSE.
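
As an illustration of the shape expected for `reference_cells`, the sketch below builds a named vector whose values are cluster assignments and whose names are cell IDs. The `cell_meta` table, its column names, and all of its values are hypothetical and only serve the example. How reference clusters are identified in practice is outside the scope of this page.

```r
# Hypothetical per-cell metadata: cell IDs, cluster labels, and a flag marking
# cells that belong to clusters judged non-malignant. All values are made up.
cell_meta <- data.frame(
  cell_id      = c("c1", "c2", "c3", "c4", "c5"),
  cluster      = c("macrophage", "tumor_1", "oligodendrocyte",
                   "macrophage", "tumor_2"),
  is_reference = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only the reference (non-malignant) cells
ref <- cell_meta[cell_meta$is_reference, ]

# Named vector: values are cluster assignments, names are cell IDs
reference_cells <- setNames(ref$cluster, ref$cell_id)

print(reference_cells)
```

The resulting vector can then be passed as the `reference_cells` argument.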

The CNA algorithm is as follows:
Preprocessing steps:

Compute mean expression for each gene (log2[mean(TPM) + 1])
Keep the max_genes highest expressed genes
Order the rows (genes) of the expression matrix according to chromosomal position
Log-transform the expression matrix
Mean-center the expression matrix in the initial_centering direction
Bound the expression matrix according to the expression_limits
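
The preprocessing steps above can be sketched in base R on a toy matrix. This is a minimal illustration and not the scandal implementation. The helper `preprocess_for_cna`, the `gene_pos` table, and the toy data are hypothetical. The sketch also makes two assumptions the text does not confirm: the log transform is log2(TPM + 1), and "col" centering subtracts each column's mean.

```r
# Minimal sketch of the listed preprocessing steps (not the scandal code).
# tpm:      genes x cells matrix of TPM values with gene IDs as row names
# gene_pos: hypothetical data frame with columns gene, chr, start
preprocess_for_cna <- function(tpm, gene_pos, max_genes = 5000,
                               expression_limits = c(-3, 3),
                               initial_centering = "col") {
  # 1. Mean expression per gene: log2(mean(TPM) + 1)
  gene_means <- log2(rowMeans(tpm) + 1)

  # 2. Keep the max_genes most highly expressed genes
  n_keep <- min(max_genes, nrow(tpm))
  keep <- names(sort(gene_means, decreasing = TRUE))[seq_len(n_keep)]

  # 3. Order the rows (genes) by chromosomal position
  pos <- gene_pos[gene_pos$gene %in% keep, ]
  pos <- pos[order(pos$chr, pos$start), ]
  m <- tpm[pos$gene, , drop = FALSE]

  # 4. Log-transform (assumption: log2(TPM + 1))
  m <- log2(m + 1)

  # 5. Mean-center (assumption: "col" subtracts column means,
  #    "row" subtracts row means)
  if (initial_centering == "col") {
    m <- sweep(m, 2, colMeans(m))
  } else {
    m <- sweep(m, 1, rowMeans(m))
  }

  # 6. Bound the values to the expression limits
  m[m < expression_limits[1]] <- expression_limits[1]
  m[m > expression_limits[2]] <- expression_limits[2]
  m
}

# Toy data: 6 genes, 4 cells, integer chromosome labels for simplicity
set.seed(1)
genes <- paste0("g", 1:6)
tpm <- matrix(rexp(6 * 4, rate = 0.01), nrow = 6,
              dimnames = list(genes, paste0("cell", 1:4)))
gene_pos <- data.frame(gene  = genes,
                       chr   = c(2, 1, 1, 3, 2, 1),
                       start = c(500, 300, 100, 50, 100, 200),
                       stringsAsFactors = FALSE)

res <- preprocess_for_cna(tpm, gene_pos, max_genes = 5)
print(dim(res))
```

Real genomes include non-numeric chromosome names such as X and Y, so an actual implementation would need an explicit chromosome ordering rather than the numeric sort used here.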

Returns the ScandalDataSet object with CNA matrix in the "cna" element of the reducedDim slot (accessible by reducedDim(object, "cna")). Note that the matrix is stored with cell IDs as row names and gene IDs as column names.
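
A hedged usage sketch of calling the function and retrieving the result follows. It assumes that a ScandalDataSet named `sds` and a `reference_cells` vector already exist in the session (both names are hypothetical), and that `reducedDim` is the accessor provided by the SingleCellExperiment package. The guard makes the block a no-op when those prerequisites are missing.

```r
# Runs only if the scandal package is installed and the hypothetical objects
# `sds` (a ScandalDataSet) and `reference_cells` are already defined.
ran_inference <- FALSE

if (requireNamespace("scandal", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE) &&
    exists("sds") && exists("reference_cells")) {

  # Infer CNAs using the arguments documented on this page
  sds <- scandal::scandal_cna_infer(
    sds,
    reference_cells = reference_cells,
    genome = "hg19",
    verbose = TRUE
  )

  # Retrieve the CNA matrix: cell IDs as row names, gene IDs as column names
  cna <- SingleCellExperiment::reducedDim(sds, "cna")

  # Transpose if a genes x cells layout is needed downstream
  cna_genes_by_cells <- t(cna)

  ran_inference <- TRUE
}
```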

Avishay Spitzer

The CNA inference method was defined and developed by **Dr. Itay Tirosh** during his time at the *Broad Institute* and published in several high-impact papers including the following paper from *Cell*: https://www.cell.com/cell/fulltext/S0092-8674(17)31270-9.

dravishays/scandal documentation built on Jan. 8, 2020, 1:30 p.m.