auto_cna_call: Automated pipeline to call CNA
In jmonlong/scCNAutils: Functions to analyze copy number aberrations in single-cell data

auto_cna_call

R Documentation

Automated pipeline to call CNA

Description

Automated pipeline to call CNA using metacells.

Usage

auto_cna_call(
  ge_df,
  comm_df,
  nb_metacells = 10,
  metacell_size = 3,
  multisamps = TRUE,
  trans_prob = 0.1,
  baseline_cells = NULL,
  baseline_communities = NULL,
  prefix = "scCNAutils_out",
  nb_cores = 1,
  chrs = c(1:22, "X", "Y"),
  bin_mean_exp = 3,
  z_wins_th = 3,
  smooth_wsize = 3,
  rcpp = TRUE
)

Arguments

`ge_df`	normalized gene expression of all cells (e.g. output from `norm_ge`).
`comm_df`	a data.frame with community information, output from `find_communities`.
`nb_metacells`	the number of metacells per community.
`metacell_size`	the number of cells in a metacell.
`multisamps`	use the multi-sample version of the HMM segmentation? Default is TRUE. See details.
`trans_prob`	the transition probability for the HMM.
`baseline_cells`	cells to use as baseline.
`baseline_communities`	communities to use as baseline. Used if `baseline_cells` is NULL.
`prefix`	the prefix to use for the files created by this function (e.g. graphs).
`nb_cores`	the number of processors to use.
`chrs`	the chromosome names to keep. NULL to include all the chromosomes.
`bin_mean_exp`	the desired minimum mean expression in the bin.
`z_wins_th`	the threshold to winsorize Z-score. Default is 3.
`smooth_wsize`	the window size for smoothing. Default is 3.
`rcpp`	use Rcpp function. Default is TRUE. More memory-efficient and faster when running on one core.

Details

Once the metacells are created, there are two ways to call CNA. First, if multisamps=FALSE, CNA are called on each metacell and the results are merged per community, keeping track of how many metacells support each CNA. Second, if multisamps=TRUE (default), the HMM is run jointly on all the metacells of a community. The multi-sample approach should be more robust.
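
To compare the two modes on the same data, the pipeline can be run once with each setting of multisamps. The helper below is only a sketch: `call_both_modes` and the prefix suffixes are hypothetical names, and `ge_df` and `comm_df` are assumed to have been prepared beforehand (e.g. with `norm_ge` and `find_communities`). Each run gets its own prefix so that the files written by one mode are not overwritten by the other.

```r
# Run the pipeline in both modes so that the resulting calls can be compared.
# `call_both_modes` is a hypothetical helper, not part of scCNAutils.
# `call_fun` defaults to scCNAutils::auto_cna_call; the default is only
# evaluated when it is actually used, so a stand-in function can be passed
# for a dry run on a machine where the package is not installed.
call_both_modes <- function(ge_df, comm_df,
                            call_fun = scCNAutils::auto_cna_call, ...) {
  modes <- c(per_metacell = FALSE, multi_sample = TRUE)
  lapply(modes, function(ms) {
    # Hypothetical prefix suffixes, one per mode
    suffix <- if (ms) "multi" else "single"
    call_fun(ge_df, comm_df,
             multisamps = ms,
             prefix = paste0("scCNAutils_out_", suffix),
             ...)
  })
}
```

With the package installed, `res <- call_both_modes(ge_df, comm_df, nb_cores = 2)` would return a list with one element per mode; any other argument of `auto_cna_call` can be passed through `...`.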

The transition probability (trans_prob) affects the HMM segmentation: smaller values create longer segments. One approach, often advocated by HMM aficionados, is to try different values and use the one that gives the best results, for example based on the QC graphs (TODO). Another approach is to use a loose (larger) transition probability and then filter short segments ('length' column or 'pass.filter' column).
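
The filtering approach can be sketched in base R on a toy table. Only the 'length' and 'pass.filter' column names come from the text above. The other columns, all the values, the unit of 'length', the cutoff, and the assumption that 'pass.filter' is a logical column are made up for illustration, so the actual output of `auto_cna_call` should be inspected before reusing this.

```r
# Toy segment table standing in for the CNA calls (values are invented).
segs <- data.frame(
  community = c("1", "1", "2", "2"),
  chr = c("1", "3", "7", "X"),
  length = c(120, 4, 85, 6),                    # unit is an assumption
  pass.filter = c(TRUE, FALSE, TRUE, FALSE),    # assumed to be logical
  stringsAsFactors = FALSE
)

# Option 1: keep segments at least as long as a chosen cutoff.
min_length <- 10  # hypothetical cutoff, to be tuned on real data
long_segs <- segs[segs$length >= min_length, ]

# Option 2: rely on the precomputed filter column.
passed_segs <- segs[segs$pass.filter, ]
```

In this toy table both options keep the same two segments (chromosomes 1 and 7); on real calls the two columns need not agree, and the cutoff would have to be chosen by looking at the distribution of segment lengths.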