auto_cna_call: Automated pipeline to call CNA

View source: R/auto_cna_call.R

auto_cna_call    R Documentation

Automated pipeline to call CNA

Description

Automated pipeline to call CNA using metacells.

Usage

auto_cna_call(
  ge_df,
  comm_df,
  nb_metacells = 10,
  metacell_size = 3,
  multisamps = TRUE,
  trans_prob = 0.1,
  baseline_cells = NULL,
  baseline_communities = NULL,
  prefix = "scCNAutils_out",
  nb_cores = 1,
  chrs = c(1:22, "X", "Y"),
  bin_mean_exp = 3,
  z_wins_th = 3,
  smooth_wsize = 3,
  rcpp = TRUE
)

Arguments

ge_df

normalized gene expression of all cells (e.g. output from norm_ge).

comm_df

a data.frame with community information, output from find_communities.

nb_metacells

the number of metacells per community.

metacell_size

the number of cells in a metacell.

multisamps

use the multi-sample version of the HMM segmentation? Default is TRUE. See details.

trans_prob

the transition probability for the HMM.

baseline_cells

cells to use as baseline.

baseline_communities

communities to use as baseline. Used if baseline_cells is NULL.

prefix

the prefix to use for the files created by this function (e.g. graphs).

nb_cores

the number of processors to use.

chrs

the chromosome names to keep. NULL to include all the chromosomes.

bin_mean_exp

the desired minimum mean expression in the bin.

z_wins_th

the threshold used to winsorize the Z-scores. Default is 3.

smooth_wsize

the window size for smoothing. Default is 3.

rcpp

use the Rcpp functions? Default is TRUE. They are more memory-efficient and faster when running on one core.
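As an illustration of what z_wins_th and smooth_wsize control, here is a minimal base R sketch. The helpers winsorize_z and smooth_signal are hypothetical and not part of scCNAutils, and the running median is an assumption: the package may smooth with a different statistic.

```r
# Hypothetical helper (not from scCNAutils): clamp Z-scores to [-th, th].
winsorize_z <- function(z, th = 3) {
  pmax(pmin(z, th), -th)
}

# Hypothetical helper: centered running median over an odd window size.
# Assumption: a median is used here; the package may use another statistic.
smooth_signal <- function(x, wsize = 3) {
  as.numeric(stats::runmed(x, k = wsize, endrule = "keep"))
}

# Toy Z-scores with two extreme values (made-up data).
z <- c(0.2, -0.5, 7.1, 0.4, -4.8, 0.1, 0.3)

z_wins <- winsorize_z(z, th = 3)           # extremes are pulled back to +/- 3
z_smooth <- smooth_signal(z_wins, wsize = 3)
```

Winsorizing first limits the influence of extreme bins, so a single aberrant value is less likely to survive the smoothing step.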

Details

Once the metacells are created, there are two ways to call CNA. First, if multisamps=FALSE, CNAs are called on each metacell and the results are merged per community, keeping track of how many metacells support each CNA. Second, if multisamps=TRUE (default), the HMM is run on all the metacells of a community together. The multi-sample approach should be more robust.
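To make the role of nb_metacells and metacell_size concrete, the following base R sketch builds metacells from toy data. It is only an illustration under stated assumptions: cells are drawn at random within a community and their expression is summed, which may differ from how scCNAutils selects and aggregates cells. The function make_metacells, the toy matrix and the column names cell and community are all hypothetical.

```r
set.seed(1)

# Toy expression matrix: 5 genes x 12 cells (made-up data).
expr <- matrix(rpois(5 * 12, lambda = 4), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12)))

# Toy community assignment (column names are assumptions).
comm <- data.frame(cell = colnames(expr),
                   community = rep(c("A", "B"), each = 6),
                   stringsAsFactors = FALSE)

# Hypothetical helper: for each community, draw nb_metacells groups of
# metacell_size cells and sum their expression per gene.
make_metacells <- function(expr, comm, nb_metacells = 10, metacell_size = 3) {
  out <- list()
  for (co in unique(comm$community)) {
    cells <- comm$cell[comm$community == co]
    for (i in seq_len(nb_metacells)) {
      picked <- sample(cells, metacell_size)
      out[[paste0(co, "_mc", i)]] <- rowSums(expr[, picked, drop = FALSE])
    }
  }
  do.call(cbind, out)  # genes x metacells matrix
}

mc <- make_metacells(expr, comm, nb_metacells = 2, metacell_size = 3)
```

Each column of mc is one metacell. With multisamps=FALSE each column would be segmented on its own, whereas with multisamps=TRUE the columns of a community would be segmented jointly.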

The transition probability (trans_prob) affects the HMM segmentation: smaller values create longer segments. One approach, often advocated by HMM aficionados, is to try different values and use the one that gives the best results, for example based on the QC graphs (TODO). Another approach is to use a loose transition probability and then filter out short segments ('length' column or 'pass.filter' column).
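The effect of the transition probability can be seen on a toy signal with a small Viterbi decoder written in base R. This is a sketch, not the package's HMM: the three states, the Gaussian emissions, the helpers viterbi_states and to_segments, and the toy data are all assumptions, and length is counted in bins here.

```r
# Hypothetical 3-state Viterbi decoder with Gaussian emissions (assumed model).
viterbi_states <- function(x, means = c(loss = -1, neutral = 0, gain = 1),
                           emit_sd = 0.5, trans_prob = 0.1) {
  K <- length(means)
  n <- length(x)
  # Leave the current state with total probability trans_prob.
  log_trans <- log(matrix(trans_prob / (K - 1), K, K))
  diag(log_trans) <- log(1 - trans_prob)
  # Log density of each observation under each state (n x K matrix).
  log_emit <- sapply(means, function(m) dnorm(x, mean = m, sd = emit_sd, log = TRUE))
  score <- matrix(-Inf, n, K)
  back <- matrix(0L, n, K)
  score[1, ] <- log(1 / K) + log_emit[1, ]
  for (t in seq_len(n)[-1]) {
    for (k in seq_len(K)) {
      cand <- score[t - 1, ] + log_trans[, k]
      back[t, k] <- which.max(cand)
      score[t, k] <- max(cand) + log_emit[t, k]
    }
  }
  # Backtrack the most likely state path.
  path <- integer(n)
  path[n] <- which.max(score[n, ])
  for (t in rev(seq_len(n - 1))) path[t] <- back[t + 1, path[t + 1]]
  names(means)[path]
}

# Hypothetical helper: collapse a state path into segments.
to_segments <- function(states) {
  r <- rle(states)
  end <- cumsum(r$lengths)
  data.frame(state = r$values, start = end - r$lengths + 1, end = end,
             length = r$lengths, stringsAsFactors = FALSE)
}

# Toy signal: neutral bins, one outlier bin (1.6), then an 8-bin gain.
x <- c(0.1, -0.2, 0.0, 0.2, -0.1, 1.6, 0.1, -0.1, 0.0, 0.2, -0.2,
       0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.9, 1.1,
       0.0, 0.1, -0.1, 0.2, 0.0, -0.1)

seg_loose <- to_segments(viterbi_states(x, trans_prob = 0.3))
seg_strict <- to_segments(viterbi_states(x, trans_prob = 0.01))

# Second approach: keep the loose segmentation, then drop short segments.
seg_filtered <- subset(seg_loose, state != "neutral" & length >= 3)
```

On this toy signal the loose setting also calls the single outlier bin as a gain, while the strict setting keeps only the 8-bin gain. Filtering the loose calls by length recovers the same 8-bin segment.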

Value

a data.frame with CNAs

Author(s)

Jean Monlong


jmonlong/scCNAutils documentation built on May 3, 2022, 4:34 a.m.