dot-filterCOsExtra: Filter out doublet cells and uninformative SNPs

.filterCOsExtra    R Documentation

Filter out doublet cells and uninformative SNPs

Description

This function filters out cells that have too many crossovers called, which typically results from diploid cell contamination or doublets. It also keeps only the SNPs (rows) that contribute to at least one crossover interval. It should be run on individual chromosomes and is called internally by 'readHapState'.

Usage

.filterCOsExtra(
  se,
  minSNP = 30,
  minlogllRatio = 200,
  minCellSNP = 200,
  bpDist = 100,
  maxRawCO = 10,
  biasTol = 0.45,
  nmad = 1.5
)

Arguments

se

the SummarizedExperiment object that contains the called haplotype state matrix in the assay field and haplotype segment information in the metadata field.

minSNP

a crossover is filtered out if it is introduced by a segment that is supported by fewer than 'minSNP' SNPs.

minlogllRatio

a crossover is filtered out if it is introduced by a segment whose log-likelihood ratio against its reversed state is lower than 'minlogllRatio'.

minCellSNP

the minimum number of SNPs detected for a cell to be kept; used together with 'nmad'.

bpDist

a crossover is filtered out if it is introduced by a segment that is shorter than 'bpDist' base pairs.

maxRawCO

if a cell has more than 'maxRawCO' raw crossovers called across a chromosome, the cell is filtered out.

biasTol

the haplotype ratio of a SNP across all cells is assumed to be 1:1. This argument can be used to remove SNPs with a biased haplotype, i.e. SNPs that are almost always inferred to be in haplotype state 1. It specifies a bias tolerance value: SNPs whose haplotype ratio deviates from 0.5 by less than this value are kept. Only effective when the number of cells is larger than 10.

nmad

the number of mean absolute deviations below the median number of SNPs per cell at which a cell is considered a low-coverage cell and filtered out. Only effective when the number of cells is larger than 10. When effective, this threshold or 'minCellSNP', whichever is larger, is applied.
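
As a rough illustration of how the segment-level and SNP-level thresholds described above behave, the following base R sketch applies them to toy data. It is not the comapr implementation. The column names 'nSNP', 'logllRatio' and 'bpLength', the state encoding (1 and 2 for the two haplotypes, 0 for not called) and the definition of the haplotype ratio as the fraction of calls in state 1 are all assumptions made for the example. It also ignores the condition that 'biasTol' is only effective with more than 10 cells.

```r
# Toy illustration only; not the comapr implementation.
# Hypothetical per-segment table (column names are assumptions).
segs <- data.frame(
  nSNP       = c(120, 12, 80, 45),
  logllRatio = c(950, 400, 150, 310),
  bpLength   = c(5e6, 2e5, 1e6, 60)
)
minSNP <- 30
minlogllRatio <- 200
bpDist <- 100

# A segment is kept only if it passes all three thresholds.
keepSeg <- segs$nSNP >= minSNP &
  segs$logllRatio >= minlogllRatio &
  segs$bpLength >= bpDist
keepSeg
# TRUE FALSE FALSE FALSE

# Toy SNP x cell state matrix (assumed encoding: 1 and 2 are
# haplotype states, 0 means the SNP was not called in that cell).
states <- matrix(
  c(1, 2, 1, 2,
    1, 1, 1, 1,
    2, 1, 0, 2),
  nrow = 3, byrow = TRUE
)
biasTol <- 0.45

# Fraction of called cells in state 1 for each SNP (assumed definition).
hapRatio <- rowSums(states == 1) / rowSums(states != 0)

# Keep SNPs whose ratio deviates from 0.5 by less than 'biasTol'.
keepSNP <- abs(hapRatio - 0.5) < biasTol
keepSNP
# TRUE FALSE TRUE
```

In the toy data, the second SNP is always called as state 1, so its ratio deviates from 0.5 by 0.5 and it is dropped under the default 'biasTol' of 0.45.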

Details

The 'logllRatio' value is returned by 'sgcocaller' for each haplotype segment formed by consecutive SNPs that are called to have the same state. It is calculated as the log of the ratio of the likelihood of the SNPs with the inferred states to the likelihood of the SNPs with the reversed states.
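
Written as a formula, the definition above reads roughly as follows. The symbol L for the likelihood of a segment's SNPs is notation introduced here for illustration, and the base of the logarithm is not stated in this page.

```latex
\mathrm{logllRatio}
  = \log \frac{L(\text{SNPs in segment} \mid \text{inferred states})}
              {L(\text{SNPs in segment} \mid \text{reversed states})}
```

A larger value means the inferred states explain the segment's SNPs much better than the reversed states do, which is why segments below 'minlogllRatio' are treated as weakly supported.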

Value

A 'RangedSummarizedExperiment' object whose dimensions differ from those of the input. The colnames are the cell barcodes, and rowRanges specifies the locations of the SNPs that contribute to crossovers.

Author(s)

Ruqian Lyu


ruqianl/comapr documentation built on Oct. 27, 2023, 5:12 a.m.