clusterCellFrequencies: Clustering of cellular frequency probability distributions


View source: R/clusterCellFrequencies.R

Description

Calculates overrepresented cell frequencies using a two-step approach. Based on the assumption that passenger mutations occur within a cell prior to the driver event that initiates the expansion, each clonal expansion should be marked by multiple mutations. Thus mutations and copy number variations that took place in a cell prior to a clonal expansion should be present in a similar fraction of cells and leave a similar "frequency-trace" during their propagation.

Usage

clusterCellFrequencies(densities, p, nrep = 30, min_CF = 0.1, verbose = TRUE)

Arguments

densities

Matrix as obtained by computeCellFrequencyDistributions. Each row corresponds to a mutation and each column corresponds to a cellular frequency. Each value densities[i,j] represents the probability that mutation i is present in a fraction f of cells, where f is given by colnames(densities)[j].

p

Precision with which subpopulation size is predicted. A small value reflects a high resolution and can lead to a higher number of predicted subpopulations.

nrep

Positive integer indicating the number of algorithm repetitions (default: 30).

min_CF

Lower threshold for the prevalence of a mutated cell (default: 0.1).

verbose

Logical indicating whether to print more verbose output (default: TRUE).
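
To make the layout of the densities argument concrete, the following sketch builds a small stand-in matrix by hand. It is not output of computeCellFrequencyDistributions; the mutation names, the frequency grid and the bell-shaped rows are assumptions chosen purely for illustration.

```r
# Hypothetical frequency grid: one column per candidate cellular frequency.
freqs <- seq(0.1, 1.0, by = 0.1)

# Illustrative helper (not part of expands): a discretized bell curve
# centered on `peak`, normalized so that the row sums to 1.
make_row <- function(peak, sd = 0.08) {
  d <- dnorm(freqs, mean = peak, sd = sd)
  d / sum(d)
}

# One row per mutation; the row names are made up.
densities <- rbind(
  mutA = make_row(0.30),
  mutB = make_row(0.32),
  mutC = make_row(0.80)
)

# Column names carry the cellular frequency each column stands for.
colnames(densities) <- as.character(freqs)

# Most probable cellular frequency of mutA: locate the column with the
# largest probability, then read f from the column name.
j <- which.max(densities["mutA", ])
f <- as.numeric(colnames(densities)[j])
print(f)
```

Note that the frequency is read with colnames(densities)[j], that is, by indexing the vector of column names rather than the matrix itself.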

Details

In the first step, mutations with similar cellular frequencies are grouped together by hierarchical cluster analysis of their probability distributions, using the Kullback-Leibler divergence as a distance measure. The cell frequency at each cluster maximum denotes the size of the subpopulation that harbors the clustered mutations. In the second step, each cluster is extended by members with similar distributions in an interval around the cluster maximum.
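
The first step can be sketched in base R as follows. This is a minimal illustration and not the package's implementation: the toy matrix, the symmetrized form of the Kullback-Leibler divergence, the small constant eps, the average linkage and the fixed number of clusters are all assumptions made for the example.

```r
# Toy densities matrix (hypothetical data, rows sum to 1).
freqs <- seq(0.1, 1.0, by = 0.1)
make_row <- function(peak, sd = 0.08) {
  d <- dnorm(freqs, mean = peak, sd = sd)
  d / sum(d)
}
densities <- rbind(
  mutA = make_row(0.30),
  mutB = make_row(0.32),
  mutC = make_row(0.80)
)
colnames(densities) <- as.character(freqs)

# Symmetrized Kullback-Leibler divergence between two discrete
# distributions. Symmetrizing is an assumption made here so that the
# result can be used as a distance in hclust; eps guards against log(0).
sym_kl <- function(p, q, eps = 1e-12) {
  p <- (p + eps) / sum(p + eps)
  q <- (q + eps) / sum(q + eps)
  sum(p * log(p / q)) + sum(q * log(q / p))
}

# Pairwise distance matrix over all mutations.
n <- nrow(densities)
D <- matrix(0, n, n, dimnames = list(rownames(densities), rownames(densities)))
for (i in seq_len(n)) {
  for (k in seq_len(n)) {
    D[i, k] <- sym_kl(densities[i, ], densities[k, ])
  }
}

# Hierarchical clustering; linkage method and k = 2 are illustrative choices.
hc <- hclust(as.dist(D), method = "average")
clusters <- cutree(hc, k = 2)

# Cluster maximum: the frequency at which the mean distribution of the
# cluster members peaks, read as the size of the subpopulation.
peaks <- sapply(split(seq_len(n), clusters), function(idx) {
  avg <- colMeans(densities[idx, , drop = FALSE])
  as.numeric(colnames(densities)[which.max(avg)])
})

print(clusters)
print(peaks)
```

In this toy input, mutA and mutB have nearly identical distributions and end up in the same cluster, while mutC forms a cluster of its own. The second step, extending each cluster by members with similar distributions around the cluster maximum, is not covered by this sketch.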

Value

SPs

Matrix of predicted subpopulations. Each row corresponds to a subpopulation and each column contains information about that subpopulation, such as its size in the sequenced tumor bulk (column Mean Weighted) and the noise score at which the subpopulation was detected (column score; lower values indicate higher confidence in the detection).

Author(s)

Noemi Andor

References

Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.


expands documentation built on Sept. 5, 2021, 5:18 p.m.