cellfrequency_pdf: Computes the probability distribution of cellular frequencies...


View source: R/cellfrequency_pdf.R

Description

Calculates P, the probability density distribution of cellular frequencies for a single point mutation or CNV. For each cell frequency f, the value of P(f) reflects the probability that the mutation is present in a fraction f of cells.

Usage

cellfrequency_pdf(af, cnv, pnb, freq, max_PM=6, ploidy = 2, enforceCoocurrence=T)

Arguments

af

The allelic frequency at which the point mutation has been observed.

cnv

The average copy number of the locus in which the mutation is embedded.

pnb

The count of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic). B-alleles that have >1 copy in normal cells are not modeled.

freq

Vector of cellular frequencies at which the probabilities will be calculated.

max_PM

Upper threshold for the number of amplicons per mutated cell. max_PM is the maximum number of amplicons above which solutions are rejected in the cell-frequency estimation step described below, i.e. PM <= max_PM. The choice of max_PM should depend on genomic depth of coverage and on the fraction of the genome sequenced: the higher the quality and abundance of data, the higher max_PM.

ploidy

The background ploidy of the sequenced sample (default: 2). Changing the value of this parameter is not recommended. Dealing with cell lines or tumor biopsies of very high (>=0.95) tumor purity is a necessary but not sufficient condition to change the value of this parameter.

enforceCoocurrence

Whether to enforce the assumption that an overlapping SNV and CNV were co-propagated as part of the same clonal expansion.

Details

We consider two types of molecular mechanisms that convert a locus into its mutated state: copy number variation (CNV) inducing events and single nucleotide variation (SNV) inducing events. We assume that a normal state is defined by a total allele count of two and B allele count below two, whereas a mutated state has an increased fraction of B alleles. The conditions defining these states for each locus l are as follows:
i) PM_B, PN_B, PM, PN \in N; ii) PM_B \geq 1; PN_B \leq 1; PN = 2; iii) \frac{PM_B}{PM} \geq \frac{PN_B}{PN}.

PM_B and PN_B denote the count of the B allele in each cell type: mutated cells and normal cells, respectively. The value of PN_B is one if l has a germline variant, zero otherwise. PM, PN are the total allele count of mutated cells and normal cells. PM is required to be between one and max_PM, that is, we exclude solutions for which the maximum number of amplicons per cell exceeds the user defined value of max_PM.
The function returns the probability distribution, P(f), that the mutation at locus l is present in a fraction f of cells, where f \in [0,1].
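
The conditions above can be read as a simple validity check on a candidate state. The following is a minimal sketch in base R, not code from the expands package: the helper name is_valid_state and its argument order are assumptions made for illustration, and N is taken to include zero because PN_B may be zero.

```r
# Hypothetical helper (not part of the expands package): checks conditions
# i) to iii) plus the max_PM bound for one candidate state.
is_valid_state <- function(PM_B, PM, PN_B, PN = 2, max_PM = 6) {
  counts <- c(PM_B, PM, PN_B, PN)
  # i) all counts are natural numbers (zero allowed, since PN_B may be 0)
  all_natural <- all(counts >= 0 & counts == round(counts))
  # ii) PM_B >= 1, PN_B <= 1, PN = 2
  bounds_ok <- PM_B >= 1 && PN_B <= 1 && PN == 2
  # PM must lie between one and max_PM
  pm_ok <- PM >= 1 && PM <= max_PM
  # iii) the B-allele fraction must not decrease in mutated cells
  fraction_ok <- pm_ok && (PM_B / PM >= PN_B / PN)
  all_natural && bounds_ok && pm_ok && fraction_ok
}

is_valid_state(PM_B = 1, PM = 2, PN_B = 0)  # somatic heterozygous SNV
is_valid_state(PM_B = 1, PM = 3, PN_B = 1)  # germline variant whose B fraction drops
is_valid_state(PM_B = 2, PM = 7, PN_B = 0)  # PM exceeds max_PM
```

Of the three calls, only the first satisfies all conditions; the second violates iii) and the third violates the max_PM bound.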

Four alternative cell frequency probability distribution scenarios, P(f), can be obtained for each allele-frequency + copy number pair (AF, CN). For each scenario, the model starts with a germline population that is the root of all other modeled subpopulations. The first subpopulation (f_{cnv}) modeled to evolve from the germline population is always the one carrying a CNV:
pm * f_{cnv} + PN *(1-f_{cnv}) = CN, where pm is the total allele count of f_{cnv}.
A subsequent subpopulation (f_{snv}) is always defined by an SNV and is modeled in relation to f_{cnv}, either as:
1. P_s(f) - its sibling: PM_B * f_{snv} + PN_B *(1-f_{snv}) = AF*CN, where f_{snv}+f_{cnv}<=1; PM_B<=2.
2. P_p(f) - its parent: PM_B * (f_{snv}-f_{cnv}) + pm_B * f_{cnv} + PN_B *(1-f_{snv}) = AF*CN, where f_{snv}>f_{cnv}; PM_B<=2 and pm_B is the B-allele count of f_{cnv}.
3. P_c(f) - its child: PM_B * f_{snv} + PN_B *(1-f_{snv}) = AF*CN, where f_{snv}<f_{cnv}; PM_B<=pm.
4. P_i(f) - itself: PM_B * f + PN_B *(1-f) = AF*CN, where f=f_{snv}=f_{cnv}; PM_B<=pm.

Under 1), SNV and CNV are completely independent as they are never co-propagated during the same clonal expansion. Under 2) and 3), SNV and CNV are partially dependent, yet present in two distinct subpopulations. Under 4), both the SNV and the CNV at l were propagated during the same clonal expansion.
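
To make the scenario equations concrete, the CNV equation and the SNV equation shared by scenarios 1, 3 and 4 can each be solved for the cellular frequency once the allele counts are fixed. The sketch below uses base R only and is plain algebra on the equations above, not the estimation procedure used by cellfrequency_pdf; the helper names solve_f_cnv and solve_f_snv and the chosen allele counts (pm = 1, PM_B = 1) are assumptions made for illustration. The helpers are undefined when pm equals PN or PM_B equals PN_B.

```r
# Hypothetical helpers (not part of the expands package).

# CNV subpopulation: pm * f + PN * (1 - f) = CN
#   =>  f = (CN - PN) / (pm - PN)
solve_f_cnv <- function(CN, pm, PN = 2) {
  (CN - PN) / (pm - PN)
}

# Scenarios 1, 3 and 4: PM_B * f + PN_B * (1 - f) = AF * CN
#   =>  f = (AF * CN - PN_B) / (PM_B - PN_B)
solve_f_snv <- function(AF, CN, PM_B, PN_B = 0) {
  (AF * CN - PN_B) / (PM_B - PN_B)
}

# Allele frequency and copy number taken from the Examples section
AF <- 0.26
CN <- 1.95

f_cnv <- solve_f_cnv(CN, pm = 1)        # assumed single-copy loss
f_snv <- solve_f_snv(AF, CN, PM_B = 1)  # assumed one mutated copy, somatic

c(f_cnv = f_cnv, f_snv = f_snv)
f_snv + f_cnv <= 1                      # sibling constraint of scenario 1
```

With these assumed allele counts the CNV would be carried by about 5% of cells and the SNV by about 51%, so the sibling constraint of scenario 1 holds, whereas the child (f_{snv} < f_{cnv}) and itself (f_{snv} = f_{cnv}) scenarios are ruled out for this particular choice of pm and PM_B.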

Value

List with four components, including:

p

The probability that the point mutation/CNV is present in a fraction f of cells, for each input frequency f in parameter freq.

bestF

The cellular frequency that best explains the observed allele frequency and/or copy number.

Author(s)

Noemi Andor

References

Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.

Examples

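library(expands)
# Cellular frequencies at which P(f) is evaluated
freq <- seq(0.1, 1.0, by = 0.01)
# Somatic point mutation (pnb = 0) at allele frequency 0.26, average copy number 1.95
cfd <- cellfrequency_pdf(af = 0.26, cnv = 1.95, pnb = 0, freq = freq, max_PM = 6)
plot(freq, cfd$p, type = "l", xlab = "f", ylab = "P(f)")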

noemiandor/expands documentation built on Sept. 13, 2021, 6:25 p.m.