assignMutations: Mutation Assignment
In expands: Expanding Ploidy and Allele-Frequency on Nested Subpopulations

Assigns mutations to previously predicted subpopulations.

assignMutations(dm, finalSPs, max_PM=6, cnvSPs=NULL, ploidy = 2, verbose = T)

`dm`	Matrix in which each row corresponds to a mutation. Must contain at least the following columns: chr - the chromosome on which the mutation is located; startpos - the genomic position of the mutation; AF_Tumor - the allele frequency of the mutation; PN_B - the count of the B-allele in normal (non-tumor) cells (binary variable: 1 if the mutation is a germline variant, 0 if it is somatic).
`finalSPs`	Matrix in which each row corresponds to a subpopulation, as calculated by `clusterCellFrequencies`.
`max_PM`	Upper threshold for the number of amplicons per mutated cell. See also `cellfrequency_pdf`.
`cnvSPs`	Matrix in which each row corresponds to a subpopulation, as calculated by `clusterCellFrequencies`. If not set, finalSPs will be used to assign CNVs as well as SNVs.
`ploidy`	The background ploidy of the sequenced sample (default: 2). Changing this value is not recommended. Working with cell lines or with tumor biopsies of very high tumor purity (>= 0.95) is a necessary but not sufficient condition for changing it.
`verbose`	Logical; if TRUE, gives more verbose output.
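
As an illustration of the column requirements on `dm`, the following base R sketch builds a toy matrix and checks it before it would be passed on. The values are made up, `check_dm` is a hypothetical helper that is not part of expands, and the range check on AF_Tumor assumes that allele frequencies are given as fractions in [0, 1]. Only the columns listed above are checked.

```r
# Toy input for illustration only; all values are made up.
dm <- cbind(
  chr      = c(1, 1, 7, 17),
  startpos = c(1200345, 5600789, 55249071, 7577120),
  AF_Tumor = c(0.12, 0.48, 0.33, 0.51),
  PN_B     = c(0, 1, 0, 0)   # 1 = germline variant, 0 = somatic
)

# Columns that the argument description lists as required.
required_cols <- c("chr", "startpos", "AF_Tumor", "PN_B")

# check_dm is a hypothetical helper, not a function of the expands package.
check_dm <- function(m) {
  missing_cols <- setdiff(required_cols, colnames(m))
  if (length(missing_cols) > 0) {
    stop("dm is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!all(m[, "PN_B"] %in% c(0, 1))) {
    stop("PN_B must be 0 (somatic) or 1 (germline)")
  }
  # Assumption: AF_Tumor is a fraction, not a percentage.
  if (any(m[, "AF_Tumor"] < 0 | m[, "AF_Tumor"] > 1)) {
    stop("AF_Tumor must lie in [0, 1]")
  }
  invisible(TRUE)
}

check_dm(dm)
```

A check of this kind fails early with a readable message, before any assignment is attempted.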

Each mutated locus l is assigned to the subpopulation C whose size f_C best explains the allele frequency (AF) and copy number (CN) observed at l. For the SNV at locus l, four alternative cell-frequency probabilities P_x(f_C) are calculated, where x denotes one of four alternative evolutionary scenarios (see also cellfrequency_pdf).
The SNV is assigned to subpopulation:
C:=argmax_C (P_s(f_C), P_p(f_C), P_c(f_C), P_i(f_C)) (see cellfrequency_pdf).
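
The selection rule can be sketched in base R for a single SNV. This is a minimal illustration, not the package's implementation: the probability matrix `P` is made up, its rows are assumed to be candidate subpopulations labeled by their size f_C, and its columns are the four scenarios s, p, c and i.

```r
# Hypothetical probabilities P_x(f_C) for one SNV (made-up numbers).
# Rows: candidate subpopulations, named by their size f_C.
# Columns: the four evolutionary scenarios.
P <- rbind(
  "0.85" = c(s = 0.02, p = 0.10, c = 0.01, i = 0.03),
  "0.40" = c(s = 0.30, p = 0.05, c = 0.62, i = 0.08),
  "0.15" = c(s = 0.11, p = 0.02, c = 0.04, i = 0.20)
)

# Locate the largest entry over all subpopulations and scenarios;
# if there are ties, the first one is taken.
best <- which(P == max(P), arr.ind = TRUE)[1, ]

assigned_size     <- as.numeric(rownames(P)[best["row"]])
assigned_scenario <- colnames(P)[best["col"]]

cat("assigned to subpopulation of size", assigned_size,
    "under scenario", assigned_scenario, "\n")
```

In this toy case the SNV goes to the subpopulation of size 0.4, because the single highest probability across all four scenarios occurs in that row.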

The mutated loci assigned to each subpopulation cluster represent the genetic profile of each predicted subpopulation.
The assignment of locus l to subpopulation C implies only that the SNV at l was first propagated during the clonal expansion that gave rise to C. SNVs present in C are therefore not necessarily exclusive to C; they may also be present in subpopulations smaller than C. Whether or not this is the case can sometimes be inferred from the phylogenetic structure of the subpopulation composition. See also buildPhylo.

A list with two fields:

dm

The input matrix with seven additional columns:
SP - subpopulation to which the point mutation has been assigned.
PM_B - count of the B-allele at the mutated genomic locus, in the assigned subpopulation (SP).
PM - total count of all alleles, in the assigned subpopulation (SP).
SP_cnv - if the point mutation lies within an amplified or deleted region: the subpopulation to which the copy number variation has been assigned. This entry has the same value as SP if and only if: i) the SNV and the CNV were propagated during the same clonal expansion or ii) the SNV lies within a copy neutral region.
PM_B_cnv - count of the B-allele, in the CNV harboring subpopulation (SP_cnv).
PM_cnv - total count of all alleles, in the CNV harboring subpopulation (SP_cnv).
%maxP - confidence of the assigned SP/SP_cnv scenario.
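
The added columns can be summarized with base R once the result is available. The sketch below uses a mock of the returned `dm` rather than real output: the values are made up, only SP and SP_cnv of the seven added columns are shown, and it assumes that both columns identify a subpopulation by its size.

```r
# Mock of the returned dm for illustration; all values are made up.
out_dm <- cbind(
  chr      = c(1, 1, 7, 17, 17),
  startpos = c(1200345, 5600789, 55249071, 7577120, 7579472),
  SP       = c(0.40, 0.85, 0.40, 0.15, 0.85),
  SP_cnv   = c(0.40, 0.85, 0.85, 0.15, 0.85)
)

# Number of point mutations assigned to each subpopulation.
n_per_sp <- table(out_dm[, "SP"])
print(n_per_sp)

# Mutations whose SNV and CNV were assigned to different subpopulations.
differs <- out_dm[, "SP"] != out_dm[, "SP_cnv"]
print(out_dm[differs, , drop = FALSE])
```

Rows where SP and SP_cnv differ are the ones where, by the description of SP_cnv above, the SNV lies in a copy number altered region and was not propagated in the same clonal expansion as the CNV.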