assignMutations: Mutation Assignment

Description

Assigns mutations to previously predicted subpopulations.

Usage

assignMutations(dm, finalSPs, max_PM = 6, cnvSPs = NULL, ploidy = 2, verbose = TRUE)

Arguments

dm

Matrix in which each row corresponds to a mutation. Must contain at least the following columns:
chr - the chromosome on which each mutation is located;
startpos - the genomic position of each mutation;
AF_Tumor - the allele frequency of each mutation;
PN_B - the count of the B-allele in normal (non-tumor) cells (binary variable: 1 if the mutation is a germline variant, 0 if it is somatic).
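A minimal sketch of a dm input that satisfies the column requirements above, using made-up values. The helper check_dm() is hypothetical and not part of expands; it only checks that the four required columns are present and that PN_B is binary, as described above.

```r
# Hypothetical helper (not part of expands): checks that a mutation matrix
# has the columns that assignMutations() requires in its 'dm' argument.
required_dm_columns <- c("chr", "startpos", "AF_Tumor", "PN_B")

check_dm <- function(dm) {
  missing <- setdiff(required_dm_columns, colnames(dm))
  if (length(missing) > 0) {
    stop("dm is missing required column(s): ", paste(missing, collapse = ", "))
  }
  # PN_B is documented as binary: 1 = germline variant, 0 = somatic
  if (!all(dm[, "PN_B"] %in% c(0, 1))) {
    stop("PN_B must be 0 (somatic) or 1 (germline)")
  }
  invisible(TRUE)
}

# Toy input with made-up values, one row per mutation
dm <- cbind(chr      = c(1, 1, 7),
            startpos = c(1000123, 2000456, 3000789),
            AF_Tumor = c(0.42, 0.18, 0.35),
            PN_B     = c(0, 0, 1))
check_dm(dm)
```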

finalSPs

Matrix in which each row corresponds to a subpopulation, as calculated by clusterCellFrequencies.

max_PM

Upper threshold for the number of amplicons per mutated cell. See also cellfrequency_pdf.

cnvSPs

Matrix in which each row corresponds to a subpopulation, as calculated by clusterCellFrequencies, used to assign copy number variations (CNVs). If not set (NULL), finalSPs is used to assign both CNVs and SNVs.

ploidy

The background ploidy of the sequenced sample (default: 2). Changing this value is not recommended. Only cell lines or tumor biopsies of very high tumor purity (>= 0.95) are candidates for a different value; high purity is a necessary, but not a sufficient, condition for changing it.

verbose

Logical; if TRUE, gives more verbose output.

Details

Each mutated locus l is assigned to the subpopulation C whose size f_C best explains the allele frequency (AF) and copy number (CN) observed at l. Four alternative cell frequency probabilities, P_x(f_C), are calculated for the SNV at locus l, where x denotes one of four alternative evolutionary scenarios (see also cellfrequency_pdf).
The SNV is assigned to the subpopulation whose most likely scenario has the highest probability:
C := argmax_C max{P_s(f_C), P_p(f_C), P_c(f_C), P_i(f_C)}
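The assignment rule can be sketched in base R for a single locus. This assumes the four scenario probabilities have already been computed for every candidate subpopulation (in expands this is the job of cellfrequency_pdf); the helper assign_locus() and all probability values are hypothetical, and the sketch does not reproduce the package's internal code.

```r
# Hypothetical sketch of C := argmax_C max_x P_x(f_C) for one locus.
# Rows: candidate subpopulations, named by their size f_C;
# columns: the four scenario probabilities P_s, P_p, P_c, P_i (made-up values).
assign_locus <- function(P) {
  best_per_sp <- apply(P, 1, max)        # max over the four scenarios
  sp <- which.max(best_per_sp)           # argmax over subpopulations
  scenario <- colnames(P)[which.max(P[sp, ])]
  list(SP = rownames(P)[sp], scenario = scenario,
       p_best = unname(best_per_sp[sp]))
}

P <- rbind("0.85" = c(s = 0.10, p = 0.05, c = 0.20, i = 0.02),
           "0.40" = c(s = 0.60, p = 0.15, c = 0.05, i = 0.10),
           "0.15" = c(s = 0.08, p = 0.30, c = 0.01, i = 0.04))
assign_locus(P)
```

Here subpopulation 0.40 is chosen because its best scenario (s, 0.60) exceeds the best scenario of every other candidate.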

The mutated loci assigned to each subpopulation cluster represent the genetic profile of each predicted subpopulation.
Assigning locus l to subpopulation C only implies that the SNV at l was first propagated during the clonal expansion that gave rise to C. SNVs assigned to C are therefore not necessarily exclusive to C; they may also be present in subpopulations smaller than C. Whether this is the case can sometimes be inferred from the phylogenetic structure of the subpopulation composition (see also buildPhylo).
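To see which smaller subpopulations might also carry the SNVs assigned to C, one can walk the descendants of C in a phylogeny. The sketch below assumes a hypothetical encoding of the tree as a named vector of parent subpopulations (NA for the root); it is not the output format of buildPhylo.

```r
# Hypothetical tree encoding (not buildPhylo's output): each subpopulation
# maps to its parent; NA marks the root.
parent <- c(SP1 = NA, SP2 = "SP1", SP3 = "SP2", SP4 = "SP1")

# Recursively collect all descendants of 'node'. In a tree of nested clonal
# expansions, SNVs first propagated in 'node' may also be present in these.
descendants <- function(parent, node) {
  children <- names(parent)[!is.na(parent) & parent == node]
  c(children, unlist(lapply(children, descendants, parent = parent)))
}

descendants(parent, "SP1")   # "SP2" "SP4" "SP3"
descendants(parent, "SP3")   # character(0)
```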

Value

A list with two fields:

dm

The input matrix with seven additional columns:
SP - subpopulation to which the point mutation has been assigned;
PM_B - count of the B-allele at the mutated genomic locus, in the assigned subpopulation (SP);
PM - total count of all alleles, in the assigned subpopulation (SP);
SP_cnv - if the point mutation lies within an amplified or deleted region: the subpopulation to which the copy number variation has been assigned. This entry has the same value as SP if and only if i) the SNV and the CNV were propagated during the same clonal expansion, or ii) the SNV lies within a copy neutral region;
PM_B_cnv - count of the B-allele, in the CNV-harboring subpopulation (SP_cnv);
PM_cnv - total count of all alleles, in the CNV-harboring subpopulation (SP_cnv);
%maxP - confidence of the assigned SP/SP_cnv scenario.
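A sketch of post-processing the returned dm: by the property of SP_cnv stated above, rows where SP and SP_cnv differ are point mutations inside a CNV whose SNV and CNV were propagated during different clonal expansions. The matrix out_dm and its values are made up, and the helper different_expansion() is hypothetical.

```r
# Hypothetical helper (not from expands): keep rows whose SNV and CNV were
# assigned to different subpopulations.
different_expansion <- function(dm) {
  dm[dm[, "SP"] != dm[, "SP_cnv"], , drop = FALSE]
}

# Toy excerpt of the returned dm (made-up values, subset of the columns)
out_dm <- cbind(startpos = c(100, 200, 300, 400),
                SP       = c(0.85, 0.40, 0.40, 0.15),
                SP_cnv   = c(0.85, 0.85, 0.40, 0.15))
different_expansion(out_dm)
```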

finalSPs

The input matrix of subpopulations, with column nMutations updated to the total number of mutations assigned to each subpopulation.
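The nMutations update amounts to counting, for each subpopulation, the rows of dm assigned to it. A sketch with toy values; the helper count_per_sp() and the Size column name are hypothetical, and exact floating-point equality is used only because the toy sizes are identical literals (real data may need a tolerance).

```r
# Hypothetical helper (not from expands): count mutations per subpopulation,
# including subpopulations with no assigned mutations.
count_per_sp <- function(sp, sizes) {
  vapply(sizes, function(s) sum(sp == s), integer(1))
}

# Toy SP column of the returned dm: one subpopulation size per mutation
sp_assignments <- c(0.85, 0.40, 0.40, 0.15, 0.85, 0.85)

# Toy finalSPs; the 'Size' column name is an assumption for illustration
finalSPs <- cbind(Size = c(0.85, 0.40, 0.15, 0.05), nMutations = 0)
finalSPs[, "nMutations"] <- count_per_sp(sp_assignments, finalSPs[, "Size"])
finalSPs
```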

Author(s)

Noemi Andor

References

Li, B. & Li, J. Z. (2014). A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data. Genome Biol.

See Also

clusterCellFrequencies


expands documentation built on Sept. 5, 2021, 5:18 p.m.