getSamplePrevalence: Somatic mutations cellular prevalence on a sample set of...


Description

This function computes the cellular prevalence of a list of somatic mutations in a tumor. It applies the OncoPhase linear model to a set of mutations located in a given genomic region or across the whole genome, invoking the function getPrevalence to compute the cellular prevalence of each mutation in the set. When phasing information is available, the prevalence of a somatic mutation can be computed relative to a phased germline SNP under the mode “PhasedSNP”. When phasing information is not available, the mode “SNVOnly” is used to derive the cellular prevalence, as specified in getPrevalence.
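The choice between the two modes, as described for snvonly_max_treshold under Arguments, can be sketched as a small decision rule. This is a minimal sketch, assuming that this fallback is what the default mode "Ultimate" performs; choose_mode is a hypothetical helper, not a function exported by OncoPhase.

```r
# Hypothetical helper, not part of the OncoPhase API: sketches the documented
# SNVOnly-to-PhasedSNP fallback, assuming it is what mode = "Ultimate" does.
choose_mode <- function(snvonly_residual, has_phasing,
                        snvonly_max_treshold = 0.01) {
  # SNVOnly is kept as long as its residual does not exceed the threshold
  if (snvonly_residual <= snvonly_max_treshold) {
    return("SNVOnly")
  }
  # A poor SNVOnly fit switches to PhasedSNP only when phased SNP counts exist
  if (has_phasing) "PhasedSNP" else "SNVOnly"
}

choose_mode(1e-30, has_phasing = TRUE)  # returns "SNVOnly"
choose_mode(0.5, has_phasing = TRUE)    # returns "PhasedSNP"
choose_mode(0.5, has_phasing = FALSE)   # returns "SNVOnly"
```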

Usage

getSamplePrevalence(input_df, mode = "Ultimate", nbFirstColumns = 0,
  region = NULL, detail = TRUE, LocusCoverage = FALSE,
  SomaticCountAdjust = FALSE, Optimal = TRUE,
  c2_max_residual_treshold = Inf, c1_ultimate_c2_replacing_treshold = 0,
  snvonly_max_treshold = 0.01, verbose = TRUE)
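The mode argument accepts either a name or a numeric code, and region is a "chrom:start-end" string. The sketch below shows one way to normalise both inputs in base R. It is an illustration under assumptions: normalize_mode, parse_region and in_region are hypothetical helpers, not OncoPhase functions, and the Chrom and Pos column names (borrowed from the nbFirstColumns description) are assumed, not required by the package.

```r
# Hypothetical helpers, not part of the OncoPhase API.

# Map the numeric codes 0/1/2 onto the mode names; accept names as-is.
normalize_mode <- function(mode) {
  modes <- c("SNVOnly", "PhasedSNP", "Ultimate")  # codes 0, 1, 2
  if (is.numeric(mode)) {
    stopifnot(length(mode) == 1, mode %in% 0:2)
    return(modes[mode + 1])
  }
  match.arg(mode, modes)  # errors on an unknown mode name
}

# Split a "chrom:start-end" string into its three parts.
parse_region <- function(region) {
  parts <- regmatches(region,
                      regexec("^([^:]+):([0-9]+)-([0-9]+)$", region))[[1]]
  if (length(parts) != 4) stop("region must have the form chrom:start-end")
  list(chrom = parts[2],
       start = as.numeric(parts[3]),
       end   = as.numeric(parts[4]))
}

# Flag rows inside the region, assuming hypothetical Chrom and Pos columns.
in_region <- function(df, region) {
  r <- parse_region(region)
  df$Chrom == r$chrom & df$Pos >= r$start & df$Pos <= r$end
}

normalize_mode(1)                   # returns "PhasedSNP"
str(parse_region("chr1:1000-2000"))
```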

Arguments

input_df

A data frame containing, for each mutation, the following information (columns):

varcounts_snv

Allele counts supporting the SNV

refcounts_snv

Allele counts supporting the reference at the SNV locus

major_cn

Major copy number at the SNV locus

minor_cn

Minor copy number at the SNV locus

varcounts_snp

(Optional) Allele counts supporting the nearby phased SNP. Required if mode = PhasedSNP.

refcounts_snp

(Optional) Allele counts supporting the reference at the nearby phased SNP. Required if mode = PhasedSNP.

mode

The mode under which the prevalence is computed (default: Ultimate; the alternative modes are PhasedSNP and SNVOnly). The mode can also be given as a number: 0 = SNVOnly, 1 = PhasedSNP, 2 = Ultimate.

nbFirstColumns

Number of leading columns of input_df to reproduce in the output data frame (e.g. Chrom, Pos, Vartype). Columns nbFirstColumns + 1 to the last column should contain the information needed for the prevalence computation.

region

The region of the genome to consider for the prevalence computation, in the format chrom:start-end, e.g. "chr22:179800-98767".

detail

When set to TRUE, a detailed output is generated containing the context and the detailed prevalence of each group of cells (germline cells, cells affected by only one of the two genomic alterations, SNV or copy number alteration, and cells affected by both the copy number alteration and the SNV). The residuals and the linear model inputs and parameters are also reported.

LocusCoverage

When set to TRUE, the SNV locus coverage is estimated as the average coverage of the phased SNP, and the variant allele fraction is the ratio of the variant allele count to this estimated locus coverage.

SomaticCountAdjust

When set to TRUE, varcounts_snv and refcounts_snv are adjusted if necessary so that they meet the requirements varcounts_snv <= varcounts_snp, refcounts_snv >= refcounts_snp and varcounts_snv + refcounts_snv ~ Poiss(varcounts_snp + refcounts_snp). Not used if mode = SNVOnly.

Optimal

When set to TRUE, the model is run under the different configurations of the parameters LocusCoverage and SomaticCountAdjust, and the configuration yielding the best (lowest) residual is selected and returned.

c2_max_residual_treshold

Maximum residual threshold under which the context C2 can be inferred.

c1_ultimate_c2_replacing_treshold

Context C1 is inferred if its linear model residual is less than the specified threshold.

snvonly_max_treshold

Maximum residual under which the linear model under SNVOnly is considered valid. If the residual is greater than this value, PhasedSNP is used instead, provided phasing information is available.

NormalCellContamination

If provided, the rate of normal-cell contamination in the experiment.
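Several of the arguments above constrain the input counts. The sketch below checks the required columns, the two inequalities listed under SomaticCountAdjust (the Poisson condition is not checked), and the variant allele fraction with and without LocusCoverage. It rests on assumptions: the helper names are hypothetical and not OncoPhase functions, and "average coverage of the phased SNP" is read here as the total read depth at a single phased SNP.

```r
# Hypothetical helpers, not part of the OncoPhase API.
snv_cols <- c("varcounts_snv", "refcounts_snv", "major_cn", "minor_cn")
snp_cols <- c("varcounts_snp", "refcounts_snp")

# Names of required columns missing from df; SNP counts are required
# only under mode = "PhasedSNP".
missing_columns <- function(df, mode) {
  needed <- if (mode == "PhasedSNP") c(snv_cols, snp_cols) else snv_cols
  setdiff(needed, names(df))
}

# TRUE where the two inequalities listed under SomaticCountAdjust hold.
counts_consistent <- function(df) {
  df$varcounts_snv <= df$varcounts_snp & df$refcounts_snv >= df$refcounts_snp
}

# SNV variant allele fraction; with LocusCoverage = TRUE the denominator is
# the read depth at the phased SNP (assumes one phased SNP per mutation).
snv_vaf <- function(df, LocusCoverage = FALSE) {
  coverage <- if (LocusCoverage) {
    df$varcounts_snp + df$refcounts_snp
  } else {
    df$varcounts_snv + df$refcounts_snv
  }
  df$varcounts_snv / coverage
}

# The five mutations printed in the Examples section, typed in by hand.
input_df <- data.frame(
  mutation_id   = c("a", "b", "c", "d", "e"),
  varcounts_snv = c(151, 123, 94, 23, 60),
  refcounts_snv = c(152, 176, 209, 283, 228),
  major_cn      = c(1, 1, 2, 1, 2),
  minor_cn      = c(1, 1, 1, 1, 0),
  varcounts_snp = c(151, 161, 176, 155, 174),
  refcounts_snp = c(135, 150, 134, 144, 125)
)

missing_columns(input_df, "PhasedSNP")  # character(0): nothing missing
counts_consistent(input_df)             # TRUE for all five rows
round(snv_vaf(input_df), 3)
round(snv_vaf(input_df, LocusCoverage = TRUE), 3)
```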

Value

A data frame containing :

Columns 1 to nbFirstColumns of the input data frame input_df. These generally include the chromosome and the position of the mutation, plus any other columns to report in the prevalence data frame (e.g. REF and ALT sequences, ...)

followed by these columns:

Prevalence

The Cellular Prevalence of the mutation

Germ

The proportion of cells with a normal genotype

Alt

The proportion of cells with only the CNA if the context is C1, or with only the SNV if the context is C2.

Both

The proportion of cells with both the SNV and the SCNA

Context

Context at the mutation. If C1, the SNV occurred after the SCNA; if C2, the SNV occurred before the SCNA.

solutionNorm

Residual of the linear model.

residualNorm

Constraint residual: the sum of the absolute values of the residuals of the equalities and violated inequalities.

Quality

Quality of the prevalence call: H if the residual is below 1e-05, F if it is below 1e-03, and L if it is above 1e-03.

Alt_Prevalence

Prevalence estimated by the model if the context were to be the alternative context

Alt_solutionNorm

Residual of the linear model under the alternative context.

Alt_residualNorm

Constraint residual under the alternative context: the sum of the absolute values of the residuals of the equalities and violated inequalities.

Mode

The mode used for the cellular prevalence computation (either SNVOnly or PhasedSNP).
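Two of the value columns can be related to others by simple rules. Following the Alt and Context definitions, the cells carrying the SNV are Both in context C1 and Alt + Both in context C2, which matches the example output where Prevalence equals Both for every C1 row. The Quality column is a three-way threshold on a residual. The sketch below encodes both rules; the helper names are hypothetical, they are derived from the text above rather than from the OncoPhase source, and which residual (solutionNorm or residualNorm) feeds Quality is not stated here.

```r
# Hypothetical helpers derived from the column definitions above; they are
# not taken from the OncoPhase source.
prevalence_from_fractions <- function(Alt, Both, Context) {
  # C1: Alt counts CNA-only cells, so only Both carries the SNV.
  # C2: Alt counts SNV-only cells, so Alt and Both carry the SNV.
  ifelse(Context == "C2", Alt + Both, Both)
}

# Quality label from a residual; a residual of exactly 1e-03 is
# labelled "L" here, a boundary the description leaves open.
quality_label <- function(residual) {
  ifelse(residual < 1e-05, "H", ifelse(residual < 1e-03, "F", "L"))
}

prevalence_from_fractions(Alt = 0.0890, Both = 0.8230, Context = "C1")
quality_label(c(7.7e-32, 5e-04, 0.02))  # "H" "F" "L"
```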

Examples

#Example 1:

library(OncoPhase)  # provides getSamplePrevalence and the example data
input_file <- system.file("extdata", "phylogeny1_d300_n80.tsv", package = "OncoPhase")
input_df <- read.table(input_file, header = TRUE)
rownames(input_df) <- input_df$mutation_id
print(input_df)
#  mut_id varcounts_snv refcounts_snv major_cn minor_cn varcounts_snp refcounts_snp
#a      a      151  152        1        1      151  135
#b      b      123  176        1        1      161  150
#c      c       94  209        2        1      176  134
#d      d       23  283        1        1      155  144
#e      e       60  228        2        0      174  125

# Keep the first column (the mutation identifier) in the output
prevalence_df <- getSamplePrevalence(input_df, nbFirstColumns = 1)

print(prevalence_df)
#mutation_id Prevalence   Germ    Alt   Both Context solutionNorm residualNorm Quality Alt_Prevalence
#a           a     0.9967 0.0017 0.0017 0.9967      C1 7.718183e-32 2.220446e-16       H         0.9966
#b           b     0.8230 0.0890 0.0890 0.8230      C1 1.925930e-32 0.000000e+00       H         0.8200
#c           c     0.9000 0.1000 0.0000 0.9000      C1 5.238529e-32 2.220446e-16       H         0.7300
#d           d     0.1500 0.4200 0.4200 0.1500      C1 1.972152e-31 4.440892e-16       H         0.1500
#e           e     0.4200 0.2900 0.2900 0.4200      C1 5.007418e-32 2.220446e-16       H         0.7100
#Alt_solutionNorm Alt_residualNorm         InputValues    Mode      lm_inputs lm_params
#a     1.222984e-32     0.000000e+00 151:152:1:1:151:135 SNVOnly 151:152:1:1:C1  0.5:NA:2
#b     5.623715e-32     2.220446e-16 123:176:1:1:161:150 SNVOnly 123:176:1:1:C1 0.41:NA:2
#c     6.933348e-33     0.000000e+00  94:209:2:1:176:134 SNVOnly  94:209:2:1:C1 0.31:NA:3
#d     3.081488e-33     0.000000e+00  23:283:1:1:155:144 SNVOnly  23:283:1:1:C1 0.08:NA:2
#e     5.007418e-32     2.220446e-16  60:228:2:0:174:125 SNVOnly  60:228:2:0:C1 0.21:NA:2
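The InputValues column above packs the six input counts of each mutation into one colon-separated string, in the order varcounts_snv:refcounts_snv:major_cn:minor_cn:varcounts_snp:refcounts_snp. The base R sketch below unpacks them. It also checks an observation drawn from these five rows only: the first and third fields of lm_params coincide with the SNV variant allele fraction rounded to two decimals and the total copy number major_cn + minor_cn. That mapping is inferred from this example, not documented by the package.

```r
# The strings below are copied from the InputValues column of the example.
input_values <- c("151:152:1:1:151:135", "123:176:1:1:161:150",
                  "94:209:2:1:176:134", "23:283:1:1:155:144",
                  "60:228:2:0:174:125")

# Split each colon-separated string into its six counts, one row per mutation.
fields <- do.call(rbind, lapply(strsplit(input_values, ":"), as.numeric))
colnames(fields) <- c("varcounts_snv", "refcounts_snv", "major_cn",
                      "minor_cn", "varcounts_snp", "refcounts_snp")

# Observed, not documented: these match the 1st and 3rd lm_params fields.
vaf <- round(fields[, "varcounts_snv"] /
             (fields[, "varcounts_snv"] + fields[, "refcounts_snv"]), 2)
total_cn <- fields[, "major_cn"] + fields[, "minor_cn"]
print(cbind(vaf, total_cn))
```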


See Also

getPrevalence

chedonat/OncoPhase documentation built on May 13, 2019, 3:39 p.m.