sitednds: sitednds

View source: R/sitednds.R

sitednds    R Documentation

sitednds

Description

Estimates site-wise dN/dS values and p-values against neutrality. To generate a valid input object for this function, use outmats=T when running dndscv. This function is still in testing, so please interpret the results with caution. Note also that recurrent artefacts or SNP contamination can violate the null model and dominate the list of sites under apparent selection. A considerable number of significant synonymous sites may indicate a problem with the data. Be very critical of the results and, if suspicious sites appear recurrently mutated, consider refining the variant calling (e.g. using a better unmatched normal panel). In the future, this function may be extended to perform inference at the codon level instead of the single-base level.

Usage

sitednds(
  dndsout,
  min_recurr = 2,
  gene_list = NULL,
  site_list = NULL,
  trinuc_list = NULL,
  theta_option = "conservative",
  syn_drivers = "TP53:T125T",
  method = "NB",
  numbins = 10000,
  kc = "cgc81"
)
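A minimal usage sketch, assuming the dndscv package is installed and that its bundled simulated dataset (dataset_simbreast) loads a data frame called mutations. The requireNamespace guard and the variable names dndsout and sites are illustrative choices, not part of the package.

```r
# Sketch: run dndscv with outmats=T, then pass the result to sitednds.
# Assumes the dndscv package and its example dataset are available.
if (requireNamespace("dndscv", quietly = TRUE)) {
  library(dndscv)
  data("dataset_simbreast", package = "dndscv")  # assumed to load 'mutations'

  # outmats=T is needed to generate a valid input object for sitednds
  dndsout <- dndscv(mutations, outmats = T)

  # Settings written out explicitly; these match the documented defaults
  sites <- sitednds(dndsout, min_recurr = 2, method = "NB",
                    theta_option = "conservative")

  print(head(sites$recursites))  # recurrently mutated sites
  print(sites$overdisp)          # overdispersion estimate and CI95
}
```

Given the warnings in the Description, it is worth checking how many synonymous sites reach significance in recursites before interpreting the non-synonymous hits.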

Arguments

dndsout

Output object from dndscv. To generate a valid input object for this function, use outmats=T when running dndscv.

min_recurr

Minimum number of mutations per site to estimate site-wise dN/dS ratios. [default=2]

gene_list

List of genes to restrict the p-value and q-value calculations (Restricted Hypothesis Testing). Note that q-values are only valid if the list of genes is decided a priori. [default=NULL, sitednds will be run on all genes in dndsout]

site_list

List of hotspot sites to restrict the p-value and q-value calculations (Restricted Hypothesis Testing). Note that q-values are only valid if the list of sites is decided a priori. [default=NULL, sitednds will be run on all sites in dndsout]

trinuc_list

List of trinucleotide substitutions to restrict the sitednds analysis. This is used to estimate separate overdispersion parameters for different substitution contexts. [default=NULL, sitednds will be run on all substitution contexts]

theta_option

Two options: "mle" (uses the MLE of the overdispersion parameter) or "conservative" (uses the conservative bound of the CI95). Any value other than "mle" selects the conservative option. [default="conservative"]

syn_drivers

Vector with a list of known synonymous driver mutations to exclude from the background model [default="TP53:T125T"]. See Martincorena et al., Cell, 2017 (PMID:29056346).

method

Overdispersion model: NB = Negative Binomial (Gamma-Poisson), LNP = Poisson-Lognormal (see Hess et al., bioRxiv, 2019). [default="NB"]

numbins

Number of bins to discretise the rvec vector [default=1e4]. This enables fast execution of the LNP model on datasets of arbitrary size.

kc

List of a priori known cancer genes (to be excluded when fitting the background model). [default="cgc81"]
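The effect of Restricted Hypothesis Testing (gene_list, site_list) on q-values can be sketched with base R alone. The p-values below are made up, the gene names GENE_A to GENE_C are placeholders, and Benjamini-Hochberg adjustment is an assumption made for illustration; this is not a description of the internals of sitednds.

```r
# Illustrative p-values for five sites (made-up numbers, not real output)
sites <- data.frame(
  gene = c("TP53", "KRAS", "GENE_A", "GENE_B", "GENE_C"),
  pval = c(1e-6, 0.004, 0.03, 0.2, 0.7),
  stringsAsFactors = FALSE
)

# Global testing: adjust across all sites
sites$qval_all <- p.adjust(sites$pval, method = "BH")

# Restricted testing: adjust only within a gene list fixed a priori
gene_list <- c("TP53", "KRAS")  # must be chosen before looking at the data
in_list <- sites$gene %in% gene_list
sites$qval_rht <- NA_real_
sites$qval_rht[in_list] <- p.adjust(sites$pval[in_list], method = "BH")

print(sites)
```

Fewer tests mean a smaller multiple-testing penalty, which is why the restricted q-values are lower. That gain is only legitimate if the list was fixed before seeing the results, as the argument descriptions note.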

Value

'sitednds' returns a table of recurrently mutated sites and the estimates of the size parameter:

- recursites: Table of recurrently mutated sites with site-wise dN/dS values and p-values

- overdisp: Maximum likelihood estimate and CI95 of the overdispersion (size) parameter

- fpr_nonsyn_q05: Fraction of the significant non-synonymous sites (qval<0.05) that are estimated to be false positives. This assumes that all synonymous mutations (except those in TP53 and CDKN2A) are false positives, thus offering a conservative estimate of the false positive rate.

- LL: Log-likelihood of the fit of the overdispersed model (see "method" argument) to all synonymous sites.
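To give a feel for how a site-wise dN/dS value and a p-value against neutrality relate to the size parameter under the NB model, here is a base-R sketch for a single site. All numbers are invented, and the formulas (observed over expected for dN/dS, an upper-tail negative binomial probability for the p-value) are assumptions for illustration rather than a transcription of the package's code.

```r
# Made-up inputs for one site (not taken from any real dataset)
n_obs <- 5     # observed mutations at the site
mu    <- 0.2   # expected mutations under the neutral background model
theta <- 2     # size (overdispersion) parameter of the negative binomial

# Site-wise dN/dS taken as observed over expected
dnds_site <- n_obs / mu

# One-sided p-value: probability of >= n_obs mutations under neutrality
p_nb <- pnbinom(n_obs - 1, size = theta, mu = mu, lower.tail = FALSE)

# Same test under a plain Poisson model (no overdispersion), for comparison
p_pois <- ppois(n_obs - 1, lambda = mu, lower.tail = FALSE)

cat("dN/dS =", dnds_site, " p(NB) =", p_nb, " p(Poisson) =", p_pois, "\n")
```

The negative binomial has a heavier upper tail than a Poisson with the same mean, so its p-value is larger. A smaller size parameter widens that gap and gives larger p-values still.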

Author(s)

Inigo Martincorena (Wellcome Sanger Institute)


im3sanger/dndscv documentation built on Oct. 1, 2023, 1:05 p.m.