dndscv: dNdScv

dndscv    R Documentation

dNdScv

Description

Analyses of selection using the dNdScv and dNdSloc models. The default parameter values typically improve the performance of the method on cancer genomic studies. The default arguments use the GRCh37/hg19 version of the human genome. To run dNdScv on other assemblies or species, see the buildref function and the dndscv_data GitHub repository.

Usage

dndscv(
  mutations,
  gene_list = NULL,
  refdb = "hg19",
  sm = "192r_3w",
  kc = "cgc81",
  cv = "hg19",
  max_muts_per_gene_per_sample = 3,
  max_coding_muts_per_sample = 3000,
  use_indel_sites = T,
  min_indels = 5,
  maxcovs = 20,
  constrain_wnon_wspl = T,
  outp = 3,
  numcode = 1,
  outmats = F,
  mingenecovs = 500,
  onesided = F,
  dc = NULL
)

Arguments

mutations

Table of mutations (5 columns: sampleID, chr, pos, ref, alt). Only list independent events as mutations.
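As an illustration of the expected input layout, the following minimal sketch builds a toy mutation table in base R. The sample names, coordinates and alleles are invented for illustration; only the five column names come from the description above.

```r
# Toy mutation table with the five expected columns.
# All values are made up for illustration.
mutations <- data.frame(
  sampleID = c("S1", "S1", "S2", "S2"),
  chr      = c("1", "17", "17", "17"),
  pos      = c(115256529L, 7577120L, 7578406L, 7578406L),
  ref      = c("T", "C", "C", "C"),
  alt      = c("C", "T", "T", "T"),
  stringsAsFactors = FALSE
)

# Remove exact duplicate rows so that the same event is not listed twice
# for one sample. Deciding which events are truly independent (for example,
# across related samples from one patient) still needs study-specific care.
mutations <- unique(mutations)
str(mutations)
```

The last two rows of the toy table are identical, so `unique()` leaves three rows.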

gene_list

List of genes to restrict the analysis to (use for targeted sequencing studies)

refdb

Reference database (path to .rda file or a pre-loaded array object in the right format)

sm

Substitution model (precomputed models are available in the data directory)

kc

List of a-priori known cancer genes (to be excluded from the indel background model)

cv

Covariates: a matrix with one row per gene and one column per covariate (default: reference covariates). Setting cv = NULL runs dndscv without covariates.
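The layout of a custom covariate matrix can be sketched as follows. This is only an illustration of the shape (genes in rows, covariates in columns): the gene set, the covariate names and the random values are hypothetical, and labeling rows by gene name is an assumption. A real analysis would also need far more genes, since mingenecovs sets a minimum number of genes for the covariate model.

```r
# Hypothetical covariate matrix: one row per gene, one column per covariate.
set.seed(1)
genes <- c("TP53", "KRAS", "PIK3CA", "PTEN")
cv_toy <- matrix(
  rnorm(length(genes) * 2),
  nrow = length(genes),
  dimnames = list(genes, c("covariate_1", "covariate_2"))
)

# Rows are assumed to be labeled by gene name.
print(dim(cv_toy))
print(rownames(cv_toy))
```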

max_muts_per_gene_per_sample

Maximum number of mutations kept per gene per sample. If n < Inf, only the first n mutations by chromosomal position are kept, which is an arbitrary choice (default = 3). Set this to Inf to avoid filtering out any mutation.
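The effect of this cap can be sketched in base R. This is a toy illustration of "keep the first n mutations by position within each sample and gene", not the package's internal code; the `gene` column and the helper name `keep_first_n` are hypothetical.

```r
# Toy annotated mutations on a single chromosome; `gene` is a hypothetical
# annotation column added for this illustration.
muts <- data.frame(
  sampleID = c("S1", "S1", "S1", "S1", "S2"),
  gene     = c("TP53", "TP53", "TP53", "TP53", "TP53"),
  pos      = c(400L, 100L, 300L, 200L, 100L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep at most n mutations per sample and gene.
keep_first_n <- function(muts, n = 3) {
  if (is.infinite(n)) return(muts)  # n = Inf disables the filter
  muts <- muts[order(muts$sampleID, muts$gene, muts$pos), ]
  # Rank of each mutation within its sample/gene group, ordered by position.
  rank_in_group <- ave(seq_len(nrow(muts)), muts$sampleID, muts$gene,
                       FUN = seq_along)
  muts[rank_in_group <= n, ]
}

filtered <- keep_first_n(muts, n = 3)
print(filtered)
```

In the toy data, sample S1 has four mutations in one gene, so the mutation at the highest position is dropped when n = 3.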

max_coding_muts_per_sample

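Maximum number of coding mutations per sample (default = 3000). Samples exceeding this limit are excluded from the analysis, because hypermutator samples often reduce power to detect selection.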
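A per-sample cap of this kind can be sketched as below, assuming that samples whose mutation count exceeds the limit are dropped entirely (the exclsamples element of the return value lists excluded samples). The sample labels and counts are invented, and the sketch counts all rows rather than annotated coding mutations.

```r
# Hypothetical per-mutation sample labels: S3 is a hypermutator.
sample_ids <- c(rep("S1", 120), rep("S2", 80), rep("S3", 5000))
max_coding_muts_per_sample <- 3000

# Count mutations per sample and flag samples above the limit.
n_per_sample <- table(sample_ids)
excluded <- names(n_per_sample)[n_per_sample > max_coding_muts_per_sample]

# Keep only mutations from samples that were not excluded.
kept_ids <- sample_ids[!(sample_ids %in% excluded)]
print(excluded)
print(table(kept_ids))
```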

use_indel_sites

Use unique indel sites instead of the total number of indels (default = TRUE, which tends to be more robust for typical cancer or somatic mutation datasets)
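The difference between counting unique indel sites and counting all indels can be illustrated with a toy table. This is a sketch of the idea only, not the package's implementation; the gene names, coordinates and alleles are hypothetical.

```r
# Toy indels: S1 and S2 carry the same indel in GENE_A.
indels <- data.frame(
  sampleID = c("S1", "S2", "S3", "S4"),
  gene     = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  chr      = c("1", "1", "1", "2"),
  pos      = c(1000L, 1000L, 1500L, 2000L),
  ref      = c("AT", "AT", "G", "C"),
  alt      = c("A", "A", "GTT", "CA"),
  stringsAsFactors = FALSE
)

# Total number of indels per gene (recurrent events counted repeatedly).
total_per_gene <- table(indels$gene)

# Number of unique indel sites per gene (recurrent events counted once).
sites <- unique(indels[, c("gene", "chr", "pos", "ref", "alt")])
sites_per_gene <- table(sites$gene)

print(total_per_gene)
print(sites_per_gene)
```

Here GENE_A has three indels in total but only two unique sites, so a recurrent indel contributes less when unique sites are used.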

min_indels

Minimum number of indels required to run the indel recurrence module

maxcovs

Maximum number of covariates that will be considered (additional columns in the matrix of covariates will be excluded)

constrain_wnon_wspl

Constrains wnon == wspl in the dNdScv model (this typically leads to higher power to detect selection)

outp

Output: 1 = Global dN/dS values; 2 = Global dN/dS and dNdSloc; 3 = Global dN/dS, dNdSloc and dNdScv

numcode

NCBI genetic code number (default = 1; standard genetic code). To see the list of supported genetic codes, use ?seqinr::translate. Note that the same genetic code must be used in the dndscv and buildref functions.

outmats

Output the internal N and L matrices (default = FALSE)

mingenecovs

Minimum number of genes required to run the negative binomial regression model with covariates (default = 500)

onesided

Option to run one-sided positive and negative selection tests per gene (default = FALSE). Note that one-sided tests are only performed for the wnon==wspl model, so using onesided=TRUE will overwrite constrain_wnon_wspl to TRUE.

dc

Duplex coverage per gene. A named numeric vector whose values give the mean duplex coverage per site for each gene and whose names are the corresponding gene names. Use this argument only when running dNdScv on duplex sequencing data, so that gene coverage is used in the offset of the regression model (default = NULL)
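A vector of this shape could be derived from per-site coverage as in the sketch below. The `site_cov` table, its column names and its values are hypothetical; only the target format (a named numeric vector of mean coverage per gene) comes from the description above.

```r
# Hypothetical per-site duplex coverage, one row per covered site.
site_cov <- data.frame(
  gene     = c("TP53", "TP53", "KRAS", "KRAS", "KRAS"),
  coverage = c(1200, 1300, 900, 1000, 1100),
  stringsAsFactors = FALSE
)

# Mean coverage per site for each gene, as a named numeric vector.
dc_toy <- vapply(split(site_cov$coverage, site_cov$gene), mean, numeric(1))
print(dc_toy)
```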

Details

Martincorena I, et al. (2017) Universal patterns of selection in cancer and somatic tissues. Cell. 171(5):1029-1041.

Value

'dndscv' returns a list of objects:

- globaldnds: Global dN/dS estimates across all genes.

- sel_cv: Gene-wise selection results using dNdScv.

- sel_loc: Gene-wise selection results using dNdSloc.

- annotmuts: Annotated coding mutations.

- genemuts: Observed and expected numbers of mutations per gene.

- geneindels: Observed and expected numbers of indels per gene.

- mle_submodel: MLEs of the substitution model.

- exclsamples: Samples excluded from the analysis.

- exclmuts: Coding mutations excluded from the analysis.

- nbreg: Negative binomial regression model for substitutions.

- nbregind: Negative binomial regression model for indels.

- poissmodel: Poisson regression model used to fit the substitution model and the global dN/dS values.

- wrongmuts: Table of input mutations with a wrong annotation of the reference base (if any).
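A typical way of working with the returned list might look like the sketch below. The helper `significant_genes` is hypothetical and not part of the package. The column names `gene_name` and `qglobal_cv`, and the example dataset `dataset_simbreast` (assumed to load an object called `mutations`), are assumptions based on the package vignette rather than on this page, so check them against your installed version. The real call is only run in an interactive session with dndscv installed; a small mock table stands in for sel_cv otherwise.

```r
# Hypothetical helper: genes below a q-value cutoff, most significant first.
# Assumes `sel_cv` has columns `gene_name` and `qglobal_cv`.
significant_genes <- function(sel_cv, q_cutoff = 0.1) {
  keep <- !is.na(sel_cv$qglobal_cv) & sel_cv$qglobal_cv < q_cutoff
  hits <- sel_cv[keep, c("gene_name", "qglobal_cv"), drop = FALSE]
  hits[order(hits$qglobal_cv), , drop = FALSE]
}

# Mock table standing in for dndsout$sel_cv; the values are invented.
mock_sel_cv <- data.frame(
  gene_name  = c("GENE_A", "GENE_B", "GENE_C"),
  qglobal_cv = c(0.5, 1e-6, 0.03),
  stringsAsFactors = FALSE
)
hits <- significant_genes(mock_sel_cv)
print(hits)

# Real usage, skipped unless run interactively with dndscv installed.
if (interactive() && requireNamespace("dndscv", quietly = TRUE)) {
  library(dndscv)
  data("dataset_simbreast", package = "dndscv")  # assumed to load `mutations`
  dndsout <- dndscv(mutations)
  print(dndsout$globaldnds)                 # global dN/dS estimates
  print(significant_genes(dndsout$sel_cv))  # gene-level dNdScv results
}
```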

Author(s)

Inigo Martincorena (Wellcome Sanger Institute)


im3sanger/dndscv documentation built on Oct. 1, 2023, 1:05 p.m.