View source: R/withingenednds.R
withingenednds | R Documentation |
This function fits Poisson and Negative Binomial regression models at the single-site level to study selection across different regions (coding and non-coding) within a gene.
withingenednds(
  mutations,
  gene,
  covtable,
  dndsout,
  genomeFile,
  regionschr = NULL,
  regionsaa = NULL,
  fixtheta = NULL,
  normalisefromsyn = TRUE,
  syndrivers = NULL,
  exon_flank_length = 10,
  intron_flank_length = 10,
  sitefilename = NULL,
  refdb = "hg19",
  numcode = 1
)
mutations |
Data frame with all the mutations detected in the study (5-column input table as for dndscv: sampleID, chr, pos, ref, mut). |
gene |
Name of the gene of interest. This function is currently designed to work on a single gene, but combined analyses of multiple genes could be done using the sites output table generated by this function. |
covtable |
Table with all sites of interest in the gene. This should be a data frame with one row per site and the following columns: chr, pos, dc (duplex depth). Additional columns will not be used. |
dndsout |
dndscv output object for the dataset. This is mainly used for the MLEs of the substitution model. Running dndscv on all genes in the dataset is recommended unless the gene of interest is believed to have a different substitution model. |
genomeFile |
Path to a reference fasta file for the genome assembly. |
regionschr |
Optional data frame with user-defined regions of interest in the gene. This lets the user define arbitrary regions within a gene (coding or non-coding) for which to calculate omega (selection, or obs/exp) values, e.g. protein domains, splicing regulator regions, core promoters. The table should contain the following columns: chr, start, end, wname (a unique name for the w parameter, e.g. wdomain1, wcorepromoter), impacts (e.g. Missense or Missense|Nonsense restricts the w calculation to Missense mutations, or to Missense and Nonsense mutations, in the region, respectively), layered (1/0; 0 removes other w parameters influencing the site, whereas 1 models selection relative to the other w parameters active at these sites). |
regionsaa |
Optional data frame with user-defined regions of interest in the gene, using amino acid coordinates. The table should contain the following columns: gene, aa_start, aa_end, w feature name (e.g. wdomain1), impacts. |
fixtheta |
Pre-calculated overdispersion (theta) parameter. This should be calculated using sitednds(., method="NB"). |
normalisefromsyn |
Normalise the substitution rates based on the synonymous mutations in the gene [default = TRUE, recommended]. Setting this to FALSE instead uses the expected synonymous mutation rate of the gene from the dndscv negative binomial regression model (dndsout$genemuts). |
syndrivers |
Vector of known synonymous driver sites, defined by their amino acid position, to be excluded from the background model (e.g. syndrivers = c("T125T","E224E","Q331Q") for TP53). |
exon_flank_length |
Exon flank length in bp [default = 10]. Using a value higher than 0 will calculate a separate selection (w) coefficient for synonymous mutations in exon flanks. |
intron_flank_length |
Intron flank length in bp [default = 10]. Intronic sites occurring within these flanks but not already classified as Essential_Splice will receive a separate w parameter. |
sitefilename |
Optionally, provide a file name to save the table of all annotated sites in the gene. This table is also always contained in the output object. |
refdb |
Reference database (path to .rda file or a pre-loaded array object in the right format) [default = "hg19"]. |
numcode |
NCBI genetic code number (default = 1; standard genetic code). To see the list of supported genetic codes, use: ?seqinr::translate
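The input tables described above can be sketched in base R as follows. This is a minimal illustration only: the sample names, coordinates and depths are placeholders that have not been checked against any reference genome, `has_cols` is a hypothetical helper that is not part of the package, and the final call is left commented out because it needs a real dndscv output object and a reference fasta file (`"path/to/genome.fa"` is a placeholder).

```r
# mutations: 5-column table as for dndscv (sampleID, chr, pos, ref, mut).
# All values below are placeholders for illustration.
mutations <- data.frame(
  sampleID = c("S1", "S2", "S3"),
  chr      = c("17", "17", "17"),
  pos      = c(7577002L, 7577004L, 7577007L),
  ref      = c("C", "G", "C"),
  mut      = c("T", "A", "T"),
  stringsAsFactors = FALSE
)

# covtable: one row per site of interest, with duplex depth in column dc.
covtable <- data.frame(
  chr = "17",
  pos = 7577000:7577009,
  dc  = rep(1000L, 10),
  stringsAsFactors = FALSE
)

# regionschr: user-defined regions, each with its own w parameter name.
regionschr <- data.frame(
  chr     = c("17", "17"),
  start   = c(7577000L, 7577006L),
  end     = c(7577005L, 7577009L),
  wname   = c("wdomain1", "wdomain2"),
  impacts = c("Missense", "Missense|Nonsense"),
  layered = c(1L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: check that a table has the documented columns.
has_cols <- function(df, cols) all(cols %in% colnames(df))

required <- list(
  mutations  = c("sampleID", "chr", "pos", "ref", "mut"),
  covtable   = c("chr", "pos", "dc"),
  regionschr = c("chr", "start", "end", "wname", "impacts", "layered")
)

inputs_ok <- has_cols(mutations, required$mutations) &&
  has_cols(covtable, required$covtable) &&
  has_cols(regionschr, required$regionschr) &&
  all(regionschr$start <= regionschr$end) &&
  !anyDuplicated(regionschr$wname)

# With a dndscv output object (dndsout) available, the call might look like:
# res <- withingenednds(mutations = mutations, gene = "TP53",
#                       covtable = covtable, dndsout = dndsout,
#                       genomeFile = "path/to/genome.fa",
#                       regionschr = regionschr,
#                       syndrivers = c("T125T", "E224E", "Q331Q"))
```

Checking column names before the call is a cheap way to catch a mislabelled table early; it does not validate the biological content of the inputs.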
Martincorena I, et al. (2017) Universal patterns of selection in cancer and somatic tissues. Cell. 171(5):1029-1041.
'withingenednds' returns a list of objects:
- sites: Table with the annotation of all sites in the gene (from covtable), including all functional annotations from the user-defined regions input (regionschr, regionsaa) as well as default annotations (Missense, Nonsense, Essential_Splice, Start_loss, Stop_loss, etc).
- par.pois: Poisson regression results (not recommended).
- par.nb: Negative binomial results fitting a new overdispersion parameter to the data (when fixtheta is not provided).
- par.nbfix: Negative binomial results using the input fixtheta value (recommended).
- model.pois: Poisson regression object.
- model.nb: Negative binomial regression object (overdispersion parameter fitted to the data).
- model.nbfix: Negative binomial regression object (overdispersion fixed to the input fixtheta value).
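Given the recommendations above (par.nbfix preferred, par.pois not recommended), one way to pick a parameter table from the returned list is sketched below. `pick_params` is a hypothetical helper, not part of the package, and it assumes that tables which were not fitted are absent or NULL in the output list. The mock object only exercises the helper; its contents are placeholders, not real results.

```r
# Hypothetical helper: return the preferred parameter table from a
# withingenednds result, in the order par.nbfix, par.nb, par.pois.
# Assumes tables that were not fitted are absent or NULL.
pick_params <- function(res) {
  for (nm in c("par.nbfix", "par.nb", "par.pois")) {
    if (!is.null(res[[nm]])) {
      return(list(source = nm, table = res[[nm]]))
    }
  }
  stop("no parameter table found in result")
}

# Mock result with placeholder tables (no fixtheta supplied, so no par.nbfix).
mock <- list(
  par.pois = data.frame(placeholder = 1),
  par.nb   = data.frame(placeholder = 2)
)

chosen <- pick_params(mock)
chosen$source
```

With a real result, `pick_params(res)$table` would give the table to inspect, and `$source` records which model it came from.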
Inigo Martincorena (Wellcome Sanger Institute)