coselens | R Documentation |
Coselens identifies genes that are differentially selected depending on the grouping variable and provides maximum likelihood estimates of the effect sizes. Given an input of mutations from two groups of patients, dN/dS is calculated in each of the groups, and the excess of non-synonymous substitutions between them is used as an estimate of the difference in the number of driver mutations. For example, within a given cancer type, one group can contain the mutations from individuals who carry a mutation in a specific gene and the other group the mutations from individuals who do not. A p-value for differential selection is returned for each gene in the reference genome (default: hg19).
coselens(
group1,
group2,
subset.genes.by = NULL,
sequenced.genes = NULL,
refdb = "hg19",
sm = "192r_3w",
kc = "cgc81",
cv = "hg19",
max_muts_per_gene_per_sample = Inf,
max_coding_muts_per_sample = 3000,
use_indel_sites = T,
min_indels = 5,
maxcovs = 20,
constrain_wnon_wspl = T,
outp = 3,
numcode = 1,
mingenecovs = 500
)
group1 |
group of samples (for example those that are subject to a given condition) |
group2 |
another group of samples that are NOT subject to that condition (the control group) |
subset.genes.by |
genes to subset results by |
sequenced.genes |
the gene_list parameter from dndscv, which is a list of genes to which the analysis is restricted (use for targeted sequencing studies) |
... |
other parameters passed to dndscv; all dndscv defaults are used, except that max_muts_per_gene_per_sample is set to Inf (no limit) |
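As an illustration of the grouping described above, the following minimal sketch builds group1 and group2 by splitting a toy mutation table according to whether a sample carries a mutation in a given gene. It assumes that coselens accepts mutation tables in the five-column layout used by dndscv (sampleID, chr, pos, ref, mut). The `gene` column, the sample names, the coordinates, and the choice of splitting gene are all hypothetical and serve only to define the groups.

```r
# Toy mutation table; all values are invented for illustration.
# The first five columns follow the layout used by dndscv.
# `gene` is a hypothetical annotation column used only for grouping.
muts <- data.frame(
  sampleID = c("S1", "S1", "S2", "S3", "S3", "S4"),
  chr      = c("17", "12", "12", "17", "5", "5"),
  pos      = c(1001, 2002, 2002, 1005, 3003, 3003),
  ref      = c("C", "C", "C", "G", "C", "C"),
  mut      = c("T", "T", "A", "A", "T", "T"),
  gene     = c("TP53", "KRAS", "KRAS", "TP53", "APC", "APC"),
  stringsAsFactors = FALSE
)

split_gene <- "TP53"  # illustrative choice of grouping gene

# Samples with at least one mutation in the grouping gene
carriers <- unique(muts$sampleID[muts$gene == split_gene])

# Keep only the five mutation columns in each group
cols   <- c("sampleID", "chr", "pos", "ref", "mut")
group1 <- muts[muts$sampleID %in% carriers, cols]   # carriers
group2 <- muts[!muts$sampleID %in% carriers, cols]  # non-carriers (control)

# With the package installed and real data, the call would then be:
# res <- coselens::coselens(group1, group2)
```

The call itself is left commented out because a table this small is not expected to yield meaningful estimates.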
Iranzo J, Gruenhagen G, Calle-Espinosa J, Koonin EV (2022) Pervasive conditional selection of driver mutations and modular epistasis networks in cancer. Cell Reports. 40(8):111272.
coselens returns a list containing six objects: (1) "substitutions", a summary table of gene-level conditional selection for single-nucleotide substitutions (including missense, nonsense, and essential splice site mutations); (2) "indels", same for small indels; (3) "missense_sub", same for missense substitutions only; (4) "truncating_sub", same for truncating substitutions only (nonsense and essential splice site mutations); (5) "overall_mut", a summary table with the combined analysis of single-nucleotide substitutions and indels; and (6) "dndscv", a list of objects with the complete output of (non-conditional) selection analyses separately run on each group of samples, as provided by the dndscv package. The first table should be sufficient for most users. If you are interested in indels, please note that the indel analysis uses a different null model that makes the test for conditional selection notably less sensitive than in the case of substitutions. This lower sensitivity also extends to the "overall_mut" table. The data frames (1-5) contain the following columns:
gene_name: name of the gene that was tested for conditional selection
num.driver.group1: estimate of the number of drivers per sample per gene in group 1
num.driver.group2: estimate of the number of drivers per sample per gene in group 2
Delta.Nd: absolute difference in the average number of driver mutations per sample (group 1 minus group 2)
classification: classification of conditional selection. The most frequent classes are strict dependence (drivers only in group 1), facilitation (drivers more frequent in group 1), independence, inhibition (drivers less frequent in group 1), and strict inhibition (drivers absent from group 1). If negative selection is present, other possibilities are strict dependence with sign change (drivers positively selected in group 1 but negatively selected in group 2), strict inhibition with sign change (drivers positively selected in group 2 but negatively selected in group 1), aggravation (purifying selection against mutations becomes stronger in group 1), and relaxation (purifying selection against mutations becomes weaker in group 1).
dependency: dependency index, measuring the association between the grouping variable (group 1 or 2) and the average number of drivers observed in a gene. It serves as a quantitative measure of the qualitative effect described in "classification". In the most common cases, a value of 1 indicates strict dependence or inhibition (drivers only observed in one group) and a value of 0 (or NA) indicates independence.
pval: p-value for conditional selection
qval: q-value for conditional selection, obtained with the Benjamini-Hochberg correction for the false discovery rate.
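The relationship between these columns can be sketched on a mock table. The gene names and numbers below are invented, and `p.adjust(method = "BH")` is the base R implementation of the Benjamini-Hochberg procedure; whether coselens calls it internally is an assumption. Note that q-values recomputed on a subset of genes will generally differ from those reported for the full set of tested genes.

```r
# Mock results table using the column names described above.
# Gene names and values are invented for illustration.
tab <- data.frame(
  gene_name         = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  num.driver.group1 = c(0.30, 0.05, 0.10, 0.00),
  num.driver.group2 = c(0.00, 0.20, 0.10, 0.15),
  pval              = c(0.001, 0.010, 0.800, 0.030),
  stringsAsFactors  = FALSE
)

# Difference in drivers per sample, group 1 minus group 2
tab$Delta.Nd <- tab$num.driver.group1 - tab$num.driver.group2

# Benjamini-Hochberg adjustment over the genes in this table
tab$qval <- p.adjust(tab$pval, method = "BH")

# Keep genes below an illustrative 5% FDR threshold,
# ordered by the magnitude of the effect
hits <- tab[tab$qval < 0.05, ]
hits <- hits[order(-abs(hits$Delta.Nd)), ]
print(hits[, c("gene_name", "Delta.Nd", "pval", "qval")])
```

A positive Delta.Nd in this sketch corresponds to more drivers in group 1, and a negative value to more drivers in group 2.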
The "dndscv" list contains two objects. Please read the documentation of the dndscv package for further information about them.
dndscv_group1: output of dndscv for group 1
dndscv_group2: output of dndscv for group 2
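The structure of the returned list can be navigated as in the sketch below. The `res` object here is a hand-built stand-in with empty placeholders, not real coselens output; only the element names are taken from the description above.

```r
# Stand-in for a coselens result: empty placeholders with the documented names.
empty_tab <- data.frame(
  gene_name = character(0),
  pval      = numeric(0),
  qval      = numeric(0),
  stringsAsFactors = FALSE
)
res <- list(
  substitutions  = empty_tab,
  indels         = empty_tab,
  missense_sub   = empty_tab,
  truncating_sub = empty_tab,
  overall_mut    = empty_tab,
  dndscv         = list(dndscv_group1 = list(), dndscv_group2 = list())
)

# The table most users need
subs <- res$substitutions

# The five summary tables, without the nested dndscv output
summary_tabs <- res[setdiff(names(res), "dndscv")]

# Per-group dndscv output
dnds_g1 <- res$dndscv$dndscv_group1
dnds_g2 <- res$dndscv$dndscv_group2
```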
George Gruenhagen (Georgia Institute of Technology) and Jaime Iranzo (Centro de Biotecnologia y Genomica de Plantas - Universidad Politécnica de Madrid)