test_codiu_genes | R Documentation |
Pairwise statistical testing of co-Differential Isoform Usage (co-DIU)
relationships. For a selected set of gene pairs showing co-expression of
isoforms across clusters (see find_codiu_genes), this
function tests the significance of the detected co-DIU patterns.
Warning: this function may take a long time to run, especially if applied
to all candidate co-DIU gene pairs returned by find_codiu_genes.
test_codiu_genes(
data,
cluster_list,
shared_genes,
gene_tr_table,
id_table,
isoform_col = NULL,
parallel = TRUE,
t = 4
)
data |
A data.frame or tibble object including isoforms as rows and cells as columns. Isoform IDs can be included as row names (data.frame) or as an additional column (tibble). |
cluster_list |
A list of character vectors containing isoform IDs. Each element of the list represents a cluster of isoforms. |
shared_genes |
A two-row matrix containing n candidate co-DIU gene pairs as columns. Typically the result of running |
gene_tr_table |
A data.frame or tibble object containing two columns named |
id_table |
A data frame including two columns named |
isoform_col |
When a tibble is provided in |
parallel |
A logical. When |
t |
An integer indicating the number of threads to be used for parallelization. This will be passed to the |
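As a rough illustration of the input shapes described above, the toy objects below are built in base R. All isoform, gene and cell names are invented, and the column names used for gene_tr_table (`transcript`, `gene`) are hypothetical placeholders that may not match what the function expects.

```r
# Toy inputs illustrating the documented shapes (all names are invented).

# data: isoforms as rows, cells as columns; isoform IDs stored as row names
data <- data.frame(
  cell_1 = c(10, 0, 3, 8),
  cell_2 = c(12, 1, 2, 9),
  cell_3 = c(0, 15, 11, 1),
  row.names = c("geneA_iso1", "geneA_iso2", "geneB_iso1", "geneB_iso2")
)

# cluster_list: one character vector of isoform IDs per cluster
cluster_list <- list(
  c("geneA_iso1", "geneB_iso2"),
  c("geneA_iso2", "geneB_iso1")
)

# shared_genes: two-row matrix, one candidate gene pair per column
shared_genes <- matrix(c("geneA", "geneB"), nrow = 2)

# gene_tr_table: isoform-to-gene mapping
# (hypothetical column names; the ones required by test_codiu_genes may differ)
gene_tr_table <- data.frame(
  transcript = rownames(data),
  gene = c("geneA", "geneA", "geneB", "geneB")
)
```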
A pair of potentially co-DIU genes will have at least two of their isoforms assigned to the same clusters, i.e. they show detectable isoform-level co-expression. However, since clustering allows isoforms with slightly different expression patterns to be grouped together, some isoforms might be assigned to clusters that do not faithfully represent their expression profile, leading to inaccuracies in co-DIU detection. To avoid false-positive co-DIU genes, the present function fits a regression model and applies a statistical test to each candidate pair of genes (hereafter gene 1 and gene 2), where at least two isoforms of each gene must belong to the same two clusters (hereafter cluster 1 and cluster 2).
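The precondition above (isoforms of both genes falling into the same two clusters) can be checked with a short base-R helper. This is a minimal sketch, not the package's implementation: `shared_clusters()` is a hypothetical name, and the mapping table is assumed to have columns `transcript` and `gene`.

```r
# Hypothetical helper: indices of clusters that contain at least one
# isoform from each of the two genes in a candidate pair.
shared_clusters <- function(gene1, gene2, cluster_list, gene_tr_table) {
  # Map each isoform ID to its gene (assumed columns: transcript, gene)
  iso2gene <- setNames(gene_tr_table$gene, gene_tr_table$transcript)
  hits <- vapply(cluster_list, function(iso) {
    genes <- iso2gene[iso]
    gene1 %in% genes && gene2 %in% genes
  }, logical(1))
  which(hits)
}

gene_tr_table <- data.frame(
  transcript = c("A1", "A2", "B1", "B2", "C1"),
  gene = c("geneA", "geneA", "geneB", "geneB", "geneC")
)
cluster_list <- list(c("A1", "B2"), c("A2", "B1"), c("C1"))

# Each isoform belongs to a single cluster, so sharing two clusters implies
# that each gene contributes at least two distinct isoforms.
shared <- shared_clusters("geneA", "geneB", cluster_list, gene_tr_table)
length(shared) >= 2
```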
Briefly, the test assesses whether the isoform expression values follow a genuine co-DIU pattern. First, the average profile across cell types of the two isoforms in cluster 1 must differ significantly from the average profile of the two isoforms in cluster 2, indicating that the two isoforms of each gene have distinct expression profiles. Second, the average profile of the two isoforms of gene 1 must not differ from that of the two isoforms of gene 2, indicating that the co-expression is only detectable when isoform-level expression is considered.
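To make the two comparisons concrete, the sketch below simulates counts for the four isoforms of a hypothetical co-DIU pair in long format (one row per isoform and cell) and computes average profiles across cell types with `tapply()`. The column names (`expr`, `cluster`, `gene`, `cell_type`) and the simulated means are invented; this is a descriptive illustration only, since significance is assessed by the model described next.

```r
set.seed(1)
# Simulated long-format data: 2 genes x 2 clusters x 2 cell types x 30 cells.
# Cluster c1 isoforms are high in cell type A, cluster c2 isoforms in type B,
# and both genes follow the same pattern (a co-DIU-like scenario).
sim <- expand.grid(cell = 1:30, cell_type = c("A", "B"),
                   cluster = c("c1", "c2"), gene = c("g1", "g2"))
high <- (sim$cluster == "c1") == (sim$cell_type == "A")
sim$expr <- rnbinom(nrow(sim), mu = ifelse(high, 50, 5), size = 10)

# Average profile across cell types for each cluster: expected to differ
by_cluster <- tapply(sim$expr, list(sim$cluster, sim$cell_type), mean)
# Average profile across cell types for each gene: expected to be similar
by_gene <- tapply(sim$expr, list(sim$gene, sim$cell_type), mean)
by_cluster
by_gene
```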
Internally, the function fits a generalized linear model (GLM) via
the glm function, using the negative.binomial function in the MASS
package to set the error distribution and link function of the model
through the family argument. To test the significance of the
cluster*cell type and gene*cell type interactions described above,
type-II analysis-of-variance (ANOVA) tables are computed for the model
with a likelihood-ratio chi-square test, using the Anova function in
the car package (type-II tests are used given the unbalanced design).
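A minimal sketch of this testing step follows, under several assumptions: the data are in long format with hypothetical columns `expr`, `cluster`, `gene` and `cell_type`; the negative binomial `theta` is fixed at an arbitrary value of 1 (how the package chooses it is not stated here); and, to avoid the car dependency, base R's `drop1()` is used instead of `car::Anova()`. For the two interactions, which are the highest-order terms of this model, a type-II test compares the full model against the model without that interaction, which is what `drop1()` computes; `scale = 1` makes it report the unscaled likelihood-ratio (deviance) statistic. `test_pair_sketch()` is a hypothetical name, not part of the package.

```r
library(MASS)  # provides negative.binomial()

# Hypothetical helper: p-values for the cluster:cell_type and gene:cell_type
# interactions in a negative binomial GLM with a fixed (assumed) theta.
test_pair_sketch <- function(df, theta = 1) {
  # Listing cluster and gene before cell_type makes R label the
  # interaction terms "cluster:cell_type" and "gene:cell_type"
  fit <- glm(expr ~ cluster + gene + cell_type +
               cluster:cell_type + gene:cell_type,
             family = negative.binomial(theta = theta), data = df)
  # Likelihood-ratio test for dropping each interaction from the full model
  tab <- drop1(fit, test = "Chisq", scale = 1)
  terms <- c("cluster:cell_type", "gene:cell_type")
  setNames(tab[terms, "Pr(>Chi)"], terms)
}

# Simulated co-DIU-like pair: clusters swap their cell-type preference,
# while both genes share the same pattern
set.seed(1)
sim <- expand.grid(cell = 1:30, cell_type = c("A", "B"),
                   cluster = c("c1", "c2"), gene = c("g1", "g2"))
high <- (sim$cluster == "c1") == (sim$cell_type == "A")
sim$expr <- rnbinom(nrow(sim), mu = ifelse(high, 50, 5), size = 10)

p <- test_pair_sketch(sim)
p
```

In this simulation the cluster:cell_type p-value is expected to be very small, while no gene:cell_type effect was built into the data.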
A list containing one tibble per tested gene pair, as generated
by make_test. Each tibble includes two columns, cluster:cell_type
and gene:cell_type, containing the p-value obtained when testing
each of these interactions in the type-II ANOVA test.
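Because the result is a list with one table per gene pair, it is often convenient to stack it into a single table. The sketch below uses a hand-made list of data frames standing in for the tibbles (assuming one row per pair, with invented pair names and p-values) and base R's `rbind()`; with real output, `dplyr::bind_rows()` would be an alternative.

```r
# Stand-in for the documented output: one single-row table per gene pair
# (pair names and p-values are invented for illustration)
res <- list(
  geneA_geneB = data.frame(`cluster:cell_type` = 1e-8, `gene:cell_type` = 0.62,
                           check.names = FALSE),
  geneC_geneD = data.frame(`cluster:cell_type` = 0.03, `gene:cell_type` = 0.01,
                           check.names = FALSE)
)

# Stack into one data frame and keep the pair name as a column
pvals <- do.call(rbind, res)
pvals$pair <- names(res)
pvals
```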
NOTE: In some cases the assumptions required for fitting the GLM are not met,
and an NA value is returned instead. These NA values are kept so that users
can identify untested gene pairs, but they can easily be removed from the output.
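A minimal way to drop untested pairs is sketched below. It assumes that an untested pair appears either as a bare NA element or as a table containing NA p-values; the list contents and pair names are invented.

```r
# Stand-in output: one tested pair and one untested pair (invented values)
res <- list(
  geneA_geneB = data.frame(`cluster:cell_type` = 1e-8, `gene:cell_type` = 0.62,
                           check.names = FALSE),
  geneC_geneD = NA
)

# Keep only elements that are tables without any NA p-values
tested <- Filter(function(x) is.data.frame(x) && !anyNA(x), res)

# Names of the pairs that could not be tested
untested <- setdiff(names(res), names(tested))
untested
```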
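Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York.
Fox, J. and Weisberg, S. (2019). An R Companion to Applied Regression. Third Edition. Sage, Thousand Oaks, CA.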
For details, see internal functions: make_design, make_test.