test_codiu_genes: Statistical testing of candidate co-DIU genes
In ConesaLab/acorde: Isoform co-usage networks from single-cell RNA-seq data

test_codiu_genes

R Documentation

Statistical testing of candidate co-DIU genes

Description

Pairwise statistical testing of co-Differential Isoform Usage relationships. For a selected set of gene pairs showing co-expression of isoforms across clusters (see find_codiu_genes), this function tests the significance of the detected co-DIU patterns.

Warning: this function may take a long time to run, especially if applied to all pairs of co-DIU genes returned by find_codiu_genes.

Usage

test_codiu_genes(
  data,
  cluster_list,
  shared_genes,
  gene_tr_table,
  id_table,
  isoform_col = NULL,
  parallel = TRUE,
  t = 4
)

Arguments

`data`	A data.frame or tibble object including isoforms as rows and cells as columns. Isoform IDs can be included as row names (data.frame) or as an additional column (tibble).
`cluster_list`	A list of character vectors containing isoform IDs. Each element of the list represents a cluster of isoforms.
`shared_genes`	A two-row matrix containing n candidate co-DIU gene pairs, one pair per column. Typically the result of running `find_codiu_genes`.
`gene_tr_table`	A data.frame or tibble object containing two columns named `transcript_id` and `gene_id`, indicating gene-isoform correspondence.
`id_table`	A data frame including two columns named `cell` and `cell_type`, in which correspondence between cell ID and cell type should be provided. The number of rows should be equal to the total number of cell columns in `data`, and the order of the `cell` column should match column (i.e. cell) order in `data`.
`isoform_col`	When a tibble is provided in `data`, a character object indicating the name of the column where isoform IDs are specified. Otherwise, isoform identifiers will be assumed to be defined as rownames, and this argument will not need to be provided.
`parallel`	A logical. When `TRUE`, parallelization is enabled via the `future_map_lgl` function in the `furrr` package.
`t`	An integer indicating the number of threads to be used for parallelization. This will be passed to the `plan` function from the `future` package via the `workers` argument.
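
As an illustration of the input shapes described above, the following is a minimal sketch that builds toy objects in base R and checks that they are mutually consistent. All identifiers (isoform, gene, cell and cell type names) and the simulated counts are made up for the example, and the final call is left commented out because it assumes the acorde package is installed.

```r
set.seed(1)

# Hypothetical identifiers: 2 genes with 2 isoforms each, 6 cells, 3 cell types
cells    <- paste0("cell", 1:6)
isoforms <- c("g1.iso1", "g1.iso2", "g2.iso1", "g2.iso2")

# data: isoforms as rows, cells as columns, isoform IDs as row names
data <- as.data.frame(
  matrix(rpois(length(isoforms) * length(cells), lambda = 10),
         nrow = length(isoforms),
         dimnames = list(isoforms, cells))
)

# cluster_list: one character vector of isoform IDs per cluster
cluster_list <- list(
  cluster1 = c("g1.iso1", "g2.iso1"),
  cluster2 = c("g1.iso2", "g2.iso2")
)

# shared_genes: two-row matrix, one candidate gene pair per column
shared_genes <- matrix(c("gene1", "gene2"), nrow = 2)

# gene_tr_table: gene-isoform correspondence
gene_tr_table <- data.frame(
  transcript_id = isoforms,
  gene_id       = rep(c("gene1", "gene2"), each = 2),
  stringsAsFactors = FALSE
)

# id_table: one row per cell column in data, in the same order
id_table <- data.frame(
  cell      = cells,
  cell_type = rep(c("typeA", "typeB", "typeC"), each = 2),
  stringsAsFactors = FALSE
)

# Consistency checks implied by the argument descriptions
stopifnot(
  nrow(id_table) == ncol(data),
  identical(id_table$cell, colnames(data)),
  all(unlist(cluster_list) %in% rownames(data)),
  all(rownames(data) %in% gene_tr_table$transcript_id),
  nrow(shared_genes) == 2
)

# With acorde installed, the call would then look like this (not run here):
# res <- acorde::test_codiu_genes(data, cluster_list, shared_genes,
#                                 gene_tr_table, id_table, parallel = FALSE)
```

Because `data` is a data.frame with row names, `isoform_col` is left at its default of `NULL` in this sketch.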

Details

A set of potentially co-DIU genes will have at least two of their isoforms assigned to the same clusters, i.e. show detectable isoform-level co-expression. However, since clustering allows isoforms with slightly variable expression patterns to be grouped together, some isoforms might be assigned to clusters that do not faithfully represent their expression profile, leading to inaccuracies in co-DIU detection. To avoid false-positive co-DIU genes, the present function applies a regression model and a statistical test to each candidate pair of genes (hereafter named gene 1 and gene 2), where at least two of the isoforms of each gene must belong to the same two clusters (hereafter named cluster 1 and cluster 2).

Briefly, we need to assess whether expression values for the isoforms follow a correct co-DIU pattern. First, the average profile across cell types of the two isoforms in cluster 1 must be significantly different from the average profile of the two isoforms in cluster 2, indicating distinct expression profiles for the two isoforms of each gene. Second, the average profile of the two isoforms of gene 1 must not be different from the average profile of the two isoforms of gene 2, indicating that co-expression is only detectable when isoform-level expression is considered.

Internally, the function fits a generalized linear model (GLM) via the glm function, using the negative.binomial function in the MASS package to set the error distribution and link function through the family argument. To test the significance of the cluster*cell type and gene*cell type interactions described above, it computes a type-II analysis-of-variance (ANOVA) table for the model with the Anova function in the car package, using likelihood-ratio chi-square tests (type II is chosen given the unbalanced design).
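
The modelling step can be illustrated with a minimal sketch on simulated data. This is not the package's internal code: the model formula, the fixed `theta = 10`, the column names and the simulated means are all assumptions made for the example. To stay free of extra dependencies, the two interaction terms are tested with base R's `drop1`, which performs likelihood-ratio chi-square tests on the highest-order terms; the documented approach uses `car::Anova` instead (shown in a comment), and its p-values may differ slightly.

```r
library(MASS)  # provides the negative.binomial() family

set.seed(42)

# Toy design: 2 genes x 2 clusters (4 isoforms), 3 cell types, 20 cells per type
design <- expand.grid(
  rep       = seq_len(20),
  cell_type = c("typeA", "typeB", "typeC"),
  gene      = c("gene1", "gene2"),
  cluster   = c("cluster1", "cluster2"),
  stringsAsFactors = TRUE
)

# Simulated means: the profile across cell types depends on the cluster only,
# not on the gene, which is the co-DIU pattern described above
mu_table <- rbind(
  cluster1 = c(typeA = 50, typeB = 20, typeC = 5),
  cluster2 = c(typeA = 5,  typeB = 20, typeC = 50)
)
mu <- mu_table[cbind(as.character(design$cluster),
                     as.character(design$cell_type))]
design$expression <- rnbinom(nrow(design), size = 10, mu = mu)

# Negative binomial GLM with both interactions (theta fixed by assumption)
fit <- glm(
  expression ~ cluster + gene + cell_type +
    cluster:cell_type + gene:cell_type,
  family = negative.binomial(theta = 10),
  data   = design
)

# Likelihood-ratio tests for the two interaction terms (base R stand-in)
lrt <- drop1(fit, test = "Chisq")
p_cluster <- lrt["cluster:cell_type", "Pr(>Chi)"]
p_gene    <- lrt["gene:cell_type", "Pr(>Chi)"]

# With the car package installed, the documented type-II table would be:
# car::Anova(fit, type = 2, test.statistic = "LR")
```

In this simulation the cluster*cell type interaction is built to be strong, so `p_cluster` should be very small, while `p_gene` reflects a true null effect and will vary with the random seed.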

Value

A list containing one tibble per tested gene pair, as generated by make_test. Each tibble will include two columns, cluster:cell_type and gene:cell_type, containing the p-value obtained when testing each of these interactions in the type-II ANOVA test.

NOTE: In some cases the assumptions required for fitting the GLM are not met, and an NA value is returned instead. These NA results are kept in the output so that users can keep track of untested gene pairs, and can easily be removed afterwards.
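
As a sketch of how such output might be post-processed, the example below uses a hand-made stand-in for the returned list (plain data frames instead of tibbles, with invented p-values and pair names). It assumes that an untested pair appears either as a bare NA element or as a table containing NA, and it uses a significance threshold of 0.05, which is an arbitrary choice for the illustration.

```r
# Mock output: one element per tested gene pair (values are invented)
results <- list(
  pair1 = data.frame(`cluster:cell_type` = 1e-6, `gene:cell_type` = 0.40,
                     check.names = FALSE),
  pair2 = NA,  # stand-in for a pair whose model could not be fitted
  pair3 = data.frame(`cluster:cell_type` = 0.30, `gene:cell_type` = 0.02,
                     check.names = FALSE)
)

# Keep only pairs that were actually tested (no NA anywhere in the result)
tested <- Filter(function(x) is.data.frame(x) && !any(is.na(x)), results)

# Stack the per-pair p-values into a single table
pvals <- do.call(rbind, tested)

# Co-DIU pattern as described in Details: significant cluster*cell type
# interaction, non-significant gene*cell type interaction
alpha <- 0.05
codiu <- pvals[pvals[["cluster:cell_type"]] < alpha &
                 pvals[["gene:cell_type"]] >= alpha, , drop = FALSE]
```

In practice, p-values from many gene pairs would usually be adjusted for multiple testing (for instance with `p.adjust`) before applying a threshold.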

References

\insertRef{Venables2002}{acorde}

\insertRef{Fox2019}{acorde}
