run_csd: Run CSD analysis

View source: R/find_rho_and_var.R

run_csd    R Documentation

Run CSD analysis

Description

This function implements the CSD algorithm, based on the one presented by Voigt et al. (2017). Within each condition, every pair of genes is compared using the Spearman correlation, and the correlation and its variance are estimated by bootstrapping. The results for the two conditions are then compared, and the C-, S- and D-values are computed and returned.
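The steps described above can be sketched in base R for a single gene pair. This is an illustration only, not the package's implementation: the helper boot_spearman and the toy data are hypothetical, and the score formulas are the ones given in Voigt et al. (2017), which are assumed here to match what run_csd computes.

```r
# Sketch only: one gene pair, base R, not the csdR implementation.
set.seed(1)

# Hypothetical helper: bootstrap the Spearman correlation of two vectors
# and return the mean and variance over the bootstrap replicates.
boot_spearman <- function(a, b, n_it = 20L) {
  rho <- vapply(seq_len(n_it), function(i) {
    idx <- sample.int(length(a), replace = TRUE)  # resample the samples
    cor(a[idx], b[idx], method = "spearman")
  }, numeric(1))
  c(rho = mean(rho), var = var(rho))
}

# Made-up expression values for two genes in two conditions.
n <- 50
g1_c1 <- rnorm(n); g2_c1 <- g1_c1 + rnorm(n, sd = 0.5)  # co-expressed
g1_c2 <- rnorm(n); g2_c2 <- rnorm(n)                    # unrelated

r1 <- boot_spearman(g1_c1, g2_c1)
r2 <- boot_spearman(g1_c2, g2_c2)

# Scores as defined in Voigt et al. (2017); assumed to match the package.
denom <- sqrt(r1[["var"]] + r2[["var"]])
c_val <- abs(r1[["rho"]] + r2[["rho"]]) / denom
s_val <- abs(abs(r1[["rho"]]) - abs(r2[["rho"]])) / denom
d_val <- (abs(r1[["rho"]]) + abs(r2[["rho"]]) -
            abs(r1[["rho"]] + r2[["rho"]])) / denom
```

All three scores share the same denominator, so each one measures a difference or sum of correlations relative to the bootstrap uncertainty.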

Usage

run_csd(
  x_1,
  x_2,
  n_it = 20L,
  nThreads = 1L,
  verbose = TRUE,
  iterations_gap = 1L
)

Arguments

x_1

Numeric matrix, the gene expression matrix for the first condition. Genes are in columns, samples are in rows. The columns must be named with the name of the genes. Missing values are not allowed.

x_2

Numeric matrix, the gene expression matrix for the second condition, in the same layout as x_1.

n_it

Integer, number of bootstrap iterations

nThreads

Integer, number of threads to use for computations

verbose

Logical, should progress be printed?

iterations_gap

Integer, number of iterations between status messages. Only used if verbose = TRUE. Default: 1.
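As a minimal sketch of the input layout described for x_1, the following builds a toy matrix with samples in rows and named gene columns, then checks the documented requirements. The gene names and values are made up, and the checks are an assumption about what a valid input looks like, not the validation that run_csd itself performs.

```r
set.seed(42)

# Hypothetical gene names; real data would use actual gene identifiers.
genes <- c("GENE_A", "GENE_B", "GENE_C")

# 10 samples (rows) by 3 genes (columns), with named columns.
x_1 <- matrix(rnorm(10 * length(genes)), nrow = 10,
              dimnames = list(NULL, genes))

# Checks mirroring the documented requirements for the input.
stopifnot(
  is.matrix(x_1), is.numeric(x_1),   # numeric matrix
  !is.null(colnames(x_1)),           # columns named by gene
  !anyNA(x_1)                        # no missing values
)
```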

Details

The gene names in x_1 and x_2 do not need to be in the same order, but they must be in the same namespace. Only genes present in both datasets are considered in the analysis. The parallelism provided by nThreads applies to the computations within a single iteration. The iterations themselves are run in serial to reduce the memory footprint.
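The restriction to shared genes can be pictured with base R as follows. This is a sketch of the described behaviour under the assumption that matching is done on column names; it is not taken from the package source, and the gene names are placeholders.

```r
# Two toy matrices whose gene columns overlap only partially
# and appear in a different order.
x_1 <- matrix(0, nrow = 2, ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
x_2 <- matrix(0, nrow = 2, ncol = 3, dimnames = list(NULL, c("C", "D", "A")))

# Genes present in both conditions.
common <- intersect(colnames(x_1), colnames(x_2))

# Subset and reorder both matrices to the shared genes.
x_1_sub <- x_1[, common, drop = FALSE]
x_2_sub <- x_2[, common, drop = FALSE]
```

Genes "B" and "D" occur in only one matrix each, so they drop out, and both subsets end up with their columns in the same order.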

Value

A data.frame with the additional class attribute csd_res with the results of the CSD analysis. This frame has a row for each pair of genes and has the following columns:

Gene1

Character, the name of the first gene

Gene2

Character, the name of the second gene

rho1

Mean correlation of the two genes in the first condition

rho2

Mean correlation of the two genes in the second condition

var1

The estimated variance of rho1 determined by bootstrapping

var2

The estimated variance of rho2 determined by bootstrapping

cVal

Numeric, the conserved score. A high value indicates that the co-expression of the two genes has the same sign in both conditions.

sVal

Numeric, the specific score. A high value indicates that the two genes have a high degree of co-expression in one condition, but not in the other.

dVal

Numeric, the differentiated score. A high value indicates that the two genes have a high degree of co-expression in both conditions, but that the sign of the co-expression differs.
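One possible way to work with a result of this shape is sketched below. The data frame is a mock with invented values that carries only some of the documented columns, and the column dominant is a hypothetical addition, not something run_csd returns.

```r
# Mock result; all values are invented for illustration.
res <- data.frame(
  Gene1 = c("A", "A", "B"),
  Gene2 = c("B", "C", "C"),
  cVal  = c(12.4, 1.0, 0.4),
  sVal  = c(0.4, 5.3, 0.4),
  dVal  = c(0.0, 0.0, 10.3),
  stringsAsFactors = FALSE
)

# Gene pair with the highest conserved score.
top_c <- res[which.max(res$cVal), c("Gene1", "Gene2")]

# Label each pair by whichever of the three scores is largest.
score_cols <- c("cVal", "sVal", "dVal")
res$dominant <- score_cols[max.col(as.matrix(res[score_cols]),
                                   ties.method = "first")]
```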

References

Voigt A, Nowick K and Almaas E (2017) 'A composite network of conserved and tissue specific gene interactions reveals possible genetic interactions in glioma'. PLOS Computational Biology 13(9): e1005739. doi: https://doi.org/10.1371/journal.pcbi.1005739

Examples

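library(csdR)

# Load the two example expression matrices
data("sick_expression")
data("normal_expression")

# Run the analysis with 100 bootstrap iterations on 2 threads
cor_res <- run_csd(
    x_1 = sick_expression, x_2 = normal_expression,
    n_it = 100L, nThreads = 2L
)

# Largest conserved score across all gene pairs
c_max <- max(cor_res$cVal)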

AlmaasLab/csdR documentation built on Nov. 9, 2024, 3:09 p.m.