estimate_test_edges_time: Estimates time spent for statistical testing

View source: R/fct_edge_testing.R

estimate_test_edges_time    R Documentation

Estimates time spent for statistical testing

Description

Estimates the running time of the test_edges function, depending on its arguments. This is useful because test_edges can take a long time to complete.

Usage

estimate_test_edges_time(
  mat,
  normalized_counts,
  nGenes,
  nRegulators,
  density = 0.02,
  nTrees = 1000,
  nShuffle = 1000,
  nCores = ifelse(is.na(parallel::detectCores()), 1, max(parallel::detectCores() - 1, 1)),
  verbose = TRUE
)

Arguments

mat

matrix containing the importance values for each target and regulator (preferably computed with GENIE3 and the OOB importance metric)

normalized_counts

normalized expression data containing the genes present in the mat argument, as used for the first network inference step.

nGenes

total number of genes in the network, i.e. the union of the target genes and the regulators

nRegulators

number of regulators used for the network inference step

density

approximate desired density, used to build a first network whose edges are the ones to be statistically tested. Default is 0.02. Biological networks are known to have densities (ratio of edges over the total number of possible edges in the graph) between 0.001 and 0.1. The numbers of genes and regulators are needed to compute the density.
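As a rough illustration of how a density translates into a number of edges to test, the sketch below assumes that the number of possible edges is nRegulators * (nGenes - 1), i.e. each regulator can target every gene except itself. The helper edges_from_density is hypothetical and is not part of DIANE; the formula used internally by the package may differ.

```r
# Hypothetical helper, not part of DIANE: converts a target density into an
# approximate number of edges, ASSUMING nRegulators * (nGenes - 1) possible
# regulator -> target edges (no self-loops).
edges_from_density <- function(density, nGenes, nRegulators) {
  round(density * nRegulators * (nGenes - 1))
}

# Illustrative values only: 1000 genes, 100 of which are regulators
n_edges <- edges_from_density(density = 0.02, nGenes = 1000, nRegulators = 100)
print(n_edges)
```

Under this assumption, the number of edges to test grows linearly with density, so the running time can be expected to grow with it as well.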

nTrees

number of trees used for random forest importance computations

nShuffle

number of times the response variable (target gene expression) is randomized in order to estimate the null distribution of the importances of the predictive variables (regulators).
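The general idea behind shuffling the response can be sketched in base R. This is only an illustration of a permutation test: the absolute correlation is used as a stand-in for the random forest importance, the data are simulated, and the p-value convention shown (adding one to numerator and denominator) is one common choice, not necessarily the one used by test_edges.

```r
set.seed(42)

# Simulated data: one regulator that truly influences one target gene
n_samples <- 50
regulator <- rnorm(n_samples)
target <- 2 * regulator + rnorm(n_samples)

# Stand-in importance measure (ASSUMPTION: the real procedure relies on
# random forest importances, not on correlation)
importance <- function(x, y) abs(cor(x, y))

observed <- importance(regulator, target)

# Shuffle the response nShuffle times to build the null distribution
nShuffle <- 1000
null_importances <- replicate(nShuffle, importance(regulator, sample(target)))

# Empirical p-value: share of shuffled importances at least as large as
# the observed one
p_value <- (sum(null_importances >= observed) + 1) / (nShuffle + 1)
print(p_value)
```

A larger nShuffle gives a finer-grained null distribution, at the cost of a proportionally longer computation.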

nCores

Number of CPU cores to use during the procedure. Default is the detected number of cores minus one, with a minimum of one; a single core is used if the number of cores cannot be detected.
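The default expression shown in Usage can be unpacked as follows. The helper default_cores is hypothetical and only mirrors that expression so that its behavior can be checked on a few inputs, including the case where parallel::detectCores() returns NA.

```r
# Hypothetical helper mirroring the default nCores expression from Usage
default_cores <- function(detected) {
  ifelse(is.na(detected), 1, max(detected - 1, 1))
}

default_cores(NA)  # detection failed: fall back to a single core
default_cores(1)   # single-core machine: still use one core
default_cores(8)   # otherwise leave one core free

# Value that would be used on the current machine
nCores <- default_cores(parallel::detectCores())
print(nCores)
```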

verbose

If set to TRUE, feedback on the progress of the calculations is given. Default: TRUE

Value

time in seconds

  • links: a dataframe containing the links of the network before testing, as built from the user-defined prior density. All edges are associated with their p-value and FDR-adjusted p-value.

  • fdr_nEdges_curve: relation between the FDR threshold and the number of edges in the final network

Examples

## Not run: 
library(DIANE)
data("abiotic_stresses")
data("gene_annotations")
data("regulators_per_organism")

genes <- get_locus(abiotic_stresses$heat_DEGs)
regressors <- intersect(genes, 
                        regulators_per_organism$`Arabidopsis thaliana`)

data <- aggregate_splice_variants(abiotic_stresses$normalized_counts)

r <- DIANE::group_regressors(data, genes, regressors)

mat <- DIANE::network_inference(r$counts, 
                                conds = abiotic_stresses$conditions, 
                                targets = r$grouped_genes,
                                regressors = r$grouped_regressors, 
                                importance_metric = "MSEincrease_oob", 
                                verbose = TRUE) 
res <- DIANE::estimate_test_edges_time(mat, 
                                       normalized_counts = r$counts, 
                                       density = 0.02,
                                       nGenes = length(r$grouped_genes), 
                                       nRegulators = length(r$grouped_regressors), 
                                       nTrees = 1000, 
                                       verbose = TRUE)

## End(Not run)

OceaneCsn/DIANE documentation built on Jan. 10, 2024, 6:43 p.m.