network_inference: Regulatory importances estimation via Random Forests

View source: R/fct_network_inference.R

network_inference R Documentation

Regulatory importances estimation via Random Forests

Description

GENIE3 needs to be given a list of genes, which will be the nodes of the inferred network. Among those genes, some must be considered as potential regulators. GENIE3 can determine the influence of every regulator on each input gene, using their respective expression profiles. You can specify which conditions you want to be considered for those profiles during the network inference. For each target gene, the method uses Random Forests to provide a ranking of all regulators based on their influence on the target's expression. These rankings are then merged across all targets, giving a global ranking of regulatory links that is stored in the result matrix.
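The per-target procedure described above can be illustrated with a minimal sketch. It is not the DIANE implementation: the toy data, the gene names, the number of trees and the orientation of the weight matrix (regulators in rows, targets in columns) are assumptions made for illustration, and it only relies on the randomForest package, which is assumed to be installed.

```r
library(randomForest)  # CRAN package, assumed to be installed

set.seed(42)

# Toy expression matrix: genes in rows, samples in columns (hypothetical data)
n_samples <- 30
expr <- matrix(rnorm(4 * n_samples), nrow = 4,
               dimnames = list(c("TF1", "TF2", "G1", "G2"),
                               paste0("C_", seq_len(n_samples))))
# Make G1 depend strongly on TF1, so the link TF1 -> G1 should stand out
expr["G1", ] <- 3 * expr["TF1", ] + rnorm(n_samples, sd = 0.1)

regressors <- c("TF1", "TF2")
targets <- c("TF1", "TF2", "G1", "G2")

# Weight matrix: regulators in rows, targets in columns (assumed orientation)
weights <- matrix(0, nrow = length(regressors), ncol = length(targets),
                  dimnames = list(regressors, targets))

for (target in targets) {
  # A regulator is never used to predict its own expression
  preds <- setdiff(regressors, target)
  x <- t(expr[preds, , drop = FALSE])  # samples x regulators
  y <- expr[target, ]
  rf <- randomForest(x = x, y = y, ntree = 200)
  # type = 2 is the decrease in node impurity (IncNodePurity)
  weights[preds, target] <- importance(rf, type = 2)[preds, 1]
}

print(round(weights, 2))
```

Each column of the resulting matrix holds the importances of the regulators for one target, so sorting all entries together gives a global ranking of candidate links.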

Usage

network_inference(
  normalized.count,
  conds,
  regressors,
  targets,
  nTrees = 1000,
  nCores = ifelse(is.na(parallel::detectCores()), 1, max(parallel::detectCores() - 1, 1)),
  verbose = TRUE,
  importance_metric = "node_purity"
)

Arguments

normalized.count

normalized expression matrix containing the regressors and target genes as rows, and samples as columns

conds

condition names to be used in the inference (not the column names, but the condition names that precede the underscore in the column names)

regressors

genes to be taken as regressors during the inference procedures (regulator genes)

targets

genes to be included in the inferred network. Regressors can also be among the targets.

nTrees

Number of trees per Random Forest

nCores

Number of CPU cores to use during the procedure. Default is the detected number of cores minus one (at least 1), or 1 if the number of cores cannot be detected.

verbose

If set to TRUE, feedback on the progress of the calculations is given. Default: TRUE

importance_metric

Character string, either node_purity or MSEincrease_oob. This is the importance type computed for the regulator-gene pairs, as returned by the randomForest package. Default is node_purity, the metric used in GENIE3. Our improvement of the method uses MSEincrease_oob for consistency with the statistical testing of edges. The default metric is around 4 times faster, but more sensitive to the number of regulators and to over-fitting. Too few samples will lead to NA values in MSEincrease_oob; in that case, it is advised to use GENIE3's default metric.
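To make the conds and nCores arguments more concrete, here is a small base R sketch. The sample names and the <condition>_<replicate> naming pattern are assumptions made for illustration; the sketch shows one plausible way of mapping condition names to columns, not necessarily how network_inference does it internally. The nCores expression is copied from the Usage section.

```r
# Hypothetical column names following a <condition>_<replicate> pattern
sample_names <- c("C_1", "C_2", "H_1", "H_2", "S_1", "S_2")
conds <- c("C", "H")

# Condition name = everything before the first underscore (assumed convention)
sample_conds <- sub("_.*$", "", sample_names)
selected <- sample_names[sample_conds %in% conds]
print(selected)

# Same expression as the documented default for nCores:
# one core fewer than detected, never below 1, and 1 if detection fails
n_cores <- ifelse(is.na(parallel::detectCores()), 1,
                  max(parallel::detectCores() - 1, 1))
print(n_cores)
```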

Value

Matrix filled with regulator-target regulatory weights
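As an illustration of how such a matrix might be explored, the base R sketch below turns a small weight matrix into a ranked list of links. The matrix values, the gene names and the orientation (regulators in rows, targets in columns) are assumptions for the example; check the dimnames of the actual result before reusing it.

```r
# Hypothetical weight matrix: regulators in rows, targets in columns
mat <- matrix(c(0, 0.2, 0.7, 0, 0.9, 0.1), nrow = 2,
              dimnames = list(c("TF1", "TF2"), c("TF1", "TF2", "G1")))

# Long format: one row per regulator-target pair
links <- as.data.frame(as.table(mat), stringsAsFactors = FALSE)
colnames(links) <- c("regulator", "target", "weight")

# Drop self-links, then rank the remaining links by decreasing weight
links <- links[links$regulator != links$target, ]
links <- links[order(links$weight, decreasing = TRUE), ]
print(links)
```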

Examples

## Not run: 
data("abiotic_stresses")
data("regulators_per_organism")

aggregated_data <- aggregate_splice_variants(data = abiotic_stresses$normalized_counts)

genes <- get_locus(abiotic_stresses$heat_DEGs)
regressors <- intersect(genes, regulators_per_organism[["Arabidopsis thaliana"]])

mat <- network_inference(aggregated_data, conds = abiotic_stresses$conditions,
                         targets = genes, regressors = regressors,
                         nTrees = 1000, nCores = 4)

## End(Not run)

OceaneCsn/DIANE documentation built on Jan. 10, 2024, 6:43 p.m.