Integrative random forest for gene regulatory network inference

Description

This function fits iRafNet, a flexible, unified integrative algorithm that allows information from prior data, such as protein-protein interactions and gene knock-down experiments, to be jointly considered for gene regulatory network inference. This function takes as input only one set of sampling scores, computed from a single type of prior data, such as protein-protein interactions or gene expression from knock-out experiments. Note that some of the functions used internally are modified versions of functions from the R package randomForest (Liaw and Wiener, 2002).
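As a rough illustration of how sampling scores can steer the forest, the sketch below assumes that iRafNet uses W as sampling weights in place of randomForest's uniform draw of mtry candidate regulators at each node. The helper `sample_candidates()` is hypothetical and is not the package's internal code; the uniform fallback for an all-zero column is also an assumption.

```r
# Hypothetical helper, not part of the iRafNet package. It assumes that the
# sampling scores act as weights: regulator i is drawn as a candidate for
# target j with probability proportional to W[i, j].
sample_candidates <- function(W, j, mtry) {
  candidates <- setdiff(seq_len(ncol(W)), j)  # exclude the target itself
  weights <- W[candidates, j]
  if (sum(weights) == 0) {
    # Assumption: with no prior information, fall back to uniform sampling
    weights <- rep(1, length(candidates))
  }
  # sample.int() rescales `prob` internally, so raw non-negative scores work;
  # draws are without replacement, as for the mtry candidates of a tree node
  candidates[sample.int(length(candidates), size = mtry, prob = weights)]
}

set.seed(1)
W <- matrix(1, 5, 5)
W[2, 1] <- 50  # strong prior evidence that gene 2 regulates gene 1
draws <- replicate(1000, sample_candidates(W, j = 1, mtry = 1))
table(draws)   # gene 2 is drawn far more often than genes 3, 4 and 5
```

Because the weights only change how often a regulator is offered to the splitting rule, a gene with a low prior score can still be selected when the expression data support it.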

Usage

iRafNet(X, W, ntree, mtry, genes.name)

Arguments

X

(n x p) matrix of expression levels for n samples (rows) and p genes (columns).

W

(p x p) matrix of iRafNet sampling scores. Element (i, j) contains the score for the regulatory relationship (i -> j). Scores must be non-negative; a larger sampling score corresponds to a higher prior likelihood that gene i regulates gene j. Rows and columns of W must be in the same order as the columns of X. The sampling scores in W are computed from a single type of prior data, such as protein-protein interactions or gene expression from knock-out experiments.

ntree

Numeric value: number of trees.

mtry

Numeric value: number of potential regulators to be sampled at each tree node.

genes.name

Vector containing gene names. The order needs to match the columns of X.
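Since a mismatch in gene order silently scrambles the prior, it can be worth checking the inputs before a long run. The helper `check_irafnet_inputs()` below is a hypothetical sketch, not part of the package: it tests the conditions stated above and, when dimnames are present, that they follow the order of genes.name. Rejecting missing values in W is an added assumption.

```r
# Hypothetical helper, not part of the iRafNet package: check the documented
# requirements on X, W and genes.name before calling iRafNet().
check_irafnet_inputs <- function(X, W, genes.name) {
  p <- ncol(X)
  stopifnot(
    is.matrix(X), is.numeric(X),
    is.matrix(W), nrow(W) == p, ncol(W) == p,  # W must be p x p
    !anyNA(W),                                 # assumption: no missing scores
    all(W >= 0),                               # scores must be non-negative
    length(genes.name) == p                    # one name per column of X
  )
  # If X and W carry dimnames, they must follow the order of genes.name
  if (!is.null(colnames(X))) stopifnot(identical(colnames(X), genes.name))
  if (!is.null(rownames(W))) stopifnot(identical(rownames(W), genes.name))
  if (!is.null(colnames(W))) stopifnot(identical(colnames(W), genes.name))
  invisible(TRUE)
}

set.seed(1)
p <- 5
genes.name <- paste0("G", 1:p)
X <- matrix(rnorm(20 * p), 20, p, dimnames = list(NULL, genes.name))
W <- abs(matrix(rnorm(p * p), p, p))
check_irafnet_inputs(X, W, genes.name)  # passes silently
```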

Value

Importance scores for all regulatory relationships. The first column contains the names of the regulators, the second column the names of the targets, and the third column the corresponding importance scores.
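A common next step is to keep only the highest-scoring edges. The sketch below uses a hypothetical helper, `top_edges()`, and accesses columns by position because their names are not documented here; the toy edge list stands in for real iRafNet output.

```r
# Hypothetical helper, not part of the iRafNet package: return the k edges with
# the largest importance scores. Columns are used by position
# (1 = regulator, 2 = target, 3 = importance), as described under Value.
top_edges <- function(out, k = 10) {
  out <- as.data.frame(out)
  score <- as.numeric(as.character(out[[3]]))  # safe for factor or character
  head(out[order(score, decreasing = TRUE), , drop = FALSE], k)
}

# Toy edge list with the documented column layout
toy <- data.frame(regulator = c("G1", "G2", "G3"),
                  target = c("G2", "G3", "G1"),
                  importance = c(0.2, 0.9, 0.5))
top_edges(toy, k = 2)  # G2 -> G3 (0.9), then G3 -> G1 (0.5)
```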

References

Petralia, F., Wang, P., Yang, J. and Tu, Z. (2015) Integrative random forest for gene regulatory network inference. Bioinformatics, 31, i197–i205.

Liaw, A. and Wiener, M. (2002) Classification and Regression by randomForest. R News, 2, 18–22.

Examples

  library(iRafNet)

  # --- Generate data sets
  set.seed(1)                                    # for reproducibility
  n <- 20                                        # sample size
  p <- 5                                         # number of genes
  genes.name <- paste("G", seq(1, p), sep = "")  # gene names
  data <- matrix(rnorm(p * n), n, p)             # generate expression matrix
  W <- abs(matrix(rnorm(p * p), p, p))           # generate non-negative sampling scores

  # --- Standardize variables to mean 0 and variance 1
  data <- apply(data, 2, function(x) (x - mean(x)) / sd(x))

  # --- Run iRafNet and obtain importance scores of regulatory relationships
  out <- iRafNet(data, W, mtry = round(sqrt(p - 1)), ntree = 1000, genes.name = genes.name)
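To compare the inferred network with the prior scores, it can help to arrange the edge list in the same (p x p) layout as W, with element (i, j) holding the importance of i -> j. The hypothetical helper `edges_to_matrix()` below sketches this under the column layout described under Value; for the example above it would be called as `edges_to_matrix(out, genes.name)`.

```r
# Hypothetical helper, not part of the iRafNet package: place the importance
# of each edge (regulator i -> target j) at element [i, j] of a p x p matrix,
# using the same orientation as W. Pairs missing from `out` stay at 0.
edges_to_matrix <- function(out, genes.name) {
  out <- as.data.frame(out)
  p <- length(genes.name)
  M <- matrix(0, p, p, dimnames = list(genes.name, genes.name))
  idx <- cbind(match(as.character(out[[1]]), genes.name),  # regulator rows
               match(as.character(out[[2]]), genes.name))  # target columns
  stopifnot(!anyNA(idx))  # every gene in `out` must appear in genes.name
  M[idx] <- as.numeric(as.character(out[[3]]))
  M
}

# Toy edge list with the documented column layout
toy <- data.frame(regulator = c("G1", "G2"),
                  target = c("G2", "G3"),
                  importance = c(0.2, 0.9))
edges_to_matrix(toy, c("G1", "G2", "G3"))
```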