RGBM | R Documentation
This function runs the proposed regularized gradient boosting machine (RGBM) for reverse engineering a gene regulatory network (GRN). It allows the user to provide prior information in the form of a mechanistic network g_M. After an initial GRN is inferred with the core GBM model, the network undergoes a pruning step: isolated nodes are detected and removed, and the optimal set of transcription factors for each target gene is identified, using the select_ideal_k function. The GBM is then run again, followed by a refinement step, to generate the final reconstructed GRN.
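The page does not say how select_ideal_k or the L-curve criterion locate the optimal number of transcription factors. As a rough illustration only, the base-R sketch below applies one common elbow heuristic to the sorted importance weights of a single target gene: it picks the point farthest from the straight line joining the first and last weights, and treats mink as a lower bound on the result (our reading of that argument). pick_k_elbow is a hypothetical name and this is not the package's code.

```r
# Hypothetical helper (NOT part of RGBM): generic elbow / L-curve style cut-off
# applied to the importance weights of the candidate TFs of one target gene.
pick_k_elbow <- function(weights, mink = 0) {
  w <- sort(weights, decreasing = TRUE)   # strongest candidate TFs first
  n <- length(w)
  if (n < 3) return(n)                    # too few points to locate an elbow
  x <- seq_len(n)
  # perpendicular distance of each point (x, w) from the chord joining
  # the first point (1, w[1]) and the last point (n, w[n])
  d <- abs((w[n] - w[1]) * x - (n - 1) * w + n * w[1] - w[n]) /
    sqrt((w[n] - w[1])^2 + (n - 1)^2)
  k <- which.max(d)                       # elbow index, used as the cut-off
  max(k, mink)                            # never keep fewer than mink TFs
}

# Made-up importances for six candidate TFs of one target gene
weights <- c(0.02, 0.9, 0.05, 0.85, 0.03, 0.01)
k <- pick_k_elbow(weights)                # 3 for these weights
```

Here the two strong weights stand out and the elbow falls on the first weak one, so k is 3; a stricter variant could keep only the points before the elbow.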
RGBM(E = matrix(rnorm(100), 10, 10), K = matrix(0, nrow(E), ncol(E)),
g_M = matrix(1, 10, 10), tfs = paste0("G", c(1:10)),
targets = paste0("G", c(1:10)), lf = 1, M = 5000, nu = 0.001, s_f = 0.3,
no_iterations = 2, mink = 0, experimentid = 1, outputpath= "DEFAULT",
sample_type = "Exp1_", real = 0)
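The arguments below fix the shapes and names of the inputs: E and K are N-by-p with gene names as column names, and g_M is Ntfs-by-Ntargets with TF names as row names and target names as column names. The base-R sketch below builds a consistent set of such inputs; the gene names, sizes and knocked-out genes are hypothetical placeholders, and the random values stand in for properly normalized data.

```r
# Sketch: inputs shaped as described on this page (all values hypothetical).
set.seed(1)
N <- 20                                   # experiments (rows)
genes <- paste0("G", 1:10)                # all p genes
tfs <- genes[1:3]                         # hypothetical transcription factors
targets <- genes                          # every gene treated as a target

# N-by-p expression matrix with gene names as column names
# (random placeholders; real data should already be normalized, e.g. RMA)
E <- matrix(rnorm(N * length(genes)), nrow = N,
            dimnames = list(NULL, genes))

# N-by-p perturbation matrix: 0 = unknown / unperturbed, 1 = knocked out
K <- matrix(0, nrow(E), ncol(E), dimnames = list(NULL, genes))
K[1, "G2"] <- 1                           # G2 knocked out in experiment 1
K[2, "G5"] <- 1                           # G5 knocked out in experiment 2
single_ko <- which(rowSums(K) == 1)       # rows that are single-gene knock-outs

# Ntfs-by-Ntargets prior network; 1 = TF may regulate that target
g_M <- matrix(1, length(tfs), length(targets),
              dimnames = list(tfs, targets))
```

These objects could then be passed as RGBM(E = E, K = K, g_M = g_M, tfs = tfs, targets = targets), matching the signature shown in the usage.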
E
N-by-p expression matrix. Columns correspond to genes and rows correspond to experiments. E is expected to be already normalized using standard methods, for example RMA. The column names of E are the set of all p genes; Ntfs denotes the number of transcription factors and Ntargets the number of target genes.
K
N-by-p initial perturbation matrix. It directly corresponds to the E matrix, i.e. if K[i,j] is equal to 1, gene j was knocked out in experiment i. Single-gene knock-out experiments are rows of K containing exactly one value of 1. The column names of K are the set of all genes. By default it is a matrix of zeros of the same size as E, i.e. the initial perturbation state of the genes is unknown.
g_M
Initial mechanistic network in the form of an adjacency matrix (Ntfs-by-Ntargets). Each column is a binary vector whose elements are 1 only where the corresponding transcription factor has a connection with that target gene. The column names of g_M should match the names of the targets, and its row names should match the names of the TFs. By default it is a matrix of ones of size Ntfs x Ntargets.
tfs
Character vector of names of transcription factors.
targets
Character vector of names of target genes.
lf
Loss function: 1 -> Least Squares and 2 -> Least Absolute Deviation.
M
Number of extensions in the boosting model, i.e. the number of iterations of the main loop of the RGBM algorithm. By default it is 5000.
nu
Shrinkage factor (learning rate), 0<nu<=1. Each extension to the boosting model is multiplied by the learning rate. By default it is 0.001.
s_f
Sampling rate of transcription factors, 0<s_f<=1. Fraction of transcription factors from E, as indicated by |
no_iterations |
Number of times the initial GRN is constructed and then averaged to generate smooth edge weights for the initial GRN, as shown in
mink
Specified threshold, i.e. the minimum number of TFs to be considered while optimizing the L-curve criterion. By default it is 0.
experimentid
The id of the experiment being conducted. It takes natural numbers such as 1, 2, 3. By default it is 1.
outputpath
Location where the intermediate Adjacency_Matrix and Images folders will be created. By default ("DEFAULT") a temporary directory is used (e.g. /tmp/Rtmp...).
sample_type
String argument giving a label for the experiment, e.g. sample_type="DREAM3" for the DREAM3 challenge. By default it is "Exp1_".
real
Numeric value 0 or 1 for a simulated or a real experiment, respectively. By default it is 0.
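The arguments lf, M, nu, s_f and no_iterations describe a stochastic gradient boosting loop. The base-R sketch below illustrates how those settings interact for a single target gene under stated assumptions: median-split regression stumps as base learners, leaf values given by the mean (least squares, lf = 1) or median (least absolute deviation, lf = 2) of the residuals, and TF importance measured as accumulated split gain. It is a generic Friedman-style gradient boosting toy, not RGBM's implementation; the function names are hypothetical and the small defaults (M = 200, nu = 0.1) are chosen only so it runs quickly.

```r
# Illustrative sketch only, NOT RGBM's internal code. Assumptions: median-split
# regression stumps as base learner, importance = accumulated split gain.
# toy_gbm_importance and avg_importance are hypothetical names.
toy_gbm_importance <- function(X, y, lf = 1, M = 200, nu = 0.1, s_f = 0.5) {
  n_tf <- ncol(X)
  imp <- setNames(numeric(n_tf), colnames(X))
  Fx <- rep(if (lf == 1) mean(y) else median(y), length(y))  # initial constant fit
  n_sample <- max(1, ceiling(s_f * n_tf))
  leaf <- if (lf == 1) mean else median    # leaf value: mean (LS) or median (LAD)
  for (m in seq_len(M)) {
    r <- y - Fx
    g <- if (lf == 1) r else sign(r)       # negative gradient of the loss
    cand <- sample(n_tf, n_sample)         # sample a fraction s_f of the TFs
    best <- NULL
    best_gain <- -Inf
    for (j in cand) {
      left <- X[, j] <= median(X[, j])     # split TF j at its median
      if (all(left) || !any(left)) next
      sse_split <- sum((g[left] - mean(g[left]))^2) +
        sum((g[!left] - mean(g[!left]))^2)
      gain <- sum((g - mean(g))^2) - sse_split
      if (gain > best_gain) {
        best_gain <- gain
        best <- list(j = j, left = left)
      }
    }
    if (is.null(best)) next
    step <- ifelse(best$left, leaf(r[best$left]), leaf(r[!best$left]))
    Fx <- Fx + nu * step                   # shrunken update by learning rate nu
    imp[best$j] <- imp[best$j] + best_gain # credit the chosen TF
  }
  imp
}

# Average importances over several independent runs (cf. no_iterations).
avg_importance <- function(X, y, no_iterations = 2, ...) {
  runs <- vapply(seq_len(no_iterations),
                 function(i) toy_gbm_importance(X, y, ...),
                 numeric(ncol(X)))
  rowMeans(matrix(runs, nrow = ncol(X), dimnames = list(colnames(X), NULL)))
}

# Usage on made-up data: only TF2 drives the target.
set.seed(42)
X <- matrix(rnorm(400), nrow = 100, dimnames = list(NULL, paste0("TF", 1:4)))
y <- 2 * X[, "TF2"] + rnorm(100, sd = 0.3)
imp <- avg_importance(X, y, no_iterations = 2, M = 100, nu = 0.1, s_f = 0.5)
```

On these made-up data the averaged importance should be largest for TF2, the only regulator used to generate y. A smaller nu needs more extensions M to reach a comparable fit, which is consistent with the documented defaults pairing nu = 0.001 with M = 5000.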
Returns the final inferred GRN as an Ntfs-by-Ntargets adjacency matrix.
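A common next step with an Ntfs-by-Ntargets result is to list the inferred edges ranked by weight. The base-R sketch below does this, assuming nonzero entries are edge weights and that the matrix carries TF and target names as row and column names (the page does not state either). adjacency_to_edges is a hypothetical helper and A is a made-up stand-in for the returned matrix.

```r
# Hypothetical helper: rank the nonzero entries of an Ntfs-by-Ntargets matrix.
adjacency_to_edges <- function(A) {
  idx <- which(A != 0, arr.ind = TRUE)    # (row, col) of every nonzero entry
  edges <- data.frame(tf = rownames(A)[idx[, "row"]],
                      target = colnames(A)[idx[, "col"]],
                      weight = A[idx],
                      stringsAsFactors = FALSE)
  edges <- edges[order(edges$weight, decreasing = TRUE), , drop = FALSE]
  rownames(edges) <- NULL
  edges
}

# Made-up 2-by-3 weight matrix standing in for the returned GRN
A <- matrix(c(0,   0.8, 0.1,
              0.5, 0,   0), nrow = 2, byrow = TRUE,
            dimnames = list(c("G1", "G2"), c("G1", "G2", "G3")))
edges <- adjacency_to_edges(A)
```

For this A the ranked edges are G1 -> G2 (0.8), G2 -> G1 (0.5) and G1 -> G3 (0.1).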
Raghvendra Mall <rmall@hbku.edu.qa>
select_ideal_k, first_GBM_step
# load RGBM library
library("RGBM")
# this step is optional; it helps speed up calculations by running in parallel on 2 processors
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
# run network inference on the default 10-by-10 dummy expression data
A = RGBM()
stopCluster(cl)