GBM: Calculate Gene Regulatory Network from Expression data using...

GBMR Documentation

Calculate Gene Regulatory Network from Expression data using either LS-TreeBoost or LAD-TreeBoost

Description

This function calculates a Ntfs-by-Ntargets adjacency matrix A from N-by-p expression matrix E. E is expected to be given as input. E is assumed to have p columns corresponding to all the genes, Ntfs represents the number of transcription factors and Ntargets represents the number of target genes and N rows corresponding to different experiments. Additionally, GBM function takes matrix of initial perturbations of genes K of the same size as E, and other parameters including which loss function to use (LS = 1, LAD = 2). As a result, GBM returns a squared matrix A of edge confidences of size Ntfs-by-Ntargets. A subset of known transcription factors can be defined as a subset of all p genes.

Usage

GBM(E = matrix(rnorm(100), 10, 10), K = matrix(0, nrow(E), ncol(E)), 
     tfs = paste0("G",c(1:10)), targets = paste0("G",c(1:10)), 
     s_s = 1, s_f = 0.3, lf = 1, 
     M = 5000,nu = 0.001, scale = TRUE,center = TRUE, optimization.stage = 2)

Arguments

E

N-by-p expression matrix. Columns correspond to genes, rows correspond to experiments. E is expected to be already normalized using standard methods, for example RMA. Colnames of E is the set of all genes.

K

N-by-p initial perturbation matrix. It directly corresponds to E matrix, e.g. if K[i,j] is equal to 1, it means that gene j was knocked-out in experiment i. Single gene knock-out experiments are rows of K with only one value 1. Colnames of K is set to be the set of all genes. By default it's a matrix of zeros of the same size as E, e.g. unknown initial perturbation state of genes.

tfs

List of names of transcription factors

targets

List of names of target genes

s_s

Sampling rate of experiments, 0<s_s<=1. Fraction of rows of E, which will be sampled with replacement to calculate each extension in boosting model. By default it's 1.

s_f

Sampling rate of transcription factors, 0<s_f<=1. Fraction of transcription factors from E, as indicated by tfs vector, which will be sampled without replacement to calculate each extesion in boosting model. By default it's 0.3.

lf

Loss function: 1 -> Least Squares, 2 -> Least Absolute deviation

M

Number of extensions in boosting model, e.g. number of iterations of the main loop of RGBM algorithm. By default it's 5000.

nu

Shrinkage factor, learning rate, 0<nu<=1. Each extension to boosting model will be multiplied by the learning rate. By default it's 0.001.

scale

Logical flag indicating if each column of E should be scaled to be unit standard deviation. By default it's TRUE.

center

Logical flag indicating if each column of E should be scaled to be zero mean. By default it's TRUE.

optimization.stage

Numerical flag indicating if re-evaluation of edge confidences should be applied after calculating initial V, optimization.stage={0,1,2}. If optimization.stage=0, no re-evaluation will be applied. If optimization.stage=1, variance-based optimization will be applied. If optimization.stage=2, variance-based and z-score based optimizations will be applied.

Value

A

Gene Regulatory Network in form of a Ntfs-by-Ntargets adjacency matrix.

Author(s)

Raghvendra Mall <rmall@hbku.edu.qa>

See Also

GBM.train, GBM.test, v2l

Examples

# load RGBM library
library("RGBM")
# this step is optional, it helps speed up calculations, run in parallel on 2 processors
library(doParallel)
cl <- makeCluster(2)
# run network inference on a 100-by-100 dummy expression data.
V = GBM()
stopCluster(cl)

RGBM documentation built on Aug. 11, 2022, 5:10 p.m.

Related to GBM in RGBM...