# fssemR: Fused Sparse Structural Equation Models to Jointly Infer Gene Regulatory Networks In fssemR: Fused Sparse Structural Equation Models to Jointly Infer Gene Regulatory Network

## Estimating GRNs from the simulated gene expression data and genetic perturbation data

We need to input the gene expression and corresponding genotype data of two conditions into the FSSEM algorithm. They are stored in the data$Data. 1. 20 simulated gene expression under two conditions head(data$Data$Y[[1]]) head(data$Data$Y[[2]])  1. 60 corresponding cis-eQTLs' genotype under two conditions head(data$Data$X[[1]] - 1) head(data$Data$X[[2]] - 1)  1. data$Data$Sk stores each gene's cis-eQTL's indices. In real data application, we recommend to use package MatrixEQTL to search the significant cis-eQTLs for genes of interested and build Sk for your research head(data$Data$Sk)  ### Initialization of fssemR by ridge regression We implement our fssemR by the observed gene expression data and genetic perturbations data that stored in data$Data, and it is initialized by ridge regression, the $l_2$ norm penalty's hyperparameter $\gamma$ is selected by 5-fold cross-validation.

Xs  = data$Data$X     ## eQTL's genotype data
Ys  = data$Data$Y     ## gene expression data
Sk  = data$Data$Sk    ## cis-eQTL indices
gamma = cv.multiRegression(Xs, Ys, Sk, ngamma = 50, nfold = 5, n = data$Vars$n,
p = data$Vars$p, k = data$Vars$k)
fit0   = multiRegression(data$Data$X, data$Data$Y, data$Data$Sk, gamma, trans = FALSE,
n = data$Vars$n, p = data$Vars$p, k = data$Vars$k)


### Run fssemR algorithm for data

Then, we chose the fit0 object from ridge regression as intialization, and implement the fssemR algorithm, BIC is used to select optimal hyperparameters $\lambda, \rho$, where nlambda is the number of candidate lambda values for $l_1$ regularized term, and nrho is the number of candidate rho values for fused lasso regularized term.

fitOpt <- opt.multiFSSEMiPALM2(Xs = Xs, Ys = Ys, Bs = fit0$Bs, Fs = fit0$Fs, Sk = Sk,
sigma2 = fit0$sigma2, nlambda = 10, nrho = 10, p = data$Vars$p, q = data$Vars$k, wt = TRUE) fit <- fitOpt$fit


### Comparing our estimated GRNs and differential GRN with ground truth

cat("Power of two estimated GRNs = ",
(TPR(fit$Bs[[1]], data$Vars$B[[1]]) + TPR(fit$Bs[[2]], data$Vars$B[[2]])) / 2)
cat("FDR of two estimated GRNs = ",
(FDR(fit$Bs[[1]], data$Vars$B[[1]]) + FDR(fit$Bs[[2]], data$Vars$B[[2]])) / 2)
cat("Power of estimated differential GRN = ",
TPR(fit$Bs[[1]] - fit$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]))
cat("FDR of estimated differential GRN = ",
FDR(fit$Bs[[1]] - fit$Bs[[2]], data$Vars$B[[1]] - data$Vars$B[[2]]))


From these 4 metrics, we can get the performance of our fssemR algorithm comparing to the ground truth (if we know)

## Differential GRN Visualization

# data$Vars$B[[2]]    ## simulated GRN under condition 2
diffGRN = network(t(fit$Bs[[2]] - fit$Bs[[1]]) != 0, matrix.type = "adjacency", directed = TRUE)
# up-regulated edges are colored by red and down-regulated edges are colored by blue
ecol = 3 - sign(t(fit$Bs[[2]] - fit$Bs[[1]]))
plot(diffGRN, displaylabels = TRUE, label = network.vertex.names(GRN_2), label.cex = 0.5, edge.col = ecol)


Additionally, the differeitial effect of two GRN are also estimated. Therefore, we can tell how the interactions in two GRNs change.

diffGRN = Matrix::Matrix(fit$Bs[[1]] - fit$Bs[[2]], sparse = TRUE)
diffGRN


From the diffGRN, we can determined how the gene-gene interactions in GRN changes across two conditions, then, we can find out the key genes for condition-specific gene regulatory network.

Additionally, for more applications and the replications of our real data analysis, please go to the https://github.com/Ivis4ml/fssemR/tree/master/inst for more cases.

## Session Information

sessionInfo()


## Try the fssemR package in your browser

Any scripts or data that you put into this service are public.

fssemR documentation built on Dec. 4, 2019, 5:06 p.m.