Cancers are not single disease entities, but comprising multiple molecularly distinct subtypes, and the heterogeneity prevents precise selection of patients for optimized therapy. Dissecting cancer subtype-specific signaling pathways is crucial to pinpointing dysregulated genes for the prioritization of novel therapeutic targets. Nested effects models (NEMs) are a group of graphical models that encode subset relations between observed downstream effects under perturbations to upstream signaling genes, providing a prototype for mapping the inner workings of the cell. In this study, we developed NEM-Tar, which extends the original NEMs to predict drug targets by incorporating causal information of (epi)genetic aberrations for signaling pathway inference. An information theory-based score, weighted information gain (WIG), was proposed to assess the impact of signaling genes on a specific downstream biological process. We will show in detail how to implement a toy example for singaling network inference as well as a real case study for prioritizing the potential therapeutic targets.
knitr::include_graphics("E:/Thesis/NEM_tar_Fig/Figure 2_0915.png")
Figure 1. Comparison between the structures of (A) classic NEM and (B) our proposed NEM-Tar
As shown in Figure 1A, in classic nested effects models, the S-genes are modeled as hidden variables, and their signaling interaction graph G (solid arrows) is the target to infer. In experiments with perturbations to individual S-genes, differential expression of downstream genes could be observed and considered as effect reporter genes (E-genes). Assuming that each E-gene is directly regulated by at most one S-gene in G, the maximum a posteriori attachment Θ (dashed arrows) of effect genes to S-genes could be computed. The goal is to search for the signaling graph G, which yields the most likely probabilistic nested effects. Illustrated in Figure 1B, an extra observational dimension (the real patients) is the factor that NEM-Tar should deal with.The necessary adjustment should be conducted on the design and inference strategies of classic NEM. However, the information needs to infer is also the hidden interaction between S-genes and the attachment relationship of E-genes to S-genes.And the likelihood function of NEM-Tar is similar to that of NEMs, except the state matrix of regulators (S-genes) S* in our model.
The applicaitons on real case studies of NEM-Tar require a lot of data preprocssing and integrative selection of singaling genes(S-genes) and effect reporter genes(E-genes).For the purpose of interpreting the main work flow and contribution of NEM-Tar.At first, we will introduce the employed in-silico data.
library(nemTar) library(dplyr) data("example")
The toy example contains four different elements.The S-gene number was applied in medium size(12), thus the dimension of Sgene_hidden, storing the unobserved S-gene states was 12\times12; Edata is the E-gene profiles with the dimension 804\times100, S_obs is the observed profile of S-gene states 'after' perturbations,with the dimension of 100\times12.
dim(Edata) dim(S_obs) dim(Sgene_hidden) para
control<-nem::set.default.parameters(Sgenes=rownames(Sgene_hidden),type="mLL",para=para) nemTar_rslt<-nem_Tar_greedy(Edata,Sgenes=control$Sgenes,S_obs,control=control)
This process takes a few seconds.
To visualize the inference results, an R package RedeR is suggested.However, the function plot.nem could give a more quickly visulization.The error matrix is the difference between the inferred S-gene matrix and the generated S-gene matrix, the result that all the entries is 0 indicates that the inference is perfect.
plot.nem(nemTar_rslt,what="graph") plot.nem(nemTar_rslt,what="graph",transitiveReduction=T) adj<-as(nemTar_rslt$graph, "matrix") error<-Sgene_hidden-adj print(error)
The profiles of GC contains three different elements.The S-gene number was determined as 14, thus the dimension of adjacency matrix of S-gene interacction network was 15\times15; Edata_GC_ori and D are the E-gene profiles before and after the transformation to binary variable,with the dimension of 1194\times38 and 824\times28, respectively; Sdata_GC is the observed profile of 'natural' perturbation states of Sgenes with the dimension of 28\times15.
# load the input profiles of GC library(nemTar) data("case_GC") data("EMT_list")
res.disc <- nem.discretize(Edata_GC_ori,neg.control=1:8,pos.control=9:10,nfold=2,cutoff= 0.5) D<-res.disc$dat para<-res.disc$para control<-set.default.parameters(Sgenes=Sgenes_GC,type="mLL",para=para) nemTar_GC<-nem_Tar_greedy(D=D,Sgenes=Sgenes_GC,S_pattern=Sdata_GC,control=control)
plot.nem(nemTar_GC,what="graph") plot.nem(nemTar_GC,what="graph",transitiveReduction=T)
knitr::include_graphics("E:/NEM_paper2/GC_EMT_net.png")
The statistical significance of the causal effect that the S-genes exert on downstream pathways(EMT pathway here) is quantified by permutation tests, i.e.,random sampling of E-genes with the same number of EMT signature genes in the regulon of a S-gene, and calculating the frequency of observing a same or higher WIG from the sampled E-gene sequences.(The value 0 of the adjusted P stands for the corresponding P value is less than the resolution of current sampling times,in the following case P<1e-05.)
EMT_post<-path_post(nemTar_GC,EMT_list,Sgenes_GC) WIG<-compute_WIG(EMT_post$post_affected,EMT_post$path_affected,15) sample_WIG0<-WIG_sample(EMT_post,nemTar_GC,EMT_list,15,1e05) sig_test<-WIGsig_test(sample_WIG0,WIG,1e05) ### results summary sig_rslt<-data.frame(S_genes=names(EMT_post$Sig_affected)[which(lengths(EMT_post$path_affected)!=0)], WIG=round(WIG$WIG,2),adjusted_P=format(round(sig_test,6), scientific = TRUE,digits = 3)) colnames(sig_rslt)<-c("S genes","WIG","Adjusted P") sig_rslt<-sig_rslt[order(sig_rslt$WIG,decreasing=TRUE),] knitr::kable(sig_rslt,align="l",row.names=F,caption="Assessment of single S-gene perturbation on EMT in GC")
Also, the weigheted information gain(WIG) of combinational perturbation of 2 S-genes could be calcualted.The kinases of double perturbation with higher WIGs could be the candidate combinational therapeutic targets.
Sgene_double_WIG<-WIG_double(EMT_post,15) ### results summary sig_rslt<-data.frame(S_genes=names(Sgene_double_WIG$WIG), WIG=round(Sgene_double_WIG$WIG,2)) colnames(sig_rslt)<-c("S genes","WIG") sig_rslt<-sig_rslt[order(sig_rslt$WIG,decreasing=TRUE),] knitr::kable(sig_rslt[match(c("CDH1/ERBB2","KRAS/CDH1","BRAF/CDH1","KRAS/ERBB2", "BRAF/ERBB2","KRAS/BRAF"),rownames(sig_rslt)),], align="l",row.names=F,caption="Assessment of double perturbations (kinase only) on EMT in GC")
The profiles of GC contains three different elements.The S-gene number was prioritized as 15, thus the dimension of adjacency matrix of S-gene interacction network was 13\times13; Edata_GC_ori and D are the E-gene profiles before and after the transformation to binary variable,with the dimension of 1337\times96 and 1323\times50,respectively; Sdata_CRC is the observed profile of 'natural' perturbation states of S-genes with the dimension 50\times13.
library(nemTar) data("case_CRC") data("EMT_list")
res.disc <- nem.discretize(Edata_CRC_ori,neg.control=1:30,pos.control=31:46,nfold=2,cutoff=0.6) D<-res.disc$dat para<-res.disc$para control<-set.default.parameters(Sgenes=Sgenes_CRC,type="mLL",para=para) nemTar_CRC<-nem_Tar_greedy(D,Sgenes=control$Sgenes,Sdata_CRC,control=control)
plot.nem(nemTar_CRC,what="graph") plot.nem(nemTar_CRC,what="graph",transitiveReduction=T)
knitr::include_graphics("E:/NEM_paper2/CRC_CMS4_net.png")
The statistical significance of the causal effect that the S-genes exert on downstream pathways(EMT pathway here) is quantified by permutation tests, i.e.,random sampling of E-genes with the same number of EMT signature genes in the regulon of a S-gene, and calculating the frequency of observing a same or higher WIG from the sampled E-gene sequences.(The value 0 of the adjusted P stands for the corresponding P value is less than the resolution of current sampling times,in the following case P<1e-05.)
EMT_post<-path_post(nemTar_CRC,EMT_list,Sgenes_CRC) WIG<-compute_WIG(EMT_post$post_affected,EMT_post$path_affected,13) sample_WIG0<-WIG_sample(EMT_post,nemTar_CRC,EMT_list,13,1e05) sig_test<-WIGsig_test(sample_WIG0,WIG,1e05) ### results summary sig_rslt<-data.frame(S_genes=names(EMT_post$Sig_affected)[which(lengths(EMT_post$path_affected)!=0)], WIG=round(WIG$WIG,2),adjusted_P=format(round(sig_test,6), scientific = TRUE,digits = 3)) colnames(sig_rslt)<-c("Sgenes","WIG","Adjusted P") # sig_rslt[which(sig_rslt[3]==0),3]<-"<1e-05" sig_rslt<-sig_rslt[order(sig_rslt$WIG,decreasing=TRUE),] knitr::kable(sig_rslt,align="l",row.names=F,caption="Assessment of the impact of single S-gene perturbation on EMT in CRC")
Similar to study in GC, the weigheted information gain(WIG) of combinational perturbation of 2 Sgenes could be calcualted.The kinases of double perturbation with higher WIGs could be the candidate combinational therapeutic targets.
Sgene_double_WIG<-WIG_double(EMT_post,13) ### results summary sig_rslt<-data.frame(S_genes=names(Sgene_double_WIG$WIG), WIG=round(Sgene_double_WIG$WIG,2)) colnames(sig_rslt)<-c("S genes","WIG") sig_rslt<-sig_rslt[order(sig_rslt$WIG,decreasing=TRUE),] knitr::kable(sig_rslt[match(c("KRAS/CTNNB1","KRAS/TGFBR2","KRAS/ERBB4","KRAS/BRAF", "KRAS/PIK3CA","BRAF/CTNNB1","PIK3CA/CTNNB1","TGFBR2/CTNNB1", "ERBB4/CTNNB1","TGFBR2/ERBB4"),rownames(sig_rslt)),], align="l",row.names=F,caption="Assessment of double perturbations (kinase only) on EMT in CRC")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.