require(knitr) knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, fig.width = 7, fig.height = 6, fig.align = "center")
NetAct is a computational platform that uses both transcriptomic data and literature-based transcription factor-target databases to construct core transcription-factor regulatory networks. This tutorial aims to demonstrate the core functional components of NetAct and how one uses it to construct and model a transcription-factor regulatory network. The experimental RNA-seq data utilized in this workflow is from Sheridan et al. (2015) and consists of three cell populations (basal, luminal progenitor (LP) and mature luminal (ML)) sorted from the mammary glands of female virgin mice. Using the GEO Series accession number GSE63310, count data for these samples can be downloaded from the Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/.
The analysis begins by downloading text files that contain raw gene-level counts for each sample. The file GSE63310_RAW.tar is available online at http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63310&format=file.
library(NetAct) url <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE63310&format=file" utils::download.file(url, destfile="GSE63310_RAW.tar", mode="wb") utils::untar("GSE63310_RAW.tar", exdir = ".") files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt", "GSM1545538_purep53.txt", "GSM1545539_JMS8-2.txt", "GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt", "GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt", "GSM1545545_JMS9-P8c.txt") for(i in paste(files, ".gz", sep="")){ R.utils::gunzip(i, overwrite=TRUE) } files <- c("GSM1545535_10_6_5_11.txt", "GSM1545536_9_6_5_11.txt", "GSM1545538_purep53.txt", "GSM1545539_JMS8-2.txt", "GSM1545540_JMS8-3.txt", "GSM1545541_JMS8-4.txt", "GSM1545542_JMS8-5.txt", "GSM1545544_JMS9-P7c.txt", "GSM1545545_JMS9-P8c.txt") x <- edgeR::readDGE(files, columns=c(1,3)) group <- as.factor(c("LP", "ML", "Basal", "Basal", "ML", "LP", "Basal", "ML", "LP")) x$samples$group <- group samplenames <- c("LP1", "ML1", "Basal1", "Basal2", "ML2", "LP2", "Basal3", "ML3", "LP3") colnames(x) <- samplenames
Before proceeding, it is important to define all the group comparisons you are interested in as well as an R dataframe that contains all the relevant phenotype metadata including the group names.
compList <- c("Basal-LP", "Basal-ML", "LP-ML") phenoData = new("AnnotatedDataFrame", data = data.frame(celltype = group)) rownames(phenoData) = colnames(x$counts)
Fro RNA-seq data, NetAct provides a useful pre-processing function called preprocess_counts()
that filters out lowly expressed genes and retrieves associated gene symbols for raw count data. Differentially expressed genes (DEGs) are then identified using the limma package. The resultant normalized expression data is saved alongside the phenoData metadata into the standard ExpressionSet
object for downstream analysis.
counts <- Preprocess_counts(counts = x$counts, groups = group, mouse = TRUE) DErslt = RNAseqDegs_limma(counts = counts, phenodata = phenoData, complist = compList, qval = 0.05) neweset = Biobase::ExpressionSet(assayData = as.matrix(DErslt$Overall$e), phenoData = phenoData)
Alternatively, the users can choose to use DESeq2 for the DEG analysis.
DErslt2 = RNAseqDegs_DESeq(counts = counts, phenodata = phenoData, complist = compList, qval = 0.05) neweset2 = Biobase::ExpressionSet(assayData = as.matrix(DErslt2$Overall$e), phenoData = phenoData)
We adopt a permutation approach to select the significantly enriched TFs by using the GSEA algorithm. The TFs are aggregated from the multiple comparisons by considering the q-value cutoff. This step usually takes the longest and thus we recommend saving the output by providing a file name. The results from this step in the tutorial can be retried from the vignettes folder on our GitHub.
data("mDB") calc <- FALSE if (calc) { gsearslts <- TF_Selection(GSDB = mDB, DErslt = DErslt, minSize = 5, nperm = 10000, qval = 0.05, compList = compList, nameFile = "gsearslts_tutorial") } else { gsearslts <- readRDS(file = "gsearslts_tutorial.RDS") } tfs <- gsearslts$tfs tfs
In the cases where comparison specific TFs are needed or when comparisons require varying q-value cutoffs, the Reselect_TFs()
function comes in handy. The argument qval can be a vector of q-value cutoffs for different comparisons.
Reselect_TFs(GSEArslt = gsearslts$GSEArslt, qval = 0.01)
NetAct infers the activities of the selected regulators based on the expression of their targets. The results show clearer TF activity patterns than those from heir gene expression.
act.me <- TF_Activity(tfs, mDB, neweset, DErslt$Overall) acts_mat = act.me$all_activities Activity_heatmap(acts_mat, neweset)
In this step we output the gene regulatory network topology by filtering the link relations based on mutual information and entropy. The plot_network()
function provides an interactive view of the network topology. The image of the network "tutorial_network.png" is provided in the vignettes folder.
tf_links = TF_Filter(acts_mat, mDB, miTh = .05, nbins = 8, corMethod = "spearman", DPI = T) plot_network(tf_links)
We then use the algorithm RACIPE to model the dynamical behavior of the network we constructed. It is recommended to simulate 10,000 models.
racipe_results <- sRACIPE::sracipeSimulate(circuit = tf_links, numModels = 200, plots = TRUE)
Su K, et al (2022) NetAct: a computational platform to construct core transcription factor regulatory networks using gene activity, Genome Biology, 23:270. https://doi.org/10.1186/s13059-022-02835-3
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.