knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

netZooR is an R package that comprises seven main algorithms and can construct, analyse, and plot gene regulatory networks.

Installation

Prerequisites

Using this package requires Python (3.X) with some Python libraries, R (>= 3.3.3), and stable Internet access.

Some plotting functions require Cytoscape to be installed.

Required Python libraries

How to install Python libraries varies by platform. More instructions can be found here.

Running the PANDA and LIONESS algorithms requires the following Python libraries (or packages): pandas, numpy, networkx, and matplotlib (which provides matplotlib.pyplot).
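A quick way to confirm that these libraries are visible from R is to query each one before running PANDA or LIONESS. The sketch below assumes the reticulate package is installed and relies on its py_module_available() function. The helper name check_python_modules and its checker argument are hypothetical, added only for illustration.

```r
# Hypothetical helper: report which required Python modules can be imported.
# `checker` defaults to reticulate::py_module_available, but any function that
# maps a module name to TRUE/FALSE can be supplied instead.
check_python_modules <- function(modules = c("pandas", "numpy", "networkx", "matplotlib"),
                                 checker = NULL) {
  if (is.null(checker)) {
    if (!requireNamespace("reticulate", quietly = TRUE)) {
      stop("The reticulate package is needed to query Python modules.")
    }
    checker <- reticulate::py_module_available
  }
  # One logical per module, named by module.
  available <- vapply(modules, function(m) isTRUE(checker(m)), logical(1))
  missing_modules <- names(available)[!available]
  if (length(missing_modules) > 0) {
    message("Missing Python modules: ", paste(missing_modules, collapse = ", "))
  }
  available
}
```

Calling check_python_modules() with no arguments queries the four libraries listed above in whichever Python installation reticulate is bound to.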

Installing

This package can be installed with the install_github() function from the devtools package.

# install.packages("devtools")
library(devtools)
# Install netZooR with vignettes; otherwise remove the "build_vignettes = TRUE" argument.
devtools::install_github("netZoo/netZooR", build_vignettes = TRUE)
library(viridisLite) # to visualize communities

Data Resources

Motif data

Pre-prepared, species-specific, PANDA-ready transcription factor binding motif data are stored in our AWS bucket https://s3.console.aws.amazon.com/s3/buckets/netzoo/netZooR/example_datasets/PANDA_ready_motif_prior/?region=us-east-2&tab=overview. They are derived from the motif scan and motif info files located on https://sites.google.com/a/channing.harvard.edu/kimberlyglass/tools/resourcesby .

PPI

This package includes a function, source.PPI, that can source protein-protein interaction (PPI) data from the STRING database given a list of proteins of interest. The STRINGdb package is already loaded when netZooR is loaded.

# TF is a data frame with a single column filled with TFs of Mycobacterium tuberculosis H37Rv.
PPI <- source.PPI(TF, STRING.version="10", species.index=83332, score_threshold=0)
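One way to build such a single-column TF data frame is to take the unique regulators from a motif prior file. The sketch below assumes the file is tab-delimited, has no header, and lists the TF in its first column. That layout is an assumption and is not confirmed here. The helper name tfs_from_motif and the toy identifiers are hypothetical.

```r
# Hypothetical helper: collect the unique TFs from the first column of a
# tab-delimited motif prior file (assumed layout: TF, gene, weight; no header).
tfs_from_motif <- function(motif_file) {
  motif <- read.delim(motif_file, header = FALSE, stringsAsFactors = FALSE)
  data.frame(TF = unique(motif[[1]]), stringsAsFactors = FALSE)
}

# Toy motif file with made-up identifiers, written to a temporary location.
toy_file <- tempfile(fileext = ".txt")
writeLines(c("tfA\tgene1\t1", "tfA\tgene2\t1", "tfB\tgene1\t1"), toy_file)

TF_toy <- tfs_from_motif(toy_file)
print(TF_toy)
```

With the package installed, the same helper could be pointed at a real motif file instead of the toy one, and the result passed to source.PPI as TF.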

Running the sample TB datasets

library(netZooR)

Access the help pages for usage of the core functions.

?pandaPy
?createCondorObject
?pandaToCondorObject
?lionessPy
?alpaca
?pandaToAlpaca
?sambar

This package invokes Python from the R environment through the reticulate package. Configure which version of Python to use if necessary; netZooR requires Python 3.X. More details can be found here.

#check your Python configuration and the specific version of Python in use currently
py_config()

# reset to Python 3.X if necessary, like below:
use_python("/usr/local/bin/python3")
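To make the Python 3.X requirement explicit in a script, the version string can be compared before any Python-backed function is called. The sketch below uses only base R; the helper name is_python3 is hypothetical, and the version strings are literal examples rather than output from a real configuration.

```r
# Hypothetical helper: TRUE when a version string denotes Python 3 or later.
is_python3 <- function(version) {
  numeric_version(version) >= numeric_version("3")
}

# Literal example strings; a real check would use the configured version.
ok_version <- is_python3("3.9")
old_version <- is_python3("2.7")
```

With reticulate attached, the string could be taken from py_config()$version, assuming that field holds the version number as it does in current reticulate releases.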

The previous command is necessary to bind R to Python, because PANDA is called from Python: netZooPy has an optimized implementation of PANDA. Check this tutorial for an example using a pure R implementation of PANDA.

Use the example datasets within the package to test it. There are four input dataset files: one TB expression dataset for the control group, one TB expression dataset for the treated group, one transcription factor binding motif dataset, and one protein-protein interaction dataset, available from either inst/extdata or AWS.

Retrieve the file paths of the files that come with the netZooR package.

# retrieve the file path of these files
treated_expression_file_path <- system.file("extdata", "expr4_matched.txt", package = "netZooR", mustWork = TRUE)
control_expression_file_path <- system.file("extdata", "expr10_matched.txt", package = "netZooR", mustWork = TRUE)
motif_file_path <- system.file("extdata", "chip_matched.txt", package = "netZooR", mustWork = TRUE)
ppi_file_path <- system.file("extdata", "ppi_matched.txt", package = "netZooR", mustWork = TRUE)

PANDA algorithm

Pass the above file paths to the arguments expr_file (the expression dataset), motif_file (the motif dataset), and ppi_file (the PPI dataset), respectively. Then set remove_missing to TRUE to run PANDA and generate an aggregate network without unmatched TFs and genes.

Repeat with the control group.

treated_all_panda_result <- pandaPy(
  expr_file = treated_expression_file_path,
  motif_file = motif_file_path,
  ppi_file = ppi_file_path,
  modeProcess = "legacy",
  remove_missing = TRUE
)
control_all_panda_result <- pandaPy(
  expr_file = control_expression_file_path,
  motif_file = motif_file_path,
  ppi_file = ppi_file_path,
  modeProcess = "legacy",
  remove_missing = TRUE
)

The objects treated_all_panda_result and control_all_panda_result above are large lists with three elements: the entire PANDA network, the indegree ("to" nodes) nodes and scores, and the outdegree ("from" nodes) nodes and scores. Use $panda, $indegree, and $outdegree to access each list item, respectively.

Use $panda to access the entire PANDA network.

treated_net <- treated_all_panda_result$panda
control_net <- control_all_panda_result$panda
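To illustrate how degree tables can relate to an edge list, the sketch below sums edge weights per target gene and per TF on a toy network. The column names tf, gene, and force, the made-up values, and the idea that degree is the sum of edge weights are all assumptions for illustration; they are not taken from the pandaPy() documentation.

```r
# Toy edge list with made-up identifiers and weights.
toy_net <- data.frame(
  tf    = c("tfA", "tfA", "tfB", "tfB"),
  gene  = c("gene1", "gene2", "gene1", "gene2"),
  force = c(1.5, -0.5, 2.0, 0.25),
  stringsAsFactors = FALSE
)

# Indegree: total weight arriving at each gene ("to" nodes).
toy_indegree <- tapply(toy_net$force, toy_net$gene, sum)

# Outdegree: total weight leaving each TF ("from" nodes).
toy_outdegree <- tapply(toy_net$force, toy_net$tf, sum)

print(toy_indegree)
print(toy_outdegree)
```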

PANDA Cytoscape Plotting

Cytoscape is an interactive network visualization tool that is highly recommended for exploring the PANDA network. Before using the function vis.panda.in.cytoscape, please install and launch Cytoscape (3.6.1 or greater) and keep it running while the function is in use.

# Select the top 1000 edges in the treated PANDA network by edge weight.
panda.net <- head(treated_net[order(treated_net$force, decreasing = TRUE), ], 1000)

# run this function to create a network in Cytoscape.
vis.panda.in.cytoscape(panda.net, network.name="PANDA")

LIONESS Algorithm

Running LIONESS is mostly identical to running PANDA in this package, except that lionessPy() returns a data frame in which the first two columns represent TFs (regulators) and genes (targets), and the remaining columns represent the samples. Each cell holds the score estimated by LIONESS.

# Run the LIONESS algorithm for the first two samples.
# Remove the start_sample and end_sample arguments to generate the whole LIONESS network with all samples.
control_lioness_result <- lionessPy(
  expr_file = control_expression_file_path,
  motif_file = motif_file_path,
  ppi_file = ppi_file_path,
  modeProcess = "legacy",
  remove_missing = TRUE,
  start_sample = 1,
  end_sample = 2
)
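Because the result has one column per sample, per-edge summaries across samples are straightforward to compute. The sketch below works on a toy data frame that mimics the layout described above (two identifier columns followed by sample columns). The column names and values are made up and are not output from lionessPy().

```r
# Toy LIONESS-style table: TF, gene, then one score column per sample.
toy_lioness <- data.frame(
  tf      = c("tfA", "tfB"),
  gene    = c("gene1", "gene1"),
  sample1 = c(1.0, 0.5),
  sample2 = c(3.0, 0.5),
  stringsAsFactors = FALSE
)

# Drop the two identifier columns to get a numeric edge-by-sample matrix.
scores <- as.matrix(toy_lioness[, -(1:2), drop = FALSE])

# Mean and standard deviation of each edge's score across samples.
edge_summary <- data.frame(
  toy_lioness[, 1:2],
  mean_score = rowMeans(scores),
  sd_score   = apply(scores, 1, sd)
)
print(edge_summary)
```

Edges with a large standard deviation are the ones whose estimated weight differs most between samples.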

CONDOR Algorithm and plotting

A PANDA network can be converted into a condor.object with pandaToCondorObject(panda.net, threshold). The default threshold is the average of [median weight of non-prior edges] and [median weight of prior edges]; all of these weights are transformed with the formula w'=ln(e^w+1) before the medians and the average are calculated. The selected edges nevertheless keep the original weights calculated by PANDA.
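The default threshold described above can be sketched in base R as follows. This is not the package's implementation: the function names, the toy weights, and the logical vector marking prior edges are hypothetical, and comparing edges on the transformed scale with a strict inequality is an assumption.

```r
# Transformation from the text: w' = ln(e^w + 1), written with log1p.
transform_weight <- function(w) log1p(exp(w))

# Average of the two medians, each taken over transformed weights.
default_threshold_sketch <- function(weights, is_prior) {
  mean(c(
    median(transform_weight(weights[!is_prior])),
    median(transform_weight(weights[is_prior]))
  ))
}

# Made-up weights: three non-prior edges followed by three prior edges.
toy_weights  <- c(-1, 0, 1, 1, 2, 3)
toy_is_prior <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

toy_threshold <- default_threshold_sketch(toy_weights, toy_is_prior)

# Select on the transformed scale (an assumption) but keep the original weights.
toy_selected <- toy_weights[transform_weight(toy_weights) > toy_threshold]
print(toy_threshold)
print(toy_selected)
```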

treated_condor_object <- pandaToCondorObject(treated_net, threshold = 0)

The community structure can be plotted with igraph.

library(viridisLite)
treated_condor_object <- condorCluster(treated_condor_object, project = FALSE)
treated_color_num <- max(treated_condor_object$red.memb$com)
treated_color <- viridis(treated_color_num, alpha = 1, begin = 0, end = 1, direction = 1, option = "D")
condorPlotCommunities(treated_condor_object, color_list = treated_color, point.size = 0.04, xlab = "Genes", ylab = "TFs")

ALPACA Algorithm

An ALPACA community structure can also be generated from two PANDA networks with pandaToAlpaca.

alpaca <- pandaToAlpaca(treated_net, control_net, NULL, verbose = FALSE)

More tutorials

Browse with browseVignettes("netZooR")

Information

sessionInfo()

Note

If an error like Error in fetch(key) : lazy-load database.rdb' is corrupt occurs when accessing the help pages of functions in this package after it has been loaded, this is a limitation of base R that has not been solved yet. Restarting the R session and reloading the package will help.



netZoo/netZooR documentation built on Oct. 16, 2024, 10:23 p.m.