knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Esearch3D introduction with dummy example

Welcome to the first vignette of our software for the prediction of the activity of integenic fragments. This vignette introduces to the workflow of Esearch3D. It goes through all the main operations of the software with dummy data. The only part that this vignette does not present is the classification of enhancers with machine learning. The latter topic is kept as last in order that only advance users can access to it. We hope that providing this vignette can help to understand the data, the operations and the results of our software.

Enviroment set up

Let us clean the R enviroment, set the working directory, load the software package, set the random seed generator for reproducing always the same results and define the number of cores available in the computer to run the operations. Be careful: set up the number of cores based on your resources, if you are not secure how to, then just set equal to 2

#Clean workspace and memory ----
rm(list=ls())
gc()

#Set working directory ----
gps0=getwd()
gps0=paste(gps0,"/%s",sep="")
rootDir=gps0
setwd(gsub("%s","",rootDir))

#Load libraries ----
suppressWarnings(suppressMessages(
  library("Esearch3D", quietly = T)
  )
)

#Set variables ----
#Set seed to get always the same results out of this vignette
set.seed(8)
#Set number of cores to parallelize the tasks
n_cores=5

Data set up

Esearch3D requires a chromatin interaction network (CIN) composed by genes, genic fragments and non-genic fragments. The network must be divded into two components: one with only gene-genic fragments interactions and one with only interactions between fragments. The two components can be provided as: igraph objects, edge lists or adjacency matrices.

#Load and set up the example data ----
data("dummy_data_l")
#gene - fragment interaction network
gf_net=dummy_data_l$gf_net;head(gf_net);
#gene-fragment-fragment interaction network
ff_net=dummy_data_l$ff_net;head(ff_net);

Esearch3D requires a cell's expression profile. Specifically, a numeric matrix of the cell's profile, one column per cell, rows are genes, the value of a gene should be its expression value in a column cell.

#cell profile with starting values of genes
input_m=dummy_data_l$input_m;head(input_m)

Extra data set up

Esearch3D can take the length of the chromosomes of the organism in study and the annotation of the genes in the cell's profile. These data can be provided but are not necessar to get the Imputed Activity Scores They are only used to visualize metainformation with the propagation results inside the graphical user interface (GUI).

#length of chromosomes
chr_len=dummy_data_l$chr_len;head(chr_len);
#gene annotation
ann_net_b=dummy_data_l$ann_net_b;head(ann_net_b);

Multi-gene two-step propagation

The expression of the genes is propagated from their corresponding nodes into only the genic fragments. This facilitates the unbiased assignment of gene expression to nodes containing multiple promoters (i.e., genes that map to multiple nodes). Be carefull: r is the isolation parameter, use low value for first step, use high value for second step

#Two step propagation -----
#Propagation over the gene-fragment network
gf_prop=rwr_OVprop(g=gf_net,input_m = input_m, no_cores=n_cores, r=0.1);head(gf_prop)

The gene expression is then propagated to the rest of the CIN. The genic and intergenic fragments receive an imputed activity score (IAS) reflecting the likelihood of enhancer activity.

#Propagation over the gene-fragment-fragment network
ff_prop=rwr_OVprop(g=ff_net,input_m = gf_prop, no_cores=n_cores, r=0.8);head(ff_prop)

Graphical user interface

The software provides a GUI to visualize the IAS of a node with respect its neighbourhood in the CIN First, it creates an igrah object with IAS and metainformatio Second, it starts the GUI

#Create igraph object with all the information included
net=create_net2plot(gf_net,input_m,gf_prop,ann_net_b,frag_pattern="F",ff_net,ff_prop)

#Start GUI
start_GUI(net, ann_net_b, chr_len, example=T)

Single gene network-based propagation

The software performs the propagation of individual genes of interest belonging to a cell's expression profile. It then returns how much each gene of interest contributed to give information to the fragments. It then returns how much each fragment received information from the genes of interest. This function helps to understand the contribution of the individual genes in the two-step standard propagation It requries the name of the genes of interest. It requires the string pattern composing the names of the fragments. For example, F due to F1, F2, F3 and so on. It requires the distance between the genes of interest and the fragments to investigate. We recommend a distance equal to 3 or 4 based on the computational resources available, 3 requires less time, while 4 provides more accurate results. In other words, it limits the CIN to the subnetwork composed by the gene of interest and the fragments that are distant to it with at maximum 4 interactions: gene - f1 - f2 - f3 - f4. In this way, it does not consider too much far relationships.

#Single gene propagation -----
gene_in=c("G1")
frag_pattern = "F"
degree = 3
contrXgene_l=rwr_SGprop(gf_net, ff_net, gene_in, input_m, frag_pattern,
                        degree = degree, r1 = 0.1, r2 = 0.8, no_cores = n_cores)

Gene contribution

From the results of the single gene propagation, it is possible to extract the contribution of a specific gene of interest in the intergenic fragments and watch it in the GUI

#Create igraph object with all the information included
sff_prop=as.matrix(contrXgene_l$G1$contr_lxDest$ff_prop[,gene_in])
colnames(sff_prop)=gene_in

Gene contribution visualization

Instead of the overall propagation profile, it is possible to generate the igraph combining the chromatin interaction network and the propagation profile of a single gene of interest. In this way, it is possible to visualzione the profile and the contribution of the gene in the overall network.

#Create igraph object with all the information included
net=create_net2plot(gf_net,input_m,gf_prop,ann_net_b,frag_pattern="F",ff_net,ff_prop)

#Start GUI
start_GUI(net, ann_net_b, chr_len, example=T)


LucaGiudice/Esearch3D documentation built on Dec. 17, 2021, 1:12 a.m.