knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Esearch3D with promoter-gene connections

Welcome to the third vignette of our software for predicting the activity of intergenic fragments. We suggest starting with the first vignette, because it introduces the software. This vignette presents a real application of Esearch3D and requires at least 32 GB of RAM to complete. It goes through all the main operations of the software with real data. Specifically, it replicates the results from our publication in which genes are connected to the chromatin fragment that their promoter maps to. This creates the chromatin interaction network (CIN) called the WG-based mESC DNaseI-capture CIN. The only part that this vignette does not cover is the classification of enhancers with machine learning. That topic is left for last because it is aimed at advanced users. We hope this vignette helps you understand the data, the operations and the results of our software.

Environment set up

Let us clean the R environment, set the working directory, load the package, set the random seed so that the results are reproducible, and define the number of cores used to run the operations. Be careful: set the number of cores according to your resources. If you are not sure how many to use, set it to 2.

#Clean workspace and memory ----
rm(list=ls())
gc()

#Set working directory ----
gps0=getwd()
gps0=paste(gps0,"/%s",sep="")
rootDir=gps0
setwd(gsub("%s","",rootDir))

#Load libraries ----
suppressWarnings(suppressMessages(
  library("Esearch3D", quietly = TRUE)
  )
)

#Set variables ----
#Set the seed so that this vignette always gives the same results
set.seed(8)
#Set number of cores to parallelize the tasks
n_cores=5
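If you are unsure how many cores your machine can spare, the value can be derived instead of hard-coded. The following is a minimal sketch using base R's `parallel` package; the names `requested` and `n_cores_safe` are hypothetical and not part of Esearch3D.

```r
# Sketch: cap the requested number of cores at what the machine offers.
# 'requested' and 'n_cores_safe' are illustrative names, not package API.
requested <- 5L

# detectCores() can return NA on some platforms, so fall back to 1
available <- parallel::detectCores()
if (is.na(available)) available <- 1L

# Leave one core free for the system, but never go below 1
n_cores_safe <- max(1L, min(requested, available - 1L))
n_cores_safe
```

The resulting value could then be assigned to `n_cores` in place of the fixed number above.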

Data set up

For this vignette, we created a CIN derived from DNaseI-capture Hi-C in mouse embryonic stem cells (mESC). The captured regions harbour DNaseI-sensitive regions, which enriches for interactions between chromatin-accessible regions. A chromatin fragment representing a genomic locus is represented as a node, and a fragment-fragment interaction as an edge. We then integrated genes as nodes within the CIN. In this case, each gene is connected by an edge to the node that its promoter maps to. The result is the WG-based mESC DNaseI-capture CIN.

#Load and set up the example data ----
data("wg_data_l")
#gene - fragment interaction network generated from DNase_Prop1_mESC_TSS interactions data
gf_net=wg_data_l$gf_net
#gene-fragment-fragment interaction network generated from mESC_DNase_Net interactions data
ff_net=wg_data_l$ff_net
#sample profile with starting values for genes and fragments generated from mESC_bin_matrix_Prop1
input_m=wg_data_l$input_m
#length of chromosomes
chr_len=wg_data_l$chr_len
#gene annotation
ann_net_b=wg_data_l$ann_net_b
#genes of interest
gene_in=wg_data_l$gene_in
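To make the structure described above concrete, here is a toy version built with base R only. It is a sketch under stated assumptions: the node names, the edge-list layout and the one-column profile matrix are hypothetical and do not necessarily match the format of `gf_net`, `ff_net` or `input_m` shipped with the package.

```r
# Toy CIN: fragments are nodes, fragment-fragment interactions are edges
ff_edges <- data.frame(from = c("frag1", "frag2"),
                       to   = c("frag2", "frag3"))

# Genes are added as nodes and linked to the fragment their promoter maps to
gf_edges <- data.frame(from = c("GeneA", "GeneB"),
                       to   = c("frag1", "frag3"))

# All nodes of the combined network
nodes <- unique(c(ff_edges$from, ff_edges$to, gf_edges$from, gf_edges$to))

# Starting profile: genes carry an expression value, fragments start at zero
input_toy <- matrix(0, nrow = length(nodes), ncol = 1,
                    dimnames = list(nodes, "sample1"))
input_toy[c("GeneA", "GeneB"), "sample1"] <- c(5.2, 0.7)
input_toy
```

In this toy layout only the gene rows hold information at the start, and the fragment rows are the ones to be filled in by propagation.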

Multi-gene two-step propagation

In the first propagation, the expression of the genes is propagated from their corresponding nodes into the genic fragments only. In the second propagation, the gene expression is then propagated to the rest of the CIN. The genic and intergenic fragments receive an imputed activity score (IAS) reflecting the likelihood of enhancer activity. Be careful: r is the isolation parameter. Use a low value for the first step and a high value for the second step.

#Two step propagation -----
#First step: propagate over the gene-fragment network (low r)
gf_prop=rwr_OVprop(g=gf_net,input_m = input_m, no_cores=n_cores, r=0.1)
#Second step: propagate over the fragment-fragment network (high r),
#starting from the output of the first step
ff_prop=rwr_OVprop(g=ff_net,input_m = gf_prop, no_cores=n_cores, r=0.8)
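The effect of r can be illustrated with a toy random walk with restart written in base R. This is a sketch only, not the implementation behind `rwr_OVprop`. It assumes that r behaves like a restart probability, that is, the share of signal a node keeps at each step. The function `toy_rwr` and the four-node graph are hypothetical.

```r
# Toy random walk with restart (RWR) on a small undirected graph.
# Assumption: r is the fraction of signal kept at the starting nodes.
toy_rwr <- function(adj, p0, r, tol = 1e-10, max_iter = 1000) {
  # Column-normalise so that each node spreads its signal evenly
  col_sums <- colSums(adj)
  col_sums[col_sums == 0] <- 1
  W <- sweep(adj, 2, col_sums, "/")
  p <- p0
  for (i in seq_len(max_iter)) {
    p_new <- (1 - r) * as.vector(W %*% p) + r * p0
    if (sum(abs(p_new - p)) < tol) break
    p <- p_new
  }
  names(p_new) <- names(p0)
  p_new
}

# Path graph: GeneA - frag1 - frag2 - frag3
nodes <- c("GeneA", "frag1", "frag2", "frag3")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj["GeneA", "frag1"] <- adj["frag1", "GeneA"] <- 1
adj["frag1", "frag2"] <- adj["frag2", "frag1"] <- 1
adj["frag2", "frag3"] <- adj["frag3", "frag2"] <- 1

# All the starting signal sits on the gene
p0 <- c(GeneA = 1, frag1 = 0, frag2 = 0, frag3 = 0)

low_r  <- toy_rwr(adj, p0, r = 0.1)  # signal spreads widely
high_r <- toy_rwr(adj, p0, r = 0.8)  # signal stays close to the source
round(rbind(low_r, high_r), 3)
```

Under this assumption, a low r in the first step lets the genes pass most of their signal to the fragments, while a high r in the second step keeps the signal near the fragments that received it.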

Single gene network-based propagation

The software performs the propagation of individual genes of interest belonging to a cell's expression profile. It returns how much each gene of interest contributed information to the fragments, and how much information each fragment received from the genes of interest. This function helps to understand the contribution of the individual genes in the standard two-step propagation. It requires the names of the genes of interest. It requires the string pattern that the names of the fragments are built from, for example "F" if the fragments are named F1, F2, F3 and so on. It also requires the distance between the genes of interest and the fragments to investigate.

#Single gene propagation -----
output_path="sgPropagation_results.rda"
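contrXgene_l=rwr_SGprop(
  gf_net, ff_net,
  gene_in = gene_in$V2[3:4], #names of the genes of interest
  frag_pattern="frag",       #string pattern in the fragment names
  out_rda=output_path,
  degree = 4,                #distance between genes and fragments to investigate
  r1 = 0.1, r2 = 0.8,
  no_cores = n_cores
)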
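The roles of the fragment pattern and the distance can be sketched on a toy network with base R. This is an illustration under assumptions, not the logic used inside `rwr_SGprop`: the helper `hop_distance`, the node names and the reading of the distance as a number of edges are all hypothetical.

```r
# Toy path: GeneA - frag1 - frag2 - frag3 - frag4
nodes <- c("GeneA", "frag1", "frag2", "frag3", "frag4")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
for (i in 1:4) {
  adj[nodes[i], nodes[i + 1]] <- 1
  adj[nodes[i + 1], nodes[i]] <- 1
}

# The pattern separates fragment nodes from gene nodes by name
is_frag <- grepl("frag", nodes)

# Breadth-first search: number of edges between 'source' and every node
hop_distance <- function(adj, source) {
  dist <- rep(NA_integer_, nrow(adj))
  names(dist) <- rownames(adj)
  dist[source] <- 0L
  frontier <- source
  d <- 0L
  while (length(frontier) > 0) {
    d <- d + 1L
    # Nodes adjacent to the current frontier that were not visited yet
    nb <- rownames(adj)[rowSums(adj[, frontier, drop = FALSE]) > 0]
    nb <- nb[is.na(dist[nb])]
    dist[nb] <- d
    frontier <- nb
  }
  dist
}

dist_from_gene <- hop_distance(adj, "GeneA")

# Fragments within two edges of the gene of interest
near_frags <- nodes[is_frag & dist_from_gene <= 2]
near_frags
```

With a larger distance more fragments are included for each gene of interest, which is why the choice affects both the running time and the size of the result.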


LucaGiudice/Esearch3D documentation built on Dec. 17, 2021, 1:12 a.m.