InferNetworks-methods: Network construction using Hierarchical Dirichlet Process


Description

This method applies a Hierarchical Dirichlet Process (HDP) algorithm to the collection of protein networks to infer the set of chromatin loop-maintainer proteins. HDPs are non-parametric Bayesian models widely used in document classification because they can model datasets as mixtures of classes. In our case, we assume that different kinds of networks are involved in maintaining the different loops. To make an analogy with topic modeling, we treat each DNA-interaction maintaining protein network as a document and each edge in this network as a word. The task is then to determine which word (edge) belongs to which topic (chromatin-maintainer family). The implementation is based on the C++ code of Chong Wang and David Blei, adapted to Rcpp and with the dependency on the GNU Scientific Library removed.
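The document/word analogy can be made concrete with a small base-R sketch. It is only an illustration under stated assumptions: the toy `networks` list, the protein names used as data, and the helper `edge_labels` are hypothetical and are not part of the R3CPET API, and the package's internal representation may differ.

```r
# Hypothetical toy data: each element is one DNA-interaction network,
# given as a two-column character matrix of protein-protein edges.
networks <- list(
  loop1 = rbind(c("CTCF", "RAD21"), c("RAD21", "SMC3")),
  loop2 = rbind(c("CTCF", "RAD21"), c("EP300", "JUN")),
  loop3 = rbind(c("EP300", "JUN"), c("JUN", "FOS"))
)

# Give each undirected edge a canonical label ("A_B" with A sorted before B)
# so that the same edge maps to the same "word" in every network.
edge_labels <- function(edges) {
  apply(edges, 1, function(e) paste(sort(e), collapse = "_"))
}

docs  <- lapply(networks, edge_labels)   # one "document" per network
vocab <- sort(unique(unlist(docs)))      # the set of distinct "words"

# Document-term matrix: rows are networks (documents), columns are edges (words).
dtm <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))
print(dtm)
```

A topic model fitted on such a matrix assigns each edge occurrence to a topic, which is the role the chromatin-maintainer families play here.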

Usage

## S4 method for signature 'NetworkCollection'
InferNetworks(object, thr = 0.5, max_iter = 500L, max_time = 3600L, ...)

Arguments

object

a NetworkCollection object in which the list of protein interactions associated with each DNA interaction is already populated.

thr

Used to select the top protein interactions in each inferred chromatin-maintainer family (default 0.5). In HDP, each topic (chromatin-maintainer family) is a distribution over words (edges), so for each topic the top words are those that together capture a fraction thr of the topic's probability mass. For example, with thr = 0.5, the edges are first ranked in decreasing order by their probability of belonging to topic1, and the top-ranked edges that together capture 50% of that probability are kept.
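The selection rule described for thr can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the probability vector `topic1` and the helper `top_edges` are hypothetical, and it assumes that edges are kept until the cumulative probability first reaches the threshold.

```r
# Hypothetical edge probabilities for a single topic (they sum to 1).
topic1 <- c(A_B = 0.30, B_C = 0.25, C_D = 0.20, D_E = 0.15, E_F = 0.10)

# Rank edges by decreasing probability and keep the highest-ranked ones
# until their cumulative probability first reaches thr.
top_edges <- function(probs, thr = 0.5) {
  ranked <- sort(probs, decreasing = TRUE)
  n_keep <- which(cumsum(ranked) >= thr)[1]
  # If thr is never reached (e.g. thr > sum(probs)), keep every edge.
  if (is.na(n_keep)) n_keep <- length(ranked)
  ranked[seq_len(n_keep)]
}

top_edges(topic1, thr = 0.5)
top_edges(topic1, thr = 0.8)
```

Raising thr keeps more edges per family, so it trades specificity for coverage.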

max_iter

maximum number of iterations (default 500).

max_time

maximum running time in seconds (default 3600).

...

You can pass additional parameters to control the behaviour of the HDP model. The possible parameters are eta, alpha and gamma. eta controls how edges are assigned to CMNs (chromatin maintainer networks) at the global level: smaller eta values lead to a sparse edge-to-CMN assignment, while eta > 1 leads to more uniform assignments. gamma controls the number of CMNs: smaller gamma values produce a small number of CMNs, while gamma > 1 favors the generation of more. alpha controls the sparsity at the level of the local PPI: smaller alpha values force edges to be controlled by a small number of CMNs, while larger values lead to a more uniform distribution. By default eta = 0.01, gamma = 1 and alpha = 1.
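The effect of a small versus a large concentration value, as described for eta, can be illustrated with draws from a symmetric Dirichlet distribution. This is a generic base-R sketch and not R3CPET code: the helper `rdirichlet_sym`, the number of items and the concentration values are assumptions chosen for illustration only.

```r
set.seed(1)

# Draw one sample from a symmetric Dirichlet(conc, ..., conc) over k items
# by normalising independent Gamma(conc, 1) draws.
rdirichlet_sym <- function(k, conc) {
  g <- rgamma(k, shape = conc, rate = 1)
  g / sum(g)
}

sparse  <- rdirichlet_sym(20, conc = 0.01)  # small value: mass on very few items
uniform <- rdirichlet_sym(20, conc = 10)    # value > 1: mass spread more evenly

round(sparse, 3)
round(uniform, 3)
```

With the small concentration value almost all of the probability lands on one or two items, which is the sparse behaviour described above; with the large value the items receive comparable weights.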

Value

Returns a ChromMaintainers object that contains the list of inferred networks and the probability of each edge in each network.

Author(s)

Mohamed Nadhir Djekidel (nde12@mails.tsinghua.edu.cn)

References

https://www.cs.princeton.edu/~blei/topicmodeling.html (C. Wang's hdp code)

Chong Wang, John Paisley and David M. Blei. Online variational inference for the hierarchical Dirichlet process. In AISTATS 2011.

Mohamed Nadhir D, Yang C et al. 3CPET: Finding Co-factor Complexes in Chia-PET experiment using a Hierarchical Dirichlet Process, ....

See Also

NetworkCollection, ChromMaintainers

Examples

    library(R3CPET)

    ## get the paths of the example datasets shipped with the package
    petFile <- file.path(system.file("example", package = "R3CPET"), "HepG2_interactions.txt")
    tfbsFile <- file.path(system.file("example", package = "R3CPET"), "HepG2_TF.txt.gz")
## Not run: 
    ## load the ChIA-PET interactions, the TF binding sites and the HPRD PPI
    x <- ChiapetExperimentData(pet = petFile, tfbs = tfbsFile, IsBed = FALSE, ppiType = "HPRD", filter = TRUE)
    ## build the different indexes
    x <- createIndexes(x)
    ## build the networks connecting each pair of interacting regions
    nets <- buildNetworks(x)

    ## infer the networks with the default settings
    hlda <- InferNetworks(nets)
    hlda

## End(Not run)

R3CPET documentation built on Nov. 8, 2020, 8:05 p.m.