ddepn: ddepn

Description

Main function for DDEPN modelling. Takes a data matrix containing longitudinal measurements as argument and infers a network structure underlying the data using either a genetic algorithm or MCMC sampling.

Usage

ddepn(dat, phiorig=NULL, phi=NULL, th=0.8, inference="netga",
      outfile=NULL, multicores=FALSE, maxiterations=1000,
      p=500, q=0.3, m=0.8, P=NULL,
      usebics=TRUE, cores=1, priortype="laplaceinhib",
      lambda=NULL, B=NULL, samplelambda=NULL,
      hmmiterations=100, fanin=4,
      gam=NULL,it=NULL,K=NULL,quantL=.5,quantBIC=.5,
      debug=0, burnin=500, thin=FALSE, plotresults=TRUE,
      always_sample_sf=FALSE, scale_lik=FALSE, allow.stim.off=FALSE, 
      implementation="C")
      
resume_ddepn(ret,maxiterations=10000,outfile=NULL,th=0.8,plotresults=TRUE,
		debug=0,cores=NULL, implementation="C", thin=FALSE)

Arguments

dat

Matrix of double values. The data matrix to be used. Contains antibody measurements in the rows and experiments (T time points, each measured in R replicates) in the columns. Each experiment is labeled by the respective perturbation in the column name. See section Details for an example.

phiorig

Adjacency matrix. Reference network used for comparison to the inferred net. Entries can be either 0, 1 or 2, for no edge, activation or inhibition, respectively. NULL if no reference network is given.

phi

Adjacency matrix. Seed network to start the search. Entries can be either 0, 1 or 2, for no edge, activation or inhibition, respectively. NULL if no start network is given; the start network is then initialised automatically.
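The 0/1/2 encoding used by phiorig and phi can be illustrated with a small base R sketch. The node names and the orientation (rows as source nodes, columns as target nodes) are assumptions made for this illustration, not taken from the package.

```r
## Hypothetical three-node network; node names are illustrative only.
nodes <- c("EGF", "AKT", "ERK")
phi <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))

## Assumption: rows are source nodes, columns are target nodes.
phi["EGF", "AKT"] <- 1   # activation
phi["AKT", "ERK"] <- 2   # inhibition

## Only the three documented codes may occur.
stopifnot(all(phi %in% c(0, 1, 2)))

## Count the edges of each type.
edge_counts <- c(activation = sum(phi == 1), inhibition = sum(phi == 2))
print(phi)
print(edge_counts)
```

A matrix built this way has the shape that the Examples section passes as phiorig.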

th

Threshold for inclusion of an edge in the final network (for netga). If an edge occurs more than th*p times in all individuals, it is included in the resulting network.
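As a rough illustration of the inclusion rule, the following base R sketch applies the th*p cut-off to a made-up matrix of edge counts. The matrix edge_count and its values are hypothetical; how the package handles edge types at this step is not shown here.

```r
p  <- 10    # population size (argument p)
th <- 0.8   # inclusion threshold (argument th)

## Hypothetical counts of how often each edge occurs among the p individuals.
nodes <- c("A", "B", "C")
edge_count <- matrix(c( 0, 9, 3,
                        0, 0, 7,
                       10, 0, 0),
                     nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

## An edge is kept only if it occurs more than th * p times.
keep <- edge_count > th * p
print(keep)
```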

inference

String. Giving the type of network search.
netga: Uses a genetic algorithm for network inference.
mcmc: Uses MCMC sampling for network inference.

outfile

String. Output path for plotting. NULL if plotting should be done to the display.

multicores

Boolean. If TRUE, the network reconstruction is parallelised over multiple cores (needs R package multicore). For netga, the HMMs for the individuals in the population are distributed across the cores. For mcmc, several independent MCMC runs are started, each on a separate core. If FALSE, the calculation runs on a single core.

maxiterations

Integer. Maximum number of generations in netga or maximum number of iterations in mcmc_ddepn.

p

Integer, number of individuals in the population in netga.

q

Double in [0, 1]. Selection (1-q) and crossover (q) rate in netga.

m

Double in [0, 1]. Mutation rate in netga.

P

List containing an initial population of networks for netga. Set to NULL if start population should be generated automatically.

usebics

Use BIC statistic for model selection (only for netga).

cores

Number of cores to use in case of multicores=TRUE. For netga, the parallel calculations of the HMMs are distributed over that number of cores; for mcmc, that number of independent MCMC runs is started. In resume_ddepn, cores is used for resuming a netga run, while for resuming an mcmc run, the argument is omitted and derived from the mcmc return object.

hmmiterations

Integer. Maximum number of iterations in the HMM search.

lambda

NULL, numeric or NA. The prior influence hyperparameter for the Laplace prior. If numeric, it is used as fixed prior strength or as starting value for prior strength sampling (when samplelambda is numeric, too). If NA, lambda is integrated out in the calculation of the prior. If NULL, no Laplace prior is used.

B

The Prior information matrix. See prior for details.

fanin

Integer: maximal indegree for each node.
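A minimal sketch of what an indegree limit means for an adjacency matrix, in base R. It assumes that a non-zero entry phi[i, j] encodes an edge from node i to node j; this orientation and the example network are assumptions for illustration only.

```r
fanin <- 4
nodes <- paste0("N", 1:6)
phi <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
phi[1:5, "N6"] <- c(1, 1, 2, 1, 2)   # five incoming edges for N6
phi["N1", "N2"] <- 1

## Assumption: phi[i, j] != 0 is an edge i -> j, so the indegree of node j
## is the number of non-zero entries in column j.
indegree  <- colSums(phi != 0)
violating <- names(indegree)[indegree > fanin]
print(indegree)
print(violating)
```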

gam

Prior influence strength for scalefree prior. Also used as exponent in laplaceinhib prior: see prior for details.

it

Number of iterations to generate the background distribution for scalefree prior.

K

Proportionality factor for scalefree prior.

quantL

Quantile of Population Likelihood/Posterior, used as selection threshold in netga. Note that the Likelihood or Posterior have to be maximised, so all networks with a likelihood/posterior greater than this threshold are selected.

quantBIC

Quantile of Population BIC, used as selection threshold in netga. Note that the BIC is minimised, so all networks with BIC less than the threshold are selected.
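The two quantile thresholds can be sketched in base R as follows. The score vectors are simulated and purely hypothetical, and the sketch only shows the thresholding step described above, not the full selection procedure of netga.

```r
set.seed(1)
p <- 20
## Hypothetical scores for a population of p networks.
loglik <- rnorm(p, mean = -500, sd = 20)
bic    <- rnorm(p, mean = 1100, sd = 40)

quantL   <- 0.5
quantBIC <- 0.5

## Likelihood/posterior is maximised: keep networks above the quantile.
sel_lik <- which(loglik > quantile(loglik, probs = quantL))
## BIC is minimised: keep networks below the quantile.
sel_bic <- which(bic < quantile(bic, probs = quantBIC))

print(sel_lik)
print(sel_bic)
```

With the default value of 0.5, each threshold is the population median, so about half of the individuals pass.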

samplelambda

Numeric or NULL. If NULL, the Laplace hyperparameter lambda is kept fixed during the MCMC inference. If numeric, lambda is sampled uniformly around the initial value of lambda, with an interval size defined by samplelambda.

debug

Numeric. If 0, a status bar indicates the progress of the algorithm. If 1 or 2, extra information is printed to the console (for debug=2 more information than for debug=1).

burnin

Integer. Specifies the number of iterations used as burnin phase for mcmc_ddepn.

priortype

Character. One of none, uniform, laplaceinhib, laplace or scalefree for use of the respective prior type. Ignored if usebics=TRUE for netga. For netga, usebics=FALSE, priortype="none" means optimising the likelihood directly; this is equivalent to setting usebics=FALSE, priortype="uniform". For mcmc_ddepn, priortype="none" is not allowed; use priortype="uniform" instead. laplaceinhib uses prior information on edges of two types (activation/inhibition). laplace ignores the edge type, which is useful if only knowledge about the presence of an edge is available, but not about its type. scalefree assumes scale-free network architectures.

thin

Boolean. If TRUE, makes sure that the MCMC return objects are shortened to at most 10000 iterations. Defaults to FALSE.

plotresults

Boolean. If TRUE, the resulting network(s) and in case of MCMC sampling, the score traces are plotted.

always_sample_sf

Boolean. If TRUE, the scaling factor in inhibMCMC sampling is updated throughout the whole sampling. If FALSE, the scaling factor is kept fixed after burn-in.

scale_lik

Boolean. Perform scaling of the likelihood according to how many data points were used to calculate the overall likelihood.

allow.stim.off

Boolean. If TRUE, the stimulus can become passive at some time. This will generate additional reachable system states, in particular all states from the normal state matrix, generated by the propagation, but with the stimulus node set to 0.

ret

List. The output generated during a netga or mcmc_ddepn run. Used in function resume_ddepn to resume the inference.

implementation

String. One of "C","R","R_globalest","C_globalest". Different implementations of the HMM in perform.hmmsearch. If "R", the original pure R-implementation is used, if "C", a ported C-implementation is used. If "R_globalest", an experimental version of the parameter estimation is used in the HMM, "C_globalest" is the C-port of this version. See details for a description.

Details

dat

Data matrix. Rows correspond to measured proteins/genes etc. Columns contain all experiments, i.e. separate perturbations. Each experiment i consists of T_i time points and each time point is assumed to be measured in R_i replicates. The time is indicated as a numeric value, separated by an underscore in the column name. Example:

     EGF_1  EGF_1  EGF_2  EGF_2  EGF&X_1  EGF&X_2  EGF&X_2  EGF&X_2
EGF  0      0      0      0      0        0        0        0
X    0      0      0      0      0        0        0        0
AKT  1.45   1.8    0.99   1.6    1.78     1.8      1.56     1.58
ERK  1.33   1.7    1.57   1.3    0.68     0.34     0.62     0.47
MEK  0.45   0.8    0.99   0.6    0.78     0.8      0.56     0.58

For example, EGF_1 means EGF treatment at time 1, EGF&X_2 means simultaneous treatment with EGF and X at time 2 etc. One could use function addstimuli to automatically add the additional rows for the treatments to the data matrix, if they are not present. Unequal numbers of time points and replicates are allowed for each experiment. See the vignette for more details on the format of the data matrix.
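The example matrix can be built in base R as sketched below, together with a simple way to recover perturbation and time point from the column names. The variable names (cn, perturbation, timepoint, replicates) are chosen for this illustration; the parsing shown here is an assumption about the naming convention, not the package's own code.

```r
cn <- c("EGF_1", "EGF_1", "EGF_2", "EGF_2",
        "EGF&X_1", "EGF&X_2", "EGF&X_2", "EGF&X_2")
dat <- rbind(
  EGF = rep(0, 8),
  X   = rep(0, 8),
  AKT = c(1.45, 1.8, 0.99, 1.6, 1.78, 1.8, 1.56, 1.58),
  ERK = c(1.33, 1.7, 1.57, 1.3, 0.68, 0.34, 0.62, 0.47),
  MEK = c(0.45, 0.8, 0.99, 0.6, 0.78, 0.8, 0.56, 0.58)
)
colnames(dat) <- cn

## Split each column name at the last underscore.
perturbation <- sub("_[^_]*$", "", colnames(dat))
timepoint    <- as.numeric(sub("^.*_", "", colnames(dat)))

## Number of replicates per experiment and time point.
replicates <- table(perturbation, timepoint)
print(replicates)
```

The replicate table makes the unequal design of the example visible: the EGF&X experiment has one replicate at time 1 and three at time 2.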

implementation

Several implementations are provided, differing in the way the Gaussian parameters are estimated in the HMM. The "R" and "C" implementations derive a separate optimal state matrix for each provided experiment. The state matrices are then concatenated to estimate the Gaussians.

An alternative experimental implementation, "R_globalest", derives a single state matrix for all experiments in the HMM. With separate derivation, the Gaussians for the individual experiments can differ considerably, leading to inhomogeneous parameter estimates with large variances. Using only one HMM for all experiments overcomes this problem, since the states are chosen with respect to all experiments. However, deriving the combined state matrix increases the number of possible system states that the Viterbi algorithm has to consider, which slows down the HMM.

The default is "C", which offers a reasonable trade-off between quality and speed.

Value

For netga, a list containing the following elements:

dat

Double matrix. The data matrix.

phi.activation.count

Integer. Counts how often an edge is an activation in the population.

phi.inhibition.count

Integer. Counts how often an edge is an inhibition in the population.

phi.orig

Adjacency matrix. The reference network, if it was provided.

phi

Adjacency matrix. The inferred network.

weights

Matrix. Each entry is the maximum of the corresponding conf.act and conf.inh entries, i.e. it describes the support for an edge in the final network.

weights.tc

Matrix. Similar to weights, but calculated ignoring the types of the edges.

stats

Matrix. Contains result statistics for each network in the population: TP, FP, TN, FN, Sensitivity(SN), Specificity(SP), precision, F1. Only present if a reference network phi.orig was provided in the function call to ddepn.
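The listed statistics can be sketched for a single inferred network in base R. As a simplifying assumption, this illustration compares only the presence of edges and ignores the edge type; the package may count matches differently. The two example networks are hypothetical.

```r
nodes <- c("A", "B", "C")
phi_orig <- matrix(c(0, 1, 0,
                     0, 0, 2,
                     0, 0, 0), nrow = 3, byrow = TRUE,
                   dimnames = list(nodes, nodes))
phi_inf  <- matrix(c(0, 1, 1,
                     0, 0, 0,
                     0, 0, 0), nrow = 3, byrow = TRUE,
                   dimnames = list(nodes, nodes))

## Simplifying assumption: compare edge presence only, ignoring edge type.
truth <- phi_orig != 0
pred  <- phi_inf  != 0

tp <- sum(pred & truth)
fp <- sum(pred & !truth)
tn <- sum(!pred & !truth)
fn <- sum(!pred & truth)

stats <- c(TP = tp, FP = fp, TN = tn, FN = fn,
           SN = tp / (tp + fn),            # sensitivity
           SP = tn / (tn + fp),            # specificity
           precision = tp / (tp + fp),
           F1 = 2 * tp / (2 * tp + fp + fn))
print(stats)
```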

conf.act

Matrix. Calculated as phi.activation.count/p

conf.inh

Matrix. Calculated as phi.inhibition.count/p
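The relation between the count matrices, conf.act, conf.inh and weights can be written out in a few lines of base R. The count values are made up for this sketch.

```r
p <- 10
nodes <- c("A", "B")

## Hypothetical counts over a population of p = 10 networks.
phi.activation.count <- matrix(c(0, 7,
                                 1, 0), nrow = 2, byrow = TRUE,
                               dimnames = list(nodes, nodes))
phi.inhibition.count <- matrix(c(0, 2,
                                 9, 0), nrow = 2, byrow = TRUE,
                               dimnames = list(nodes, nodes))

conf.act <- phi.activation.count / p
conf.inh <- phi.inhibition.count / p

## weights: element-wise maximum of the two confidence matrices.
weights <- pmax(conf.act, conf.inh)
print(weights)
```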

stimuli

List. The list of the input stimuli in format list(c(Stim1=1),c(Stim1=1,Stim2=2)). The first element in this example list is a single stimulus, the second a combinatorial stimulus of Stim1 and Stim2. The numbers are the indices identifying the nodes, i.e. the index in rownames(dat). This is generated automatically from the formatted data matrix (see section details).
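A list of this shape can be derived from row and column names as sketched below in base R. The parsing is an assumption about the column-name convention described in section Details, not the package's own implementation; rn and cn are hypothetical inputs.

```r
rn <- c("EGF", "X", "AKT", "ERK", "MEK")            # rownames(dat)
cn <- c("EGF_1", "EGF_1", "EGF_2", "EGF_2",
        "EGF&X_1", "EGF&X_2", "EGF&X_2", "EGF&X_2") # colnames(dat)

## Drop the time suffix and keep each perturbation once.
perturbation <- unique(sub("_[^_]*$", "", cn))

## Split combinatorial stimuli at "&" and look up the row indices.
stimuli <- lapply(strsplit(perturbation, "&", fixed = TRUE), function(s) {
  idx <- match(s, rn)
  names(idx) <- s
  idx
})
str(stimuli)
```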

P

List. The population of networks that was inferred, i.e. the return list of netga.

scorestats

Matrix. Contains traces of the scores during the genetic algorithm. See netga.



For mcmc, a list containing two elements:

samplings

List. Contains all sampling runs. Each sampling run itself is a list as obtained via mcmc_ddepn.

ltraces

Matrix. Contains the posterior traces, each trace stored in one column of the matrix.

Note

TODO

Author(s)

Christian Bender

References

DDEPN
Bender et al. 2010: Dynamic deterministic effects propagation networks: learning signalling pathways from longitudinal protein array data; Bioinformatics, Vol. 26(18), pp. i596-i602

Laplace prior
Bender, C. 2011: Systematic analysis of time resolved high-throughput data using stochastic network inference methods; PhD Thesis, University of Heidelberg, Combined Faculties for the Natural Sciences and for Mathematics

Froehlich et al. 2007: Large scale statistical inference of signaling pathways from RNAi and microarray data; BMC Bioinformatics, Vol. 8(11), pp. 386ff

Scale free prior
Kamimura and Shimodaira, A Scale-free Prior over Graph Structures for Bayesian Inference of Gene Networks

See Also

TODO

Examples

## Not run: 
## load package
library(ddepn)

## sample a network
n <- 6
signet <- signalnetwork(n=n, nstim=2, cstim=0, prop.inh=0.2)
phit <- signet$phi
stimuli <- signet$stimuli

## sample data
dataset <- makedata(phit, stimuli, mu.bg=1200, sd.bg=400,
                    mu.signal.a=2000, sd.signal.a=1000)

## use original network as prior matrix
## reset all entries for inhibiting edges 
## to -1
B <- phit
B[B==2] <- -1

## Genetic algorithm, no prior
ret1 <- ddepn(dataset$datx, phiorig=phit, inference="netga",
              maxiterations=30, p=15, q=0.3, m=0.8,
              usebics=TRUE)	
x11()
plotdetailed(ret1$phi,stimuli=ret1$stimuli)
              
## mcmc, laplaceinhib prior
ret2 <- ddepn(dataset$datx,phiorig=phit, inference="mcmc",
              maxiterations=300, burnin=100,
              usebics=FALSE, lambda=0.01, B=B, gam=1, 
              priortype="laplaceinhib") 
      
x11()
plotdetailed(ret2$samplings[[1]]$phi,stimuli=ret2$samplings[[1]]$stimuli)

## use mcmc with multiple cores, i.e. perform two independent runs
## requires package parallel (which supersedes multicore) and multiple cores in the hardware
## use the original net as prior
if (require(parallel)) {
  ret3 <- ddepn(dataset$datx, phiorig=phit, inference="mcmc",
                multicores=TRUE, cores=2,
                maxiterations=300, burnin=100,
                usebics=FALSE, lambda=0.01, B=B, gam=1,
                priortype="laplaceinhib")
}

## resume the inference from an inhibMCMC run, adding another 100 iterations
ret4 <- ddepn(dataset$datx,phiorig=phit, inference="mcmc", 
			maxiterations=100, burnin=30, lambda=0.01, B=B, 
			priortype="laplaceinhib", usebics=FALSE)
ret4 <- resume_ddepn(ret4,maxiterations=100)

## resume the inference from a netga run, adding another 30 iterations
ret5 <- ddepn(dataset$datx,phiorig=phit, inference="netga", 
			maxiterations=20, p=10, q=0.3, m=0.8, lambda=0.01, B=B, 
			priortype="laplaceinhib", usebics=FALSE)
ret5 <- resume_ddepn(ret5,maxiterations=30)
 

## End(Not run)

ddepn documentation built on May 2, 2019, 4:42 p.m.