ddepn: ddepn
In ddepn: Dynamic Deterministic Effects Propagation Networks: Infer signalling networks for timecourse RPPA data.


Main function for DDEPN modelling. Takes a data matrix containing longitudinal measurements as argument and infers a network structure underlying the data using either a genetic algorithm or MCMC sampling.

ddepn(dat, phiorig=NULL, phi=NULL, th=0.8, inference="netga",
      outfile=NULL, multicores=FALSE, maxiterations=1000,
      p=500, q=0.3, m=0.8, P=NULL,
      usebics=TRUE, cores=1, priortype="laplaceinhib",
      lambda=NULL, B=NULL, samplelambda=NULL,
      hmmiterations=100, fanin=4,
      gam=NULL,it=NULL,K=NULL,quantL=.5,quantBIC=.5,
      debug=0, burnin=500, thin=FALSE, plotresults=TRUE,
      always_sample_sf=FALSE, scale_lik=FALSE, allow.stim.off=FALSE, 
      implementation="C")
      
resume_ddepn(ret,maxiterations=10000,outfile=NULL,th=0.8,plotresults=TRUE,
		debug=0,cores=NULL, implementation="C", thin=FALSE)

`dat`	Matrix of double values. The data matrix to be used. Contains antibody measurements in the rows and experiments (T timepoints in each R replicates) in the columns. Each experiment is labeled by the respective perturbation in the column name. See section Details for an example.
`phiorig`	Adjacency matrix. Reference network used for comparison to the inferred net. Entries can be either 0, 1 or 2, for no edge, activation or inhibition, respectively. NULL if no reference network is given.
`phi`	Adjacency matrix. Seed network to start the search. Entries can be either 0, 1 or 2, for no edge, activation or inhibition, respectively. NULL if no start network should be given, but initialised automatically.
`th`	Double. Threshold for inclusion of an edge in the final network (for `netga`). If an edge occurs more than `th*p` times across all individuals of the population, it is included in the resulting network.
`inference`	String. Giving the type of network search. `netga` Uses a genetic algorithm for network inference. `mcmc` MCMC sampling for network inference.
`outfile`	String. Output path for plotting. NULL if plotting should be done to the display.
`multicores`	Boolean. TRUE to use multiple cores and parallelise the network reconstruction (needs R-package multicore). In case of `netga`, the HMMs for each individual in the population are distributed over multiple cores. In case of `mcmc`, several independent MCMC runs are started, each on a separate core. FALSE for standard calculation on only one core.
`maxiterations`	Integer, Maximum number of generations in `netga` or maximum number of iterations in `mcmc_ddepn`.
`p`	Integer, number of individuals in the population in `netga`.
`q`	Double in [0, 1], selection (1-q) and crossover (q) rate in `netga`.
`m`	Double in [0, 1], mutation rate in `netga`.
`P`	List containing an initial population of networks for `netga`. Set to NULL if start population should be generated automatically.
`usebics`	Use BIC statistic for model selection (only for `netga`).
`cores`	Integer. Number of cores to use if `multicores=TRUE`. For `netga`, the parallel calculations of the HMMs are distributed over `cores` cores; for `mcmc`, `cores` independent MCMC runs are started. In `resume_ddepn`, `cores` is used when resuming a `netga` run. When resuming an `mcmc` run, the argument is ignored and the value is derived from the mcmc return object.
`hmmiterations`	Integer. Maximum number of iterations in the HMM search.
`lambda`	NULL, Numeric or NA. The Prior influence hyperparameter for the laplace prior. If numeric, used as fixed prior strength or starting value for prior strength sampling (when `samplelambda` is numeric, too). If NA, lambda is integrated out in the calculation of the prior. If NULL, no laplace prior is used.
`B`	The Prior information matrix. See `prior` for details.
`fanin`	Integer: maximal indegree for each node.
`gam`	Prior influence strength for scalefree prior. Also used as exponent in `laplaceinhib` prior: see `prior` for details.
`it`	Number of iterations to generate the background distribution for scalefree prior.
`K`	Proportionality factor for scalefree prior.
`quantL`	Quantile of Population Likelihood/Posterior, used as selection threshold in `netga`. Note that the Likelihood or Posterior have to be maximised, so all networks with a likelihood/posterior greater than this threshold are selected.
`quantBIC`	Quantile of Population BIC, used as selection threshold in `netga`. Note that the BIC is minimised, so all networks with BIC less than the threshold are selected.
`samplelambda`	Numeric or NULL. If NULL, the Laplace hyperparameter `lambda` is kept fixed during the MCMC inference. If numeric, `lambda` is sampled uniformly around the initial value of `lambda`, with an interval size defined by `samplelambda`.
`debug`	Numeric. If 0, a status bar indicates the progress of the algorithm. If 1 or 2, extra information is printed to the console (for `debug=2` more information than for `debug=1`).
`burnin`	Integer. Specifies the number of iterations used as burnin phase for `mcmc_ddepn`.
`priortype`	Character. One of `none`, `uniform`, `laplaceinhib`, `laplace` or `scalefree` for use of the respective prior type. Ignored if `usebics=TRUE` for `netga`. For `netga`, `usebics=FALSE, priortype="none"` means optimising the likelihood directly. This is equivalent to setting `usebics=FALSE, priortype="uniform"`. For `mcmc_ddepn`, `priortype="none"` is not allowed. Use `priortype="uniform"` instead. `laplaceinhib` uses prior information for edges with two types (activation/inhibition), `laplace` ignores the edge type. Useful if only knowledge about the presence of an edge is available, but not about its type. `scalefree` assumes scale-free network architectures.
`thin`	Boolean. If TRUE, makes sure that the MCMC return objects are shortened to at most 10000 iterations. Defaults to FALSE.
`plotresults`	Boolean. If TRUE, the resulting network(s) and in case of MCMC sampling, the score traces are plotted.
`always_sample_sf`	Boolean. Update scaling factor in inhibMCMC sampling through the whole sampling if TRUE. Keep scaling factor fixed after burn-in if FALSE.
`scale_lik`	Boolean. Perform scaling of the likelihood according to how many data points were used to calculate the overall likelihood.
`allow.stim.off`	Boolean. If TRUE, the stimulus can become passive at some time. This will generate additional reachable system states, in particular all states from the normal state matrix, generated by the propagation, but with the stimulus node set to 0.
`ret`	List. The output generated during a `netga` or `mcmc_ddepn` run. Used in function `resume_ddepn` to resume the inference.
`implementation`	String. One of `"C","R","R_globalest","C_globalest"`. Selects the implementation of the HMM in `perform.hmmsearch`. If `"R"`, the original pure R implementation is used; if `"C"`, a ported C implementation is used. If `"R_globalest"`, an experimental version of the parameter estimation is used in the HMM; `"C_globalest"` is the C port of this version. See section Details for a description.
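The selection thresholds `quantL` and `quantBIC` can be illustrated with a small base R sketch. The score vectors below are hypothetical values invented for illustration, and the selection shown is only a reading of the argument descriptions above, not the package's internal code.

```r
## Hypothetical scores for a population of six networks (illustration only)
loglik <- c(-120, -95, -110, -90, -130, -100)
bic    <- c( 260, 215,  240, 205,  280,  225)

quantL   <- 0.5
quantBIC <- 0.5

## Likelihood/posterior is maximised: keep networks above the quantile
sel.lik <- which(loglik > quantile(loglik, quantL, names = FALSE))

## BIC is minimised: keep networks below the quantile
sel.bic <- which(bic < quantile(bic, quantBIC, names = FALSE))
```

With the default of 0.5 for both quantiles, roughly the better half of the population passes the threshold in either case.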

dat

Data matrix. Rows correspond to measured proteins/genes etc. Columns contain all experiments, i.e. separate perturbations. Each experiment i consists of T_i time points and each time point is assumed to be measured in R_i replicates. The time is indicated as a numeric value, separated by an underscore in the column name. Example:

	EGF_1	EGF_1	EGF_2	EGF_2	EGF&X_1	EGF&X_2	EGF&X_2	EGF&X_2
EGF	0	0	0	0	0	0	0	0
X	0	0	0	0	0	0	0	0
AKT	1.45	1.8	0.99	1.6	1.78	1.8	1.56	1.58
ERK	1.33	1.7	1.57	1.3	0.68	0.34	0.62	0.47
MEK	0.45	0.8	0.99	0.6	0.78	0.8	0.56	0.58

For example, EGF_1 means EGF treatment at time 1, EGF&X_2 means simultaneous treatment with EGF and X at time 2 etc. One could use function addstimuli to automatically add the additional rows for the treatments to the data matrix, if they are not present. Unequal numbers of time points and replicates are allowed for each experiment. See the vignette for more details on the format of the data matrix.
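The column naming convention can be checked with base R alone. The following sketch rebuilds the example matrix shown above and splits the column names into perturbation and time point. The helper variables (`perturbation`, `timepoint`, `reps`, `stimuli`) are names chosen for this illustration and are not part of the package; the stimuli list only mimics the format described for the return value.

```r
## Rebuild the example matrix (values copied from the table above)
dat <- rbind(
  EGF = rep(0, 8),
  X   = rep(0, 8),
  AKT = c(1.45, 1.8, 0.99, 1.6, 1.78, 1.8, 1.56, 1.58),
  ERK = c(1.33, 1.7, 1.57, 1.3, 0.68, 0.34, 0.62, 0.47),
  MEK = c(0.45, 0.8, 0.99, 0.6, 0.78, 0.8, 0.56, 0.58)
)
colnames(dat) <- c("EGF_1", "EGF_1", "EGF_2", "EGF_2",
                   "EGF&X_1", "EGF&X_2", "EGF&X_2", "EGF&X_2")

## Split each column name at the last underscore
perturbation <- sub("_[^_]*$", "", colnames(dat))
timepoint    <- as.numeric(sub("^.*_", "", colnames(dat)))

## Number of replicates per experiment and time point
reps <- table(perturbation, timepoint)

## Stimuli as named index vectors; indices refer to rownames(dat)
stimuli <- lapply(unique(perturbation), function(s) {
  nodes <- strsplit(s, "&", fixed = TRUE)[[1]]
  setNames(match(nodes, rownames(dat)), nodes)
})
```

Here `reps` shows the unequal design of the example: two replicates for each EGF time point, but one and three replicates for the two EGF&X time points.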

implementation

Several implementations are provided, differing in the way the Gaussian parameters are estimated in the HMM. The "R" and "C" implementations derive a separate optimal state matrix for each provided experiment. The state matrices are then concatenated to estimate the Gaussians. An alternative experimental implementation, "R_globalest", derives a single state matrix for all experiments in the HMM. With separate derivation, the Gaussians for the individual experiments can differ considerably, leading to inhomogeneous parameter estimates with large variances. Using only one HMM for all experiments overcomes this problem, since the states are chosen with respect to all experiments. However, deriving the combined state matrix increases the number of possible system states to be considered in the Viterbi algorithm, which slows down the HMM. The default is "C", which offers a reasonable trade-off between quality and speed.

For netga, a list containing the following elements:

`dat`	Double matrix. The data matrix.
`phi.activation.count`	Integer. Counts how often an edge is an activation in the population.
`phi.inhibition.count`	Integer. Counts how often an edge is an inhibition in the population.
`phi.orig`	Adjacency matrix. The reference network, if it was provided.
`phi`	Adjacency matrix. The inferred network.
`weights`	Matrix. Each entry is the maximum of the corresponding `conf.act` and `conf.inh` entries, i.e. it describes the support for an edge in the final network.
`weights.tc`	Matrix. Similar to weights, but calculated ignoring the types of the edges.
`stats`	Matrix. Contains result statistics for each network in the population: TP, FP, TN, FN, Sensitivity(SN), Specificity(SP), precision, F1. Only present if a reference network `phi.orig` was provided in the function call to `ddepn`.
`conf.act`	Matrix. Calculated as phi.activation.count/p
`conf.inh`	Matrix. Calculated as phi.inhibition.count/p
`stimuli`	List. The list of the input stimuli in the format `list(c(Stim1=1),c(Stim1=1,Stim2=2))`. The first element in this example list is a single stimulus, the second a combinatorial stimulus of Stim1 and Stim2. The numbers are the indices identifying the nodes, i.e. the index in `rownames(dat)`. This is generated automatically from the formatted data matrix (see section Details).
`P`	List. The population of networks that was inferred, i.e. the return list of `netga`.
`scorestats`	Matrix. Contains traces of the scores during the genetic algorithm. See `netga`.
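The relation between the count matrices, the confidence matrices and `weights` can be sketched in base R. The counts below are hypothetical, and the final thresholding step is an assumption based on the description of `th`; the package may combine the counts differently.

```r
## Hypothetical edge counts for three nodes in a population of p = 10 networks
p     <- 10
th    <- 0.8
nodes <- c("A", "B", "C")
phi.activation.count <- matrix(c(0, 9, 2,
                                 0, 0, 1,
                                 0, 0, 0), nrow = 3, byrow = TRUE,
                               dimnames = list(nodes, nodes))
phi.inhibition.count <- matrix(c(0, 0, 3,
                                 0, 0, 9,
                                 0, 0, 0), nrow = 3, byrow = TRUE,
                               dimnames = list(nodes, nodes))

## Confidence matrices and edge support, as defined in the list above
conf.act <- phi.activation.count / p
conf.inh <- phi.inhibition.count / p
weights  <- pmax(conf.act, conf.inh)

## Assumed rule: keep an edge if its type occurs more than th*p times
phi <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
phi[conf.act > th] <- 1   # activation
phi[conf.inh > th] <- 2   # inhibition
```

In this toy population the edge from A to B is kept as an activation and the edge from B to C as an inhibition, while the edge from A to C has support 0.3 and is dropped.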

For mcmc, a list containing two elements:

`samplings`	List. Contains all sampling runs. Each sampling run itself is a list as obtained via `mcmc_ddepn`.
`ltraces`	Matrix. Contains the posterior traces, each trace stored in one column of the matrix.

TODO

Christian Bender

DDEPN
Bender et al. 2010: Dynamic deterministic effects propagation networks: learning signalling pathways from longitudinal protein array data; Bioinformatics, Vol. 26(18), pp. i596-i602

Laplace prior
Bender, C. 2011: Systematic analysis of time resolved high-throughput data using stochastic network inference methods; PhD Thesis, University of Heidelberg, Combined Faculties for the Natural Sciences and for Mathematics, 2011

Froehlich et al. 2007: Large scale statistical inference of signaling pathways from RNAi and microarray data; BMC Bioinformatics, Vol. 8(11), pp. 386ff

Scale free prior
Kamimura and Shimodaira, A Scale-free Prior over Graph Structures for Bayesian Inference of Gene Networks

TODO

## Not run: 
## load package
library(ddepn)

## sample a network
n <- 6
signet <- signalnetwork(n=n, nstim=2, cstim=0, prop.inh=0.2)
phit <- signet$phi
stimuli <- signet$stimuli

## sample data
dataset <- makedata(phit, stimuli, mu.bg=1200, sd.bg=400,
                    mu.signal.a=2000, sd.signal.a=1000)

## use original network as prior matrix
## reset all entries for inhibiting edges 
## to -1
B <- phit
B[B==2] <- -1

## Genetic algorithm, no prior
ret1 <- ddepn(dataset$datx, phiorig=phit, inference="netga",
              maxiterations=30, p=15, q=0.3, m=0.8,
              usebics=TRUE)	
x11()
plotdetailed(ret1$phi,stimuli=ret1$stimuli)
              
## mcmc, laplaceinhib prior
ret2 <- ddepn(dataset$datx,phiorig=phit, inference="mcmc",
              maxiterations=300, burnin=100,
              usebics=FALSE, lambda=0.01, B=B, gam=1, 
              priortype="laplaceinhib") 
      
x11()
plotdetailed(ret2$samplings[[1]]$phi,stimuli=ret2$samplings[[1]]$stimuli)

## use mcmc with multiple cores, i.e. perform two independent runs
## requires package parallel (which supersedes multicore) and, of course,
## multiple cores in the hardware
## use the original net as prior
 if(require(parallel)) {
 	ret3 <- ddepn(dataset$datx,phiorig=phit, inference="mcmc",
                multicores=TRUE, cores=2,
                maxiterations=300, burnin=100,
                usebics=FALSE, lambda=0.01, B=B, gam=1, 
                priortype="laplaceinhib")
 }

## resume the inference from an inhibMCMC run and add another 100 iterations
ret4 <- ddepn(dataset$datx,phiorig=phit, inference="mcmc", 
			maxiterations=100, burnin=30, lambda=0.01, B=B, 
			priortype="laplaceinhib", usebics=FALSE)
ret4 <- resume_ddepn(ret4,maxiterations=100)

## resume the inference from a netga run and add another 30 iterations
ret5 <- ddepn(dataset$datx,phiorig=phit, inference="netga", 
			maxiterations=20, p=10, q=0.3, m=0.8, lambda=0.01, B=B, 
			priortype="laplaceinhib", usebics=FALSE)
ret5 <- resume_ddepn(ret5,maxiterations=30)
 

## End(Not run)