# Main function of DART

### Description

This is the main function implementing DART. Given a data matrix and a model (pathway) signature it will construct the relevance correlation network for the genes in the signature over the data, evaluate the consistency of the correlative patterns with those predicted by the model signature, filter out the noise and finally obtain estimates of pathway activity for each individual sample. Specifically, it will call and run the following functions:

(1) `BuildRN`:

This function builds a relevance correlation network of the model pathway signature in the data set in which the pathway activity estimate is desired. We point out that this step is totally unsupervised and does not use any phenotypic information of the samples.

(2) `EvalConsNet`:

This function evaluates the consistency of the inferred network with the prior information of the model pathway signature. The up/down regulatory pattern given by the model signature implies predictions about the directionality of the gene-gene correlations in the independent data set. For instance, if gene "A" is upregulated and gene "B" is downregulated, then assuming that the model signature has any relevance in the independent data set, we would expect genes "A" and "B" to be anti-correlated. Thus, a consistency score can be computed. Only if the consistency score is higher than the score expected by random chance is it recommended that the model signature be used to infer pathway activity.

(3) `PruneNet`:

This function obtains the pruned, i.e. consistent, network, in which any edge represents a significant correlation in gene expression whose directionality agrees with that predicted by the prior information. This is the denoising step of the algorithm. The function returns the whole pruned network and its maximally connected component.

(4) `PredActScore`:

Given the adjacency matrix of the maximally connected consistent subnetwork and given the regulatory weights of the corresponding model pathway signature, this function estimates a pathway activation score in each sample. This function can also be used to infer pathway activity in another independent data set using the inferred subnetwork.
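
The four steps above can be sketched on simulated data in base R. This is a toy illustration under stated assumptions, not the package's implementation: the gene names, the simulated `activity` vector, the `1e-6` P-value cut-off and the degree-weighted score are all choices made for this example, and the extraction of the maximally connected component is omitted.

```r
# Toy illustration only: base R on simulated data, not the DART implementation.
set.seed(1)
n_samples <- 40
activity <- rnorm(n_samples)               # hidden pathway activity (simulated)

# Hypothetical signature: +1 = upregulated, -1 = downregulated.
sign_v <- c(A = 1, B = 1, C = -1, D = -1, E = 1)
genes <- names(sign_v)

# Genes A-D follow the signature; gene E is pure noise.
expr <- rbind(
  A =  activity + rnorm(n_samples, sd = 0.3),
  B =  activity + rnorm(n_samples, sd = 0.3),
  C = -activity + rnorm(n_samples, sd = 0.3),
  D = -activity + rnorm(n_samples, sd = 0.3),
  E =  rnorm(n_samples)
)

# (1) Relevance network: keep gene pairs whose correlation is significant.
cor_m <- cor(t(expr[genes, ]))
pval_m <- matrix(1, length(genes), length(genes), dimnames = list(genes, genes))
for (i in seq_along(genes)) for (j in seq_along(genes)) if (i < j) {
  p <- cor.test(expr[i, ], expr[j, ])$p.value
  pval_m[i, j] <- pval_m[j, i] <- p
}
adj <- (pval_m < 1e-6) * 1                 # assumed cut-off for this toy example

# (2) Consistency: the predicted correlation sign of a gene pair is the
#     product of the two signature signs.
pred_sign <- outer(sign(sign_v), sign(sign_v))
consistent <- sign(cor_m) == pred_sign
edges <- upper.tri(adj) & adj == 1
cons_score <- sum(consistent[edges]) / sum(edges)   # fraction of consistent edges

# (3) Pruning: drop edges whose sign disagrees with the prediction.
pruned <- adj * consistent
diag(pruned) <- 0

# (4) Activity score: one simple choice (an assumption here) is a
#     degree-weighted, signed average of z-scored expression.
z <- t(scale(t(expr[genes, ])))            # z-score each gene across samples
k <- rowSums(pruned)                       # degree of each gene in the pruned network
score <- colSums(sign_v * k * z) / sqrt(sum(k^2))
```

In this simulation the noise gene `E` ends up with degree zero, so it carries no weight in the score, which is the intended effect of the denoising step.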

Before performing the pruning step, `DoDART` will check whether the relevance correlation network is significantly consistent with the predictions from the model signature. Significance is assessed by first computing a consistency score (in effect, the fraction of edges in the relevance network which are consistent with the model prediction) and then using 1000 random permutations to obtain an empirical null distribution for the consistency score. Model signatures whose consistency scores have empirical P-values less than 0.001 are deemed consistent. If the consistency score is not significant, the function will issue a warning, and using the signature to predict pathway activity is not recommended.
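
The permutation check can be sketched in base R as follows. This is a hedged illustration rather than the package's code: it assumes the null distribution is generated by shuffling the signature signs across genes (the text above does not say what is permuted), it uses an arbitrary `|r| > 0.5` cut-off in place of the significance filter, and it uses the `(b + 1) / (n + 1)` convention for the empirical P-value. All object names are hypothetical.

```r
# Toy permutation test for the consistency score (illustrative assumptions only).
set.seed(42)
n_genes <- 20
n_samples <- 50

# Hypothetical signature: 10 upregulated and 10 downregulated genes.
sign_v <- rep(c(1, -1), each = n_genes / 2)
names(sign_v) <- paste0("g", seq_len(n_genes))

# Simulated expression that follows the signature, plus noise.
activity <- rnorm(n_samples)
expr <- outer(sign_v, activity) +
  matrix(rnorm(n_genes * n_samples, sd = 0.5), nrow = n_genes)
rownames(expr) <- names(sign_v)

# Stand-in relevance network: strongly correlated gene pairs.
cor_m <- cor(t(expr))
edges <- upper.tri(cor_m) & abs(cor_m) > 0.5

# Consistency score: fraction of edges whose correlation sign matches
# the product of the two signature signs.
cons_fn <- function(s) {
  pred <- outer(unname(s), unname(s))
  mean(sign(cor_m[edges]) == pred[edges])
}

obs <- cons_fn(sign_v)

# Empirical null: shuffle the signature signs across genes 1000 times.
null_scores <- replicate(1000, cons_fn(sample(sign_v)))
p_emp <- (1 + sum(null_scores >= obs)) / (1 + length(null_scores))
```

With 1000 permutations the smallest P-value this convention can return is 1/1001, which is just below the 0.001 cut-off quoted above.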

### Usage

```
DoDART(data.m, sign.v, fdr)
```

### Arguments

`data.m` |
Data matrix (numeric). Rows label features, columns label samples. It is assumed that number of features is much larger than number of samples. Rownames of |

`sign.v` |
Model pathway signature vector (numeric). Elements represent the regulatory weights (i.e. whether each gene is up- or downregulated). Names of |

`fdr` |
Desired false discovery rate (numeric) which determines the allowed number of false positives in the relevance network. The default value is 0.000001. Since typically model signatures may contain on the order of 100 genes, this amounts to constructing a relevance network on the order of 10000 edges (pairwise correlations). Using a Bonferroni correction, this leads to a P-value threshold of approx. 1e-6 to 1e-7. This parameter is tunable, so choosing a number of different thresholds may be advisable. |
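
The arithmetic behind this threshold can be checked with a few lines of base R. The family-wise error levels in `alpha` are assumptions chosen for illustration; the text above does not state which level underlies the quoted range.

```r
# Bonferroni-style threshold for a signature of about 100 genes.
n_genes <- 100
n_edges <- choose(n_genes, 2)        # 4950 candidate pairwise correlations

alpha <- c(0.05, 0.01, 0.001)        # assumed family-wise error levels
thresholds <- alpha / n_edges        # per-edge P-value thresholds

signif(thresholds, 2)
```

Stricter family-wise levels push the per-edge threshold toward the 1e-6 to 1e-7 range mentioned above, which is why trying several values of `fdr` can be informative.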

### Value

`netcons` |
A vector summarising the properties of the correlation network and the consistency with the model pathway signature: nG is the number of genes in the signature, nE is the number of edges of the relevance network generated by function |

`adj` |
Adjacency matrix of maximally connected consistent relevance network. |

`sign` |
Model pathway signature vector of genes found in data set and in maximally connected component. |

`score` |
Predicted activation scores of the model signature in the samples of the data set. |

`degree` |
Degrees/connectivities of the genes in the DART network. |

`consist` |
Module consistency result, output of |

### Author(s)

Andrew E Teschendorff, Yan Jiao

### References

Jiao Y, Lawler K, Patel GS, Purushotham A, Jones AF, Grigoriadis A, Ng T, Teschendorff AE. (2011) Denoising algorithm based on relevance network topology improves molecular pathway activity inference. BMC Bioinformatics 12:403.

Teschendorff AE, Gomez S, Arenas A, El-Ashry D, Schmidt M, et al. (2010) Improved prognostic classification of breast cancer defined by antagonistic activation patterns of immune response pathway modules. BMC Cancer 10:604.

### Examples

```
### Example
### load the package and the example data:
library(DART)
data(dataDART)
### dataDART$data: mRNA expression data of 67 ER negative breast cancer samples.
### dataDART$pheno: 51 basals and 16 HER2+ (ERBB2+).
### dataDART$phenoMAINZ: 24 basals and 8 HER2+ (ERBB2+).
### dataDART$sign: perturbation signature of ERBB2 activation.
### Using DoDART
dart.o <- DoDART(dataDART$data, dataDART$sign, fdr = 0.000001)
### check that activation is higher in HER2+ compared to basals
boxplot(dart.o$score ~ dataDART$pheno)
pv <- wilcox.test(dart.o$score ~ dataDART$pheno)$p.value
### annotate the boxplot with the Wilcoxon rank-sum P-value
text(x = 1.5, y = 3.8, labels = paste("P=", pv, sep = ""))
```