Measure the strength of association using compactness scores

Description

The compactness score of a set of hits on a network is the mean distance between each pair of hits. By comparing the observed compactness score to the scores of permuted hit sets, it is possible to compute the significance of the strength of association between the phenotype and the network. This method is not as effective as the Knet function and is included only for comparison.

Usage

Compactness(g, nperm=100, dist.method=c("shortest.paths", "diffusion", "mfpt"),
    vertex.attr="pheno", edge.attr="distance", correct.factor=1, D=NULL,
    verbose=TRUE)

Arguments

g

igraph object, the network to work on.

nperm

Integer value, the number of permutations to be completed.

dist.method

String, the method used to compute the distance between each pair of hits on the network: one of "shortest.paths", "diffusion" or "mfpt".

vertex.attr

Character vector, the name of the vertex attributes under which the hits to be tested are stored. The vector can contain one or more vertex attributes.

edge.attr

String, the name of the edge attribute to be used as distances along the edges. If an edge attribute with this name is not found, then each edge is assumed to have a distance of 1. Smaller edge distances denote stronger interactions between vertex pairs.
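
The fallback to unit edge distances can be illustrated with a small base R sketch. The helper `edge_distance_matrix` and the edge-list data frame layout (`from`, `to`) are hypothetical and are not part of SANTA; the sketch only shows how a missing edge attribute leads to every edge counting as distance 1 when shortest paths are computed.

```r
# Hypothetical helper (not part of SANTA): all-pairs shortest-path distances
# from an undirected edge list. `edges` has columns `from` and `to` (vertex
# indices) and, optionally, a column of edge distances named by `edge.attr`.
edge_distance_matrix <- function(edges, n, edge.attr = "distance") {
  # Fall back to a distance of 1 per edge when the attribute is not found.
  w <- if (edge.attr %in% names(edges)) edges[[edge.attr]] else rep(1, nrow(edges))

  D <- matrix(Inf, n, n)
  diag(D) <- 0
  for (k in seq_len(nrow(edges))) {
    i <- edges$from[k]
    j <- edges$to[k]
    # Undirected network: store the edge in both directions.
    D[i, j] <- D[j, i] <- min(D[i, j], w[k])
  }

  # Floyd-Warshall: allow each vertex k in turn as an intermediate stop.
  for (k in seq_len(n)) {
    D <- pmin(D, outer(D[, k], D[k, ], `+`))
  }
  D
}

# Path network 1 - 2 - 3 - 4.
edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
D_unit <- edge_distance_matrix(edges, n = 4)      # no "distance" column: unit edges

edges$distance <- c(0.5, 0.5, 1)
D_weighted <- edge_distance_matrix(edges, n = 4)  # uses the "distance" column
```

With the distance column present, strongly interacting vertex pairs (small edge distances) end up closer together than they do under unit distances.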

correct.factor

Numeric value. If the network contains unconnected vertices, then the distance between these vertices is set as the maximum distance between the connected vertices multiplied by correct.factor.
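
As a rough illustration of this correction, the base R sketch below replaces the infinite distances between unconnected vertices with the largest finite distance multiplied by correct.factor. The helper name `correct_unconnected` is hypothetical, and the sketch assumes that unconnected pairs are stored as `Inf` in the distance matrix; it is not the package's own code.

```r
# Hypothetical helper (not part of SANTA): cap distances between
# unconnected vertices, assumed here to be stored as Inf.
correct_unconnected <- function(D, correct.factor = 1) {
  finite_max <- max(D[is.finite(D)])          # largest distance between connected vertices
  D[!is.finite(D)] <- finite_max * correct.factor
  D
}

# Vertices 1 and 2 are connected; vertex 3 is isolated.
D <- matrix(c(0,   1,   Inf,
              1,   0,   Inf,
              Inf, Inf, 0), nrow = 3, byrow = TRUE)

D_corrected <- correct_unconnected(D, correct.factor = 2)
```

A larger correct.factor penalises hit sets that span disconnected parts of the network more heavily.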

D

Symmetrical numerical matrix. A precomputed distance matrix for g output by the DistGraph function. If NULL, then D is computed by the Compactness function.

verbose

Logical. If TRUE, messages about the progress of the function are displayed.

Details

The compactness score is used by the PathExpand tool by Glaab et al. (2010). It is the mean distance between the genes of a set on a network. By comparing the compactness score of an observed set of hits to the scores of permuted hit sets, it is possible to produce a p-value describing the strength of association between the gene set and the network. This comparison is not done in the original paper by Glaab et al. (2010). The function is much like the Knet function, albeit not as effective.

The compactness score C is defined as the mean shortest-path distance between pairs of vertices in a set P on network g, where d^g(i,j) denotes the distance between vertices i and j on g:

C(P) = \frac{2 \sum_{i,j \in P;\, i < j} d^g(i,j)}{|P| \, (|P| - 1)}
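
The definition can be checked on a toy example. The sketch below is a minimal base R illustration, not the SANTA implementation; the helper `compactness_score` is hypothetical, and it assumes a precomputed symmetric distance matrix and a vector of hit indices.

```r
# Hypothetical helper (not part of SANTA): compactness score of the hit set
# `hits` given a symmetric distance matrix `D`.
compactness_score <- function(D, hits) {
  stopifnot(length(hits) >= 2)
  sub <- D[hits, hits, drop = FALSE]
  # Mean over unordered pairs i < j, i.e. 2 * sum / (|P| * (|P| - 1)).
  mean(sub[upper.tri(sub)])
}

# Distances on the path network 1 - 2 - 3 - 4 with unit edge distances.
D <- abs(outer(1:4, 1:4, "-"))

# Hits on vertices 1, 2 and 4: pairwise distances are 1, 3 and 2.
score <- compactness_score(D, hits = c(1, 2, 4))

# The same value written out exactly as in the formula.
P <- c(1, 2, 4)
score_formula <- 2 * (D[1, 2] + D[1, 4] + D[2, 4]) / (length(P) * (length(P) - 1))
```

Both expressions give the mean of the three pairwise distances; a smaller value indicates that the hits lie closer together on the network.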

The compactness score is only included within the SANTA package to allow for comparisons to be made. Unlike the Knet function, it cannot be applied to continuous distributions of vertex weights. It can also result in biases if there is large variability in density across the network.

The weight of a vertex should be 1 if it is a hit, 0 if it is not a hit or NA if the information is missing. Vertices with missing weights are still included within the network but are excluded from the permuted sets.
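
One way to picture the permutation step is sketched below in base R. It assumes that permuted hit sets have the same size as the observed set and are drawn uniformly from the vertices whose weight is not NA; the helpers `compactness_score` and `permuted_scores` are hypothetical and only illustrate the idea.

```r
set.seed(1)  # fixed seed, for reproducibility of this illustration only

# Hypothetical helper: mean pairwise distance between the hits.
compactness_score <- function(D, hits) {
  sub <- D[hits, hits, drop = FALSE]
  mean(sub[upper.tri(sub)])
}

# Hypothetical helper: compactness scores of randomly placed hit sets.
permuted_scores <- function(D, weights, nperm = 100) {
  eligible <- which(!is.na(weights))         # vertices with NA weights are never sampled
  n_hits <- sum(weights == 1, na.rm = TRUE)  # permuted sets keep the observed size
  replicate(nperm, {
    perm_hits <- eligible[sample.int(length(eligible), n_hits)]
    compactness_score(D, perm_hits)
  })
}

# Path network with 10 vertices; vertices 9 and 10 have missing weights.
D <- abs(outer(1:10, 1:10, "-"))
weights <- c(1, 1, 1, 0, 0, 0, 0, 0, NA, NA)

score.obs <- compactness_score(D, which(weights == 1))
score.perm <- permuted_scores(D, weights, nperm = 200)
```

The vertices with NA weights still contribute to the distance matrix, since paths may pass through them, but they never appear in a permuted hit set.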

Value

If one vertex attribute is input, Compactness is run on the single set of vertex weights and a list containing the statistics below is returned. If more than one vertex attribute is input, then Compactness is run on each set of vertex weights and a list containing an element for each vertex attribute is returned. Each element contains a sub-list containing the statistics below for the relevant vertex attribute.

score.obs

Observed compactness score.

score.perm

Permuted compactness scores. NA if no permutations are completed.

pval

p-value, computed from a z-score derived from the observed and permuted compactness scores. NA if no permutations are completed.
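
A z-score based p-value of this kind might be computed as in the base R sketch below. The helper `compactness_pval` is hypothetical, and the use of the lower tail of the normal distribution is an assumption made here (a smaller observed score means the hits are closer together than expected); the source does not state which tail is used.

```r
# Hypothetical helper (not part of SANTA): p-value from a z-score comparing
# the observed score with the permuted scores.
compactness_pval <- function(score.obs, score.perm) {
  # No usable permutations: return NA, as described for pval above.
  if (length(score.perm) < 2 || all(is.na(score.perm))) return(NA_real_)
  z <- (score.obs - mean(score.perm)) / sd(score.perm)
  # Assumption: lower tail, so tightly clustered hits give small p-values.
  pnorm(z)
}

score.perm <- c(4, 5, 6, 5, 4, 6, 5, 5)               # toy permuted scores

pval_clustered <- compactness_pval(2, score.perm)     # observed hits closer than random
pval_dispersed <- compactness_pval(8, score.perm)     # observed hits further apart than random
pval_none      <- compactness_pval(2, NA)             # no permutations completed
```

Because the p-value relies on a normal approximation, it becomes more stable as the number of permutations grows.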

Author(s)

Alex J. Cornish a.cornish12@imperial.ac.uk

References

Cornish, A.J. and Markowetz, F. (2014). SANTA: Quantifying the Functional Content of Molecular Networks. PLOS Computational Biology. 10(9): e1003808.

Glaab, E., Baudot, A., Krasnogor, N. and Valencia, A. (2010). Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics. 11(1): 597:607.