diffuse: Diffuse scores on a network


View source: R/diffuse.R

Description

Function diffuse takes a network in igraph format (or a graph kernel matrix derived from a graph) and an initial state, and scores all the nodes in the network. The seven diffusion scores provided here differ in (a) how they distinguish positive, negative and unlabelled examples, and (b) their statistical normalisation. The argument method offers the following options:

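Methods without statistical normalisation:

raw: positive nodes introduce one unit of flow into the network (input 1), whereas negative and unlabelled nodes introduce none (input 0); the scores are obtained by multiplying a graph kernel K by this input. Negative and unlabelled nodes are therefore treated equivalently (Vandin et al., 2011).

ml: same as raw, but negative nodes introduce a negative unit of flow (input -1), so they are no longer equivalent to unlabelled nodes (Zoidi et al., 2015).

gm: same as ml, but unlabelled nodes are assigned a (generally non-zero) bias term based on the numbers of positive, negative and unlabelled nodes (Mostafavi et al., 2008).

ber_s: the relative change of each node's score before and after network smoothing, i.e. the raw score divided by the node's input plus eps, where eps controls the importance of the relative change (Bersanelli et al., 2016).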
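The input encodings behind the unnormalised scores can be illustrated with a minimal base-R sketch on a hypothetical five-node graph. This is not diffuStats code: the kernel (a regularised Laplacian, K = (I + L)^-1 with L = D - A), the gm bias (n+ - n-)/n with n the total number of nodes, and eps = 1 are assumptions chosen for illustration.

```r
# Minimal sketch (not the diffuStats implementation) of the unnormalised scores.
# Hypothetical 5-node graph: edges A-B, B-C, C-D, D-E and B-E.
nodes <- c("A", "B", "C", "D", "E")
adj <- matrix(0, 5, 5, dimnames = list(nodes, nodes))
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(2, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Assumed kernel: regularised Laplacian K = (I + L)^-1 with L = D - A
lap <- diag(rowSums(adj)) - adj
K <- solve(diag(5) + lap)
dimnames(K) <- list(nodes, nodes)

# Labels: 1 = positive, 0 = negative, NA = unlabelled
labels <- c(A = 1, B = 0, C = NA, D = 1, E = NA)
n_pos <- sum(labels == 1, na.rm = TRUE)
n_neg <- sum(labels == 0, na.rm = TRUE)

# raw: positives 1, negatives and unlabelled 0
y_raw <- ifelse(is.na(labels), 0, labels)
# ml: positives 1, negatives -1, unlabelled 0
y_ml <- ifelse(is.na(labels), 0, 2 * labels - 1)
# gm: as ml, but unlabelled nodes get the bias (n+ - n-) / n (assumed form)
k_bias <- (n_pos - n_neg) / length(labels)
y_gm <- ifelse(is.na(labels), k_bias, 2 * labels - 1)

# Smooth every encoding with the same kernel
f_raw <- drop(K %*% y_raw)
f_ml <- drop(K %*% y_ml)
f_gm <- drop(K %*% y_gm)
# ber_s: raw score relative to the input, with eps = 1 (assumed value)
eps <- 1
f_ber_s <- f_raw / (y_raw + eps)

round(cbind(raw = f_raw, ml = f_ml, gm = f_gm, ber_s = f_ber_s), 3)
```

With no unlabelled nodes, y_gm and y_ml coincide, which is why ml and gm only differ when some nodes are unobserved.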

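Methods with statistical normalisation: the raw diffusion score of every node i is computed and compared with the diffusion scores that node obtains from permuted inputs.

mc: scores each node by the empirical p-value of its raw score, estimated from n.perm permutations of the input (see ?diffuse_mc) (Bersanelli et al., 2016).

ber_p: combines the magnitude of the raw score with the empirical p-value used in mc, to account for both the raw signal and the network topology (Bersanelli et al., 2016).

z: a parametric alternative to mc that standardises the raw score using the analytical mean and variance of its null distribution under input permutations (Harchaoui et al., 2013).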
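Permutation-based normalisation can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: the graph is a hypothetical 8-node ring with one chord, the kernel is assumed to be a regularised Laplacian, the input is permuted uniformly (matching the uniform sample.prob default described under Arguments), and the p-value uses the common (r + 1)/(n_perm + 1) correction. The transforms 1 - p for mc and -log10(p) times the raw score for ber_p are assumptions about how the two scores are oriented.

```r
# Minimal sketch (not the diffuStats implementation) of permutation-based scores.
set.seed(1)
n <- 8
# Hypothetical graph: ring 1-2-...-8-1 plus a chord 1-5
adj <- matrix(0, n, n)
for (i in seq_len(n)) {
  j <- i %% n + 1
  adj[i, j] <- 1
  adj[j, i] <- 1
}
adj[1, 5] <- 1
adj[5, 1] <- 1

# Assumed kernel: regularised Laplacian K = (I + L)^-1
K <- solve(diag(n) + diag(rowSums(adj)) - adj)

y <- c(1, 1, 0, 0, 0, 0, 0, 0)   # binary input, every node observed
f_raw <- drop(K %*% y)

# Null model: uniform permutations of the input, one per column
n_perm <- 1000
perms <- replicate(n_perm, sample(y))
f_null <- K %*% perms             # n x n_perm matrix of permuted raw scores

# r_i: permutations scoring at least as high as the observed raw score
# (a small tolerance absorbs floating-point noise on ties)
r <- rowSums(f_null >= f_raw - 1e-12)
p <- (r + 1) / (n_perm + 1)       # empirical p-value with +1 correction

f_mc <- 1 - p                     # assumed orientation: higher = stronger evidence
f_ber_p <- -log10(p) * f_raw      # assumed form: raw magnitude times significance
round(cbind(raw = f_raw, p = p, mc = f_mc, ber_p = f_ber_p), 3)
```

Diffusing all permutations with a single matrix product keeps the sketch short; the note in Details that mc and ber_p are optimized for sparse inputs suggests the package takes more care with cost than this dense version does.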

If the input labels are binary, i.e. positive (1), negative (0) and possibly unlabelled, all the scores (raw, gm, ml, z, mc, ber_s, ber_p) can be used. Quantitative inputs extend naturally to raw, z, mc, ber_s and ber_p by generalising the definitions above, and diffuStats supports them for these scores. Further details on the scores can be found in the main vignette.

Function diffuse_grid computes diffusion scores on a grid of parameters. It is a convenience wrapper around diffuse that takes a network in igraph format or a kernel, the initial scores and a grid of parameters to explore, and computes the diffusion scores for all the nodes in the network. The scores are computed for every combination of parameters provided and returned as a long-format data frame.
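The idea of a parameter grid can be pictured with base R's expand.grid: each row is one parameter combination, and the scores from each run are stacked into a long data frame. The function toy_diffuse below is hypothetical and supports only raw and ber_s (using an assumed ber_s form); the column names node_id and node_score mirror those used in the examples, but the sketch makes no claim about the exact layout diffuse_grid returns.

```r
# Hypothetical sketch of a parameter grid, in the spirit of diffuse_grid.
# Toy 4-node path graph n1-n2-n3-n4 with an assumed regularised Laplacian kernel
nodes <- paste0("n", 1:4)
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj[cbind(1:3, 2:4)] <- 1
adj[cbind(2:4, 1:3)] <- 1
K <- solve(diag(4) + diag(rowSums(adj)) - adj)
dimnames(K) <- list(nodes, nodes)
y <- c(n1 = 1, n2 = 0, n3 = 0, n4 = 1)

# Hypothetical scoring function: only 'raw' and 'ber_s' are supported here.
# 'raw' ignores eps, so its rows repeat across eps values.
toy_diffuse <- function(K, y, method, eps) {
  f_raw <- drop(K %*% y)
  switch(method,
         raw = f_raw,
         ber_s = f_raw / (y + eps),
         stop("unsupported method: ", method))
}

# One row per parameter combination
grid <- expand.grid(method = c("raw", "ber_s"), eps = c(0.5, 1),
                    stringsAsFactors = FALSE)

# Run every combination and stack the results in long format
res <- lapply(seq_len(nrow(grid)), function(i) {
  s <- toy_diffuse(K, y, grid$method[i], grid$eps[i])
  data.frame(method = grid$method[i], eps = grid$eps[i],
             node_id = names(s), node_score = unname(s),
             stringsAsFactors = FALSE)
})
df <- do.call(rbind, res)
head(df)
```

A long format like this is convenient for plotting with ggplot2 or reshaping with reshape2::acast, as the examples do.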

Usage

diffuse(graph, scores, method, ...)

diffuse_grid(scores, grid_param, ...)

Arguments

graph

igraph object for the diffusion. Alternatively, a kernel matrix can be provided through the argument K instead of the igraph object.

scores

scores to be smoothed; either a named numeric vector, a column-wise matrix whose rownames are nodes and colnames are different scores, or a named list of such matrices.

method

character, one of raw, gm, ml, z, mc, ber_s, ber_p. For batch analysis of several methods, see ?diffuse_grid.

...

additional arguments for the diffusion method. mc and ber_p accept n.perm (number of permutations), seed (for reproducibility; defaults to 1) and sample.prob, a list of named vectors (one per background) with sampling probabilities for the null model, uniform by default. More details are available in ?diffuse_mc. ber_s, on the other hand, accepts eps, a parameter controlling the importance of the relative change.

grid_param

data frame containing the parameter combinations to explore. The column names should be the names of the parameters. Parameters with a fixed value can be specified either in the grid or through the additional arguments (...).

Details

Input scores can be specified in three formats. A single set of scores to smooth can be given as (1) a named numeric vector. Several sets of scores that share the same node names can be given as (2) a column-wise matrix. If the unlabelled entities differ from one set to another, (3) a named list of such score matrices can be passed instead. The output keeps the input format.
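The three input formats, and the rule that the output keeps the input format, can be mimicked with a small recursive function. This is a hypothetical sketch (raw_smooth is not a diffuStats function) that assumes a regularised Laplacian kernel and treats nodes missing from a matrix as unlabelled, i.e. as zeros in the raw score.

```r
# Hypothetical sketch: 'raw' smoothing that accepts the three input formats
# and returns the same format. Not the diffuStats implementation.
nodes <- c("a", "b", "c", "d")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
adj[cbind(1:3, 2:4)] <- 1
adj[cbind(2:4, 1:3)] <- 1
K <- solve(diag(4) + diag(rowSums(adj)) - adj)  # assumed kernel
dimnames(K) <- list(nodes, nodes)

raw_smooth <- function(K, scores) {
  # (3) named list of matrices: smooth each one, keep the names
  if (is.list(scores)) return(lapply(scores, function(m) raw_smooth(K, m)))
  # (1) named vector: treat it as a one-column matrix, then drop back to a vector
  if (is.null(dim(scores))) return(drop(raw_smooth(K, as.matrix(scores))))
  # (2) matrix: only the rows present (the background) contribute; nodes
  # missing from the background act as zeros. Every node gets a score.
  K[, rownames(scores), drop = FALSE] %*% scores
}

v <- c(a = 1, b = 0, c = 1, d = 0)
m <- cbind(s1 = v, s2 = c(0, 1, 0, 0))
l <- list(full = m, half = m[c("a", "b"), , drop = FALSE])

out_v <- raw_smooth(K, v)   # named vector
out_m <- raw_smooth(K, m)   # 4 x 2 matrix
out_l <- raw_smooth(K, l)   # list of 4 x 2 matrices
```

The "half" element has a smaller background, yet all four nodes still receive a score, which is the situation the list format is meant for.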

The implementation of mc and ber_p is optimized for sparse inputs; dense inputs may take longer to compute. Note also that z can give NaN for a node that is disconnected from all the observed nodes. Such a node is annotated with neither experimental nor network (topology) data: its raw score is zero under every permutation of the input, so the null distribution has zero mean and zero variance, and the z-score is 0/0.
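The NaN case can be reproduced with an analytical z-score. For node i, with kernel weights K[i, j] over the m observed nodes and an input y permuted uniformly among them, the raw score has null mean mean(y) * sum_j K[i, j] and null variance var(y) * sum_j (K[i, j] - mean_j K[i, j])^2, where var(y) uses the m - 1 denominator. This is a standard result for sums under random permutation; whether diffuStats uses exactly this expression is an assumption, as are the hypothetical two-component graph and the regularised Laplacian kernel.

```r
# Sketch of an analytical z-score under uniform permutations of the input,
# and of why disconnected nodes get NaN. Not the diffuStats implementation.
# Hypothetical graph with two components: path 1-2-3 and edge 4-5
adj <- matrix(0, 5, 5)
adj[cbind(c(1, 2, 4), c(2, 3, 5))] <- 1
adj[cbind(c(2, 3, 5), c(1, 2, 4))] <- 1
K <- solve(diag(5) + diag(rowSums(adj)) - adj)  # assumed regularised Laplacian

S <- 1:3                  # observed nodes, all in the first component
y <- c(1, 0, 0)           # input on the observed nodes
KS <- K[, S, drop = FALSE]

f <- drop(KS %*% y)                              # raw scores, all nodes
mu <- mean(y) * rowSums(KS)                      # null mean
v <- var(y) * rowSums((KS - rowMeans(KS))^2)     # null variance
z <- (f - mu) / sqrt(v)
z   # nodes 4 and 5 have zero kernel weight on S, hence (0 - 0) / 0
```

Because the kernel of a graph with two components is block-diagonal, nodes 4 and 5 receive no flow from the observed nodes, which is exactly the disconnected situation described above.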

Value

diffuse returns the diffusion scores, in the same format as scores.

diffuse_grid returns a data frame containing the diffusion scores for the specified combinations of parameters.

References

Scores "raw": Vandin, F., Upfal, E., & Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), 507-522.

Scores "ml": Zoidi, O., Fotiadou, E., Nikolaidis, N., & Pitas, I. (2015). Graph-based label propagation in digital media: A review. ACM Computing Surveys (CSUR), 47(3), 48.

Scores "gm": Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C., & Morris, Q. (2008). GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology, 9(1), S4.

Scores "mc", "ber_s", "ber_p": Bersanelli, M., Mosca, E., Remondini, D., Castellani, G., & Milanesi, L. (2016). Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules. Scientific Reports, 6.

Scores "z": Harchaoui, Z., Bach, F., Cappé, O., & Moulines, E. (2013). Kernel-based methods for hypothesis testing: A unified view. IEEE Signal Processing Magazine, 30(4), 87-97.

Examples

##############################

library(diffuStats)
library(igraph)
library(ggplot2)
data(graph_toy)
input_vec <- graph_toy$input_vec
n <- vcount(graph_toy)

##############################

# Examples for 'diffuse':

# Using a binary vector as input
diff_scores <- diffuse(
    graph = graph_toy,
    scores = input_vec,
    method = "raw")

# Using a matrix as input
diff_scores <- diffuse(
    graph = graph_toy,
    scores = graph_toy$input_mat,
    method = "raw")

# Using a list of matrices as input
diff_scores <- diffuse(
    graph = graph_toy,
    scores = list(myScores1 = graph_toy$input_mat,
        myScores2 = head(graph_toy$input_mat, n/2)),
    method = "raw")

##############################

# Examples for 'diffuse_grid':

# Using a single vector of scores and comparing the methods 
# "raw", "ml", and "z"
df_diff <- diffuse_grid(
    graph = graph_toy,
    scores = graph_toy$input_vec,
    grid_param = expand.grid(method = c("raw", "ml", "z")))
head(df_diff)

# Same settings, but comparing several choices of the 
# parameter epsilon ("eps") in the scores "ber_s"
df_diff <- diffuse_grid(
    graph = graph_toy,
    scores = graph_toy$input_vec,
    grid_param = expand.grid(method = "ber_s", eps = 1:5/5))
ggplot(df_diff, aes(x = factor(eps), fill = factor(eps), y = node_score)) +
    geom_boxplot()

# Using a matrix with four sets of scores
# called Single, Row, Small_sample, Large_sample
# See the 'quickstart' vignette for more details on these toy scores
# We compute scores for methods "ber_p" and "mc" and 
# permute both 1e3 and 1e4 times in each run
df_diff <- diffuse_grid(
    graph = graph_toy,
    scores = graph_toy$input_mat,
    grid_param = expand.grid(
        method = c("mc", "ber_p"), 
        n.perm = c(1e3, 1e4)))
dim(df_diff)
head(df_diff)

##############################

# Differences when using (1) a quantitative input and
# (2) different backgrounds. 

# In this example, the 
# small background contains binary scores and continuous scores for 
# half of the nodes in the 'graph_toy' example graph. 

# (1) Continuous scores have been generated by 
# changing the positive labels to a random, positive numeric value. 
# The user can see the impact of this in the scores 'raw', 'ber_s', 
# 'ber_p', 'mc' and 'z'

# (2) The larger background is just the small background 
# completed with zeroes, both for binary and continuous scores. 
# This illustrates how 'raw' and 'ber_s' treat unlabelled 
# and negative labels equally, whereas 'ml', 'gm', 'ber_p', 
# 'mc' and 'z' do not. 

# Examples:

# The input:
lapply(graph_toy$input_list, head)

# 'raw' scores treat unlabelled and negative nodes equally,
# and can account for continuous inputs
diff_raw <- diffuse(
    graph = graph_toy,
    scores = graph_toy$input_list,
    method = "raw")
lapply(diff_raw, head)

# 'z' scores distinguish unlabelled and negative nodes and accept
# continuous inputs
diff_z <- diffuse(
    graph = graph_toy,
    scores = graph_toy$input_list,
    method = "z")
lapply(diff_z, head)

# 'ml' and 'gm' are the same score if there are no unobserved nodes
diff_compare <- diffuse_grid(
    graph = graph_toy, 
    scores = input_vec, 
    grid_param = expand.grid(method = c("raw", "ml", "gm"))
)
df_compare <- reshape2::acast(
    diff_compare, 
    node_id~method, 
    value.var = "node_score")
head(df_compare)

# 'ml' and 'gm' are different in presence of unobserved nodes
diff_compare <- diffuse_grid(
    graph = graph_toy, 
    scores = head(input_vec, n/2), 
    grid_param = expand.grid(method = c("raw", "ml", "gm"))
)
df_compare <- reshape2::acast(
    diff_compare, 
    node_id~method, 
    value.var = "node_score")
head(df_compare)

b2slab/diffuStats documentation built on Oct. 2, 2018, 12:58 p.m.