Description:
Function diffuse takes a network in igraph format (or a graph kernel matrix stemming from a graph) and an initial state to score all the nodes in the network. The seven diffusion scores provided here differ in (a) how they distinguish positives, negatives and unlabelled examples, and (b) their statistical normalisation. The argument method offers the following options:
Methods without statistical normalisation:

raw: positive nodes introduce unitary flow (y_raw[i] = 1) into the network, whereas neither negative nor unlabelled nodes introduce anything (y_raw[j] = 0) [Vandin, 2011]. The scores are computed as

    f_{raw} = K y_{raw}

where K is a graph kernel, see ?kernels. These scores treat negative and unlabelled nodes equivalently.

ml: same as raw, but negative nodes introduce a negative unit of flow [Zoidi, 2015] and are therefore not equivalent to unlabelled nodes.

gm: same as ml, but the unlabelled nodes are assigned a (generally non-null) bias term based on the total number of positives, negatives and unlabelled nodes [Mostafavi, 2008].

ber_s: a quantification of the relative change in the node score before and after the network smoothing. The score for a particular node i can be written as

    f_{ber_s}[i] = f_{raw}[i]/(y_{raw}[i] + eps)

where eps is a parameter controlling the importance of the relative change. A hand-computed sketch of raw and ber_s follows this list.
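To make these definitions concrete, here is a minimal hand-computed sketch of the raw and ber_s scores on the graph_toy data shipped with diffuStats. It assumes the regularised Laplacian kernel (regularisedLaplacianKernel, see ?kernels) as K, and the intermediate variable names (y_all, eps, ...) are illustrative; it is not the package's internal implementation.

library(diffuStats)
data(graph_toy)

# Graph kernel (see ?kernels) and binary input labels, named by node
K <- regularisedLaplacianKernel(graph_toy)
y_raw <- graph_toy$input_vec

# raw: f_raw = K y_raw, using the kernel columns of the labelled nodes
f_raw <- as.numeric(K[, names(y_raw)] %*% y_raw)
names(f_raw) <- rownames(K)

# ber_s: relative change before/after smoothing; eps = 1 is illustrative
eps <- 1
y_all <- setNames(numeric(nrow(K)), rownames(K))
y_all[names(y_raw)] <- y_raw
f_ber_s <- f_raw / (y_all + eps)

head(cbind(f_raw, f_ber_s))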
Methods with statistical normalisation: the raw diffusion score of every node i is computed and compared to its own diffusion scores stemming from a permuted input.

mc: the score of node i is based on its empirical p-value, computed by permuting the input n.perm times:

    p[i] = (r[i] + 1)/(n.perm + 1)

p[i] is roughly the proportion of input permutations that led to a diffusion score as high as or higher than the original diffusion score (a total of r[i] permutations for node i, in absolute terms). This assesses how likely a high diffusion score is to arise by chance, in the absence of signal. To be consistent with the direction of the other scores, mc is defined as:

    f_{mc}[i] = 1 - p[i]

ber_p: as used in [Bersanelli, 2016], this score combines raw and mc, in order to take into account both the magnitude of the raw scores and the effect of the network topology:

    f_{ber_p}[i] = -log10(p[i]) f_{raw}[i]

z: a parametric alternative to mc. The raw score of node i has its mean value subtracted and is divided by its standard deviation. The statistical moments have a closed analytical form, see the main vignette, and are inspired by [Harchaoui, 2013]. Unlike mc and ber_p, the z scores do not require actual permutations, giving them an advantage in terms of speed. A sketch of these normalised scores follows this list.
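A minimal sketch of the permutation-based quantities, written out by hand so the formulas above are explicit. The optimised package calls are diffuse(method = "mc"), diffuse(method = "ber_p") and diffuse(method = "z"); the kernel choice and the intermediate names (y_perm, f_null, r, ...) are illustrative assumptions.

library(diffuStats)
data(graph_toy)
set.seed(1)

K <- regularisedLaplacianKernel(graph_toy)
y_raw <- graph_toy$input_vec
f_raw <- as.numeric(K[, names(y_raw)] %*% y_raw)

# Null diffusion scores: permute the input labels over the labelled nodes
n.perm <- 1000
f_null <- replicate(n.perm, {
  y_perm <- setNames(sample(y_raw), names(y_raw))
  as.numeric(K[, names(y_perm)] %*% y_perm)
})

# r[i]: permutations scoring at least as high as the original score
r <- rowSums(f_null >= f_raw)
p <- (r + 1)/(n.perm + 1)        # empirical p-value
f_mc <- 1 - p                    # mc
f_ber_p <- -log10(p) * f_raw     # ber_p

# z needs no permutations; the package computes it analytically
f_z <- diffuse(graph_toy, scores = y_raw, method = "z")

head(data.frame(mc = f_mc, ber_p = f_ber_p, z = f_z[rownames(K)],
                row.names = rownames(K)))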
If the input labels are not quantitative, i.e. positive (1), negative (0) and possibly unlabelled, all the scores (raw, gm, ml, z, mc, ber_s, ber_p) can be used. Quantitative inputs are naturally defined on raw, z, mc, ber_s and ber_p by extending the definitions above, and are readily available in diffuStats. Further details on the scores can be found in the main vignette.
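As a quick illustration of a quantitative input: the continuous values below are randomly generated for the sake of the example and are not part of the package data.

library(diffuStats)
data(graph_toy)

# Turn the binary toy input into a continuous one by replacing the
# positive labels with random positive values (illustrative only)
set.seed(1)
y_cont <- graph_toy$input_vec
y_cont[y_cont == 1] <- runif(sum(y_cont == 1), min = 0.5, max = 1)

# 'raw', 'z', 'mc', 'ber_s' and 'ber_p' accept this input directly
f_cont <- diffuse(graph_toy, scores = y_cont, method = "raw")
head(f_cont)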
Usage:

diffuse(graph, scores, method, ...)
diffuse_grid(scores, grid_param, ...)
Arguments:

graph: igraph object for the diffusion. Alternatively, a kernel matrix can be provided (see the additional arguments and ?kernels).

scores: scores to be smoothed; either a named numeric vector, a column-wise matrix whose rownames are nodes and colnames are different scores, or a named list of such matrices.

method: character, one of raw, ml, gm, ber_s, ber_p, mc, z.

...: additional arguments for the diffusion method.

grid_param: data frame containing the parameter combinations to explore. The column names should be the names of the parameters. Parameters that have a fixed value can be specified in the grid or through the additional arguments.
Details:

Input scores can be specified in three formats. A single set of scores to smooth can be represented as (1) a named numeric vector; if several such vectors share the node names, they can be provided as (2) a column-wise matrix. However, if the unlabelled entities are not the same from one case to another, (3) a named list of such score matrices can be passed to this function. The input format is kept in the output.
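A small sketch of the last point, checking that the output keeps the input format (the object names are illustrative):

library(diffuStats)
data(graph_toy)

out_vec <- diffuse(graph_toy, scores = graph_toy$input_vec, method = "raw")
out_mat <- diffuse(graph_toy, scores = graph_toy$input_mat, method = "raw")

class(out_vec)  # named numeric vector, like the input vector
class(out_mat)  # matrix with the same columns as the input matrix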
The implementation of mc and ber_p is optimized for sparse inputs; dense inputs might take longer to compute.
Another relevant note: z can give NaN for a particular node when the observed nodes are disconnected from the node being scored. This is because such nodes are backed by neither experimental nor network (topology) data; a quick check for this case is sketched below.
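A sketch of how such nodes could be spotted, scoring only half of the toy nodes so that part of the graph is unobserved (graph_toy happens to be connected, so the check may well come back empty):

library(diffuStats)
library(igraph)
data(graph_toy)

# Score only the first half of the nodes, leaving the rest unobserved
y_half <- head(graph_toy$input_vec, vcount(graph_toy)/2)
f_z <- diffuse(graph_toy, scores = y_half, method = "z")

# Nodes disconnected from every observed node would show up as NaN
names(f_z)[is.nan(f_z)]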
Value:

diffuse returns the diffusion scores, in the same format as scores.

diffuse_grid returns a data frame containing the diffusion scores for the specified combinations of parameters.
Scores "raw": Vandin, F., Upfal, E., & Raphael, B. J. (2011). Algorithms for detecting significantly mutated pathways in cancer. Journal of Computational Biology, 18(3), 507-522.
Scores "ml": Zoidi, O., Fotiadou, E., Nikolaidis, N., & Pitas, I. (2015). Graph-based label propagation in digital media: A review. ACM Computing Surveys (CSUR), 47(3), 48.
Scores "gm": Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C., & Morris, Q. (2008). GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome biology, 9(1), S4.
Scores "mc", "ber_s", "ber_p": Bersanelli, M., Mosca, E., Remondini, D., Castellani, G., & Milanesi, L. (2016). Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules. Scientific reports, 6.
Scores "z": Harchaoui, Z., Bach, F., Cappe, O., & Moulines, E. (2013). Kernel-based methods for hypothesis testing: A unified view. IEEE Signal Processing Magazine, 30(4), 87-97.
Examples:

##############################
library(igraph)
library(ggplot2)
data(graph_toy)
input_vec <- graph_toy$input_vec
n <- vcount(graph_toy)
##############################
# Examples for 'diffuse':
# Using a binary vector as input
diff_scores <- diffuse(
graph = graph_toy,
scores = input_vec,
method = "raw")
# Using a matrix as input
diff_scores <- diffuse(
graph = graph_toy,
scores = graph_toy$input_mat,
method = "raw")
# Using a list of matrices as input
diff_scores <- diffuse(
graph = graph_toy,
scores = list(myScores1 = graph_toy$input_mat,
myScores2 = head(graph_toy$input_mat, n/2)),
method = "raw")
##############################
# Examples for 'diffuse_grid':
# Using a single vector of scores and comparing the methods
# "raw", "ml", and "z"
df_diff <- diffuse_grid(
graph = graph_toy,
scores = graph_toy$input_vec,
grid_param = expand.grid(method = c("raw", "ml", "z")))
head(df_diff)
# Same settings, but comparing several choices of the
# parameter epsilon ("eps") in the scores "ber_s"
df_diff <- diffuse_grid(
graph = graph_toy,
scores = graph_toy$input_vec,
grid_param = expand.grid(method = "ber_s", eps = 1:5/5))
ggplot(df_diff, aes(x = factor(eps), fill = eps, y = node_score)) +
geom_boxplot()
# Using a matrix with four sets of scores,
# called Single, Row, Small_sample and Large_sample.
# See the 'quickstart' vignette for more details on these toy scores.
# We compute scores for the methods "ber_p" and "mc",
# using both 1e3 and 1e4 permutations in each run
df_diff <- diffuse_grid(
graph = graph_toy,
scores = graph_toy$input_mat,
grid_param = expand.grid(
method = c("mc", "ber_p"),
n.perm = c(1e3, 1e4)))
dim(df_diff)
head(df_diff)
##############################
# Differences when using (1) a quantitative input and
# (2) different backgrounds.
# In this example, the
# small background contains binary scores and continuous scores for
# half of the nodes in the 'graph_toy' example graph.
# (1) Continuous scores have been generated by
# changing the positive labels to a random, positive numeric value.
# The user can see the impact of this in the scores 'raw', 'ber_s',
# 'ber_p', 'mc' and 'z'
# (2) The larger background is just the small background
# completed with zeroes, both for binary and continuous scores.
# This illustrates how 'raw' and 'ber_s' treat unlabelled
# and negative labels equally, whereas 'ml', 'gm', 'ber_p',
# 'mc' and 'z' do not.
# Examples:
# The input:
lapply(graph_toy$input_list, head)
# 'raw' scores treat unlabelled and negative nodes equally,
# and can account for continuous inputs
diff_raw <- diffuse(
graph = graph_toy,
scores = graph_toy$input_list,
method = "raw")
lapply(diff_raw, head)
# 'z' scores distinguish unlabelled from negative nodes and accept
# continuous inputs
diff_z <- diffuse(
graph = graph_toy,
scores = graph_toy$input_list,
method = "z")
lapply(diff_z, head)
# 'ml' and 'gm' are the same score if there are no unobserved nodes
diff_compare <- diffuse_grid(
graph = graph_toy,
scores = input_vec,
grid_param = expand.grid(method = c("raw", "ml", "gm"))
)
df_compare <- reshape2::acast(
diff_compare,
node_id~method,
value.var = "node_score")
head(df_compare)
# 'ml' and 'gm' are different in presence of unobserved nodes
diff_compare <- diffuse_grid(
graph = graph_toy,
scores = head(input_vec, n/2),
grid_param = expand.grid(method = c("raw", "ml", "gm"))
)
df_compare <- reshape2::acast(
diff_compare,
node_id~method,
value.var = "node_score")
head(df_compare)