Implementation of different classes of link predictors and methods to assess and visualise their performance.
Complex systems can be represented by graphs $G(V, E)$, where $E$ is the set of interactions (edges) between a set $V$ of system components (nodes). Link predictors assign likelihood scores of interaction to all node pairs that are disconnected in the observable network topology. These scores can lead to the prediction of future friendships in a social network or future direct flights between cities in an airport transportation network.
To assess the performance of link predictors, one normally removes an increasing number of edges $E^R \subset E$ from the network of interest, applies a link prediction method to the pruned network and evaluates its performance with one of the following metrics:
recall_at_k
: Reports of the fraction of candidate links with the $|E^R|$
highest likelihood scores that are in $E^R$ (Recall at $k$, where $k = |E^R|$).aupr
: The area under the Precision-Recall curve.auroc
: The area under the Receiving Operating Characteristic curve.avg_prec
: The average Precision at the point where Recall reaches its
maximum value of 1.LinkPrediction
implements the following classes of link prediction techniques:
lp_cn
, lp_pa
, lp_jc
, lp_dice
, lp_aa
,
lp_ra
and lp_l3
).lp_car
, lp_cpa
,
lp_caa
and lp_cra
).lp_isomap
, lp_leig
, and
lp_mce
).lp_hrg
).lp_spm
).In addition, LinkPrediction
includes function lp_matrix
and get_non_edges
,
which are key to extend the package with other link prediction approaches (more
on this below).
Finally, the performance of these link predictors on a network of interest can
be assessed and visualised with functions prune_recover
(which computes all
the above-mentioned evaluation metrics) and plot_lp_performance
.
For more information on the link prediction problem, see:
devtools
package from CRAN if you haven't done so:install.packages("devtools")
devtools
package:library("devtools")
LinkPrediction
using the install_github
function:install_github("galanisl/LinkPrediction")
library(dplyr)
To start using LinkPrediction
, load the package:
library("LinkPrediction")
LinkPrediction
includes two complex network datasets (type ?karate_club
and
?jazz_collab
in R for more information). Let's use the Jazz Collaboration
network to illustrate the use of the package.
First, let's apply the Common Neighbours link predictor to the network:
(cn <- lp_cn(jazz_collab))
Note how the lp_* functions return a tibble
of candidate links with their
likelihoods of interaction. Let's now compute the structural consistency
$\sigma_c \in [0, 1]$ of this network. The higher it is, the higher its link
predictability:
sigma_c <- structural_consistency(jazz_collab) mean(sigma_c) sd(sigma_c)
Since the $\sigma_c$ is high, link predictors are likely to give us good candidates of interaction.
Let's now implement a link predictor of our own with the help of function
get_non_edges
. Our method will return a tibble with a random ordering of
the disconnected node pairs in the network:
lp_rnd <- function(g){ non_edges <- get_non_edges(g) prediction <- tibble(nodeA = non_edges[, 1], nodeB = non_edges[, 2], scr = sample(nrow(non_edges))) %>% arrange(desc(scr)) return(prediction) }
Finally, let's compare our method with other link predictors and visualise the results:
assessment <- prune_recover(jazz_collab, "lp_rnd", "lp_cn", "lp_car", "lp_spm") plot_lp_performance(assessment, err = "sd")
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.