README.md

LinkPrediction

Gregorio Alanis-Lobato

Implementation of different classes of link predictors and methods to assess and visualise their performance.

Introduction

Complex systems can be represented by graphs G(V, E), where E is the set of interactions (edges) between a set V of system components (nodes). Link predictors assign likelihood scores of interaction to all node pairs that are disconnected in the observable network topology. These scores can lead to the prediction of future friendships in a social network or future direct flights between cities in an airport transportation network.

To assess the performance of link predictors, one normally removes an increasing number of edges ER ⊂ E from the network of interest, applies a link prediction method to the pruned network and evaluates its performance with one of the following metrics:

LinkPrediction implements the following classes of link prediction techniques:

In addition, LinkPrediction includes function lp_matrix and get_non_edges, which are key to extend the package with other link prediction approaches (more on this below).

Finally, the performance of these link predictors on a network of interest can be assessed and visualised with functions prune_recover (which computes all the above-mentioned evaluation metrics) and plot_lp_performance.

For more information on the link prediction problem, see:

Installation

  1. Install the devtools package from CRAN if you haven't done so:
install.packages("devtools")
  1. Load the devtools package:
library("devtools")
  1. Install LinkPrediction using the install_github function:
install_github("galanisl/LinkPrediction")

Usage

To start using LinkPrediction, load the package:

library("LinkPrediction")

LinkPrediction includes two complex network datasets (type ?karate_club and ?jazz_collab in R for more information). Let's use the Jazz Collaboration network to illustrate the use of the package.

First, let's apply the Common Neighbours link predictor to the network:

(cn <- lp_cn(jazz_collab))
## # A tibble: 16,761 x 3
##    nodeA nodeB   scr
##    <int> <int> <dbl>
##  1     7   111   41.
##  2    27    79   37.
##  3    27    66   33.
##  4     7   113   29.
##  5    66   118   29.
##  6    74    89   28.
##  7    88    92   27.
##  8    23   124   27.
##  9    53    66   26.
## 10    27    99   26.
## # ... with 16,751 more rows

Note how the lp_* functions return a tibble of candidate links with their likelihoods of interaction. Let's now compute the structural consistency σc ∈ [0, 1] of this network. The higher it is, the higher its link predictability:

sigma_c <- structural_consistency(jazz_collab)
mean(sigma_c)
## [1] 0.7053285
sd(sigma_c)
## [1] 0.01851257

Since the σc is high, link predictors are likely to give us good candidates of interaction.

Let's now implement a link predictor of our own with the help of function get_non_edges. Our method will return a tibble with a random ordering of the disconnected node pairs in the network:

lp_rnd <- function(g){
  non_edges <- get_non_edges(g)
  prediction <- tibble(nodeA = non_edges[, 1], nodeB = non_edges[, 2],
                       scr = sample(nrow(non_edges))) %>% 
    arrange(desc(scr))
  return(prediction)
}

Finally, let's compare our method with other link predictors and visualise the results:

assessment <- prune_recover(jazz_collab, 
                            "lp_rnd", 
                            "lp_cn", 
                            "lp_car", 
                            "lp_spm")

plot_lp_performance(assessment, err = "sd")

Session information

sessionInfo()
## R version 3.4.4 (2018-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/openblas-base/libblas.so.3
## LAPACK: /usr/lib/libopenblasp-r0.2.18.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2.2     LinkPrediction_1.0 dplyr_0.7.6       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.17        RSpectra_0.13-1     pillar_1.2.1       
##  [4] compiler_3.4.4      plyr_1.8.4          bindr_0.1.1        
##  [7] tools_3.4.4         digest_0.6.15       evaluate_0.10.1    
## [10] tibble_1.4.2        gtable_0.2.0        lattice_0.20-35    
## [13] pkgconfig_2.0.1     rlang_0.2.1         Matrix_1.2-14      
## [16] igraph_1.2.1        cli_1.0.0           yaml_2.1.18        
## [19] stringr_1.3.0       knitr_1.20          rprojroot_1.3-2    
## [22] grid_3.4.4          tidyselect_0.2.4    glue_1.2.0         
## [25] data.table_1.10.4-3 R6_2.2.2            rmarkdown_1.9      
## [28] purrr_0.2.5         ggplot2_3.0.0       magrittr_1.5       
## [31] backports_1.1.2     scales_0.5.0        htmltools_0.3.6    
## [34] precrec_0.9.1       assertthat_0.2.0    colorspace_1.3-2   
## [37] labeling_0.3        utf8_1.1.3          stringi_1.1.7      
## [40] lazyeval_0.2.1      munsell_0.4.3       crayon_1.3.4


galanisl/LinkPrediction documentation built on May 17, 2019, 12:10 p.m.