title: 'netrankr: An R package for total, partial, and probabilistic rankings in networks' tags: - R - network analysis - network centrality - partial orders date: "10 August 2022" output: pdf_document bibliography: paper.bib affiliations: - name: GESIS - Leibniz Institute for the Social Sciences index: 1 authors: - name: David Schoch orcid: 0000-0003-2952-4812 affiliation: 1
One of the key concepts in network science is network centrality. Centrality
seeks to provide the answer to the question of who (or what) is important in
a network depending on the underlying process forming the network and the
empirical phenomenon in question. In a nutshell, an actor in a network is more
central if they have better relations, where the definition of better relations
depends on the conceptualization of structural importance. Applications of centrality can be
found in any field where networks arise. In social networks, we may simply be interested in finding
the most popular user. In bioinformatics, centrality is used to detect essential proteins in
protein-protein interaction networks [@jmbo-lcpn-01].
Even in sports, centrality is applied to rank athletes or teams [@r-wbpecnahpt-11].
A myriad of indices have been proposed, all with differing interpretations of what
constitutes a central position within a network. Although netrankr
offers this
traditional approach to network centrality, its main focus lies on alternative assessments
of centrality based on partial and probabilistic rankings in networks.
General purpose packages for network analysis such as igraph
[@cn-ispcnr-06] and
sna
[@b-snas-08] implement methods for the most frequently used centrality indices.
However, there also exist dedicated centrality packages, such as
centiserve
[@jsaayga-ccrwarpca-15], CINNA
[@amj-crpdcinna-19], and influenceR
[@sa-istqsinn-15].
The biggest in terms of implemented indices is centiserve
which includes $33$ indices.
The primary purpose of CINNA
is to facilitate the choice of indices by visual
and statistical tools. influenceR
is a relatively tiny package which implements
only a few specialized measures.
The netrankr
package also offers
the opportunity to apply more than $30$ indices, but the main focus lies on alternative assessment methods.
Recent results suggest that a variety of partial rankings exists in networks which are preserved by many index based centrality rankings [@sb-sninc-15; @sb-rcsn-16; @svb-ccicurg-17].
These partial rankings can be leveraged to assess centrality on a more general level, without necessarily resorting to indices.
Some of these methods and key functionalities of the package are listed below.
Note that the most of the tools can also be applied in other empirical settings where partial orders arise.
In an undirected graph $G=(V,E)$ with vertex set $V$ (with cardinality $n = \lvert V\rvert$) and edge set $E$ (with cardinality $m = \lvert E\rvert$), the neighborhood of a node $u \in V$ is defined as $$N(u)=\lbrace w : \lbrace u,w \rbrace \in E \rbrace$$ and its closed neighborhood as $N[v]=N(v) \cup \lbrace v \rbrace$. If the neighborhood of the node $u$ is a subset of the closed neighborhood of the node $v$, $N(u)\subseteq N[v]$, than it is called a neighborhood inclusion. This concept defines a binary relation among nodes in a network and consequently $u$ is dominated by $v$ if $N(u)\subseteq N[v]$. Neighborhood-inclusion thus induces a partial ranking on the vertices of a network.
@sb-rcsn-16 showed that if $c:V \to \mathbb{R}$ is a centrality index, then $$N(u)\subseteq N[v] \implies c(u) \leq c(v)$$ that is, neighborhood inclusion is preserved by indices and the ranking can be viewed as linear extensions of the partial ranking induced by neighborhood-inclusion. Analyzing this partial ranking thus means that all possible centrality rankings can be analyzed at once. More technical details can be found in the dedicated literature [@sb-sninc-15; @b-np-16;@svb-ccicurg-17;@b-cpsn-20].
This example briefly explains some of the functionality of the package and the difference to an index driven approach. For more detailed applications see the package vignettes.
We work with a small graph included in the package which was specifically crafted to highlight extreme differences in centrality rankings (see Figure \ref{fig:dbces11}).
library(igraph)
library(netrankr)
data("dbces11")
g <- dbces11
# Code to reproduce Figure 1
library(ggraph)
ggraph(g, "stress") +
geom_edge_link0(edge_color = "grey66") +
geom_node_point(shape = 21, fill = "grey25", size = 8) +
geom_node_text(label = 1:11, color = "white")
\begin{figure} \centering \includegraphics[width=0.7\textwidth]{figures/dbces.pdf} \caption{Toy graph included in netrankr used to illustrate the functionality of the package.} \label{fig:dbces11} \end{figure}
Say we are interested in the most central node of the graph and simply
compute some standard centrality scores with the igraph
package.
cent_scores <- data.frame(
degree = degree(g),
betweenness = round(betweenness(g), 4),
closeness = round(closeness(g), 4),
eigenvector = round(eigen_centrality(g)$vector, 4),
subgraph = round(subgraph_centrality(g), 4))
# What are the most central nodes for each index?
apply(cent_scores, 2, which.max)
#> degree betweenness closeness eigenvector subgraph
#> 11 8 6 7 10
Each index assigns the highest value to a different vertex and it is not clear which one is the correct choice.
A more general assessment using netrankr
starts by calculating the neighborhood inclusion preorder.
P <- neighborhood_inclusion(g)
P
#> 1 2 3 4 5 6 7 8 9 10 11
#> 1 0 0 1 0 1 1 1 0 0 0 1
#> 2 0 0 0 1 0 0 0 1 0 0 0
#> 3 0 0 0 0 1 0 0 0 0 0 1
#> 4 0 0 0 0 0 0 0 0 0 0 0
#> 5 0 0 0 0 0 0 0 0 0 0 0
#> 6 0 0 0 0 0 0 0 0 0 0 0
#> 7 0 0 0 0 0 0 0 0 0 0 0
#> 8 0 0 0 0 0 0 0 0 0 0 0
#> 9 0 0 0 0 0 0 0 0 0 0 0
#> 10 0 0 0 0 0 0 0 0 0 0 0
#> 11 0 0 0 0 0 0 0 0 0 0 0
P[u,v]=1
if $N(u)\subseteq N[v]$. Hence, $u$ is always ranked below $v$.
If P[u,v]=0
, then there may be indices that rank $u$ above $v$ and vice versa.
We can examine the minimal and maximal possible rank of each node for rankings that extend the preorder to a total order using rank intervals. The bigger the intervals are, the more freedom exists for indices to rank nodes differently (see Figure \ref{fig:rk_int}).
plot(rank_intervals(P), cent_scores = cent_scores,ties.method = "average")
\begin{figure} \centering \includegraphics[width=0.7\textwidth]{figures/rk_intervals.pdf} \caption{Rank intervals for the toy graph showing the range of potential centrality ranks including the rankings of five indices.} \label{fig:rk_int} \end{figure}
Note that the ranks of nodes are not uniformly distributed in the
intervals. The exact probabilities, can be obtained with
exact_rank_prob()
.
res <- exact_rank_prob(P)
res$rank.prob
contains the probabilities for each node to occupy a certain
rank. For instance, the probability for each node to be the most central
one is as follows:
round(res$rank.prob[,11], 2)
#> 1 2 3 4 5 6 7 8 9 10 11
#> 0.00 0.00 0.00 0.14 0.16 0.11 0.11 0.14 0.09 0.09 0.16
The entry
res$relative.rank[u, v]
indicates how likely it is that v
is more central
than u
.
# How likely is it, that 6 is more central than 3?
round(res$relative.rank[3, 6], 2)
#> [1] 0.75
res$expected.ranks
contains the expected centrality ranks for all nodes.
round(res$expected.rank, 2)
#> 1 2 3 4 5 6 7 8 9 10 11
#> 1.71 3.00 4.29 7.50 8.14 6.86 6.86 7.50 6.00 6.00 8.14
The higher the value, the more central a node is expected to be.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.