knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In the following article, we will go other the different steps to create a bibliometric network, manipulate it, prepare the plotting and eventually plot it.
As a point of departure, you need your bibliometric data to be prepared in a certain format:^[See how to extract and clean data from scopus here and from Dimensions here.]
nodes
table. For instance, it may be a list of articles with metadata (author(s), title, journal, etc.). Nodes must have a unique identifier and all the information about a node are gathered on only one row (in case of articles, you need one row per article).directed_edges
table, that is a table that links your nodes with another variable that will be used to build the edges between your nodes. For instance, the table could links articles (your nodes) with the references cited by these articles. It can also be a journal or a list of authors (if you are interested in collaboration). In your directed_edges
table, you need the identifier of the nodes (also present in the nodes
table), and the unique identifier of the categories (references cited, journals, authors...) the nodes are linked to. As soon as you have a nodes and a edges file (see the biblionetwork package for creating such files), you can create a graph, using tidygraph and the tbl_graph() function. The next step, as it is recurrent in many network analyses, notably in bibliometric netwoks like bibliographic coupling networks would be to keep only the main component of your network. This could be done in one step using the tbl_main_component()
function of networkflow
.
library(networkflow) ## basic example code graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) print(graph)
The parameter nb_components
allows you to choose the number of components you want to keep. For obvious reasons, it is settled to 1 by default.
However, it could happen in some networks (for instance co-authorship networks) that the second biggest component of your network is quite large. To avoid removing too big components without knowing it, the tbl_main_component()
function integrates a warning that happens when a secondary component gathering more than x% of the total number of nodes is removed. The threshold_alert
parameter is set to 0.05 by default, but you can reduce it if you really want to avoid removing relatively big components.
## basic example code graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", threshold_alert = 0.001) print(graph)
Once you have you tidygraph graph, an important step is to run community detection algorithms to group the nodes depending on their links. This package uses the leidenAlg package, and its find_partition()
function, to implement the Leiden algorithm [@traag2019]. The leiden_workflow()
function of our package runs the Leiden algorithm and attributes a community number to each node in the Com_ID
column, but also to each edge (depending if the from
and to
nodes are within the same community).
# creating again the graph graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) # finding communities graph <- leiden_workflow(graph) print(graph)
You can observe that the function also gives the size of the community, by calculating the share of total nodes that are in each community.
The function also allows to play with the resolution
parameter of leidenAlg find_partition()
function. Varying the resolution of the algorithm results in a different partition and different number of communities. A lower resolution means less communities, and conversely. The basic resolution of the leiden_workflow()
is set by res_1
and equals 1 by default. You can vary this parameter, but also try a second resolution with res_2
and a third one with res_3
:
# creating again the graph graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) # finding communities graph <- leiden_workflow(graph, res_1 = 0.5, res_2 = 2, res_3 = 3) print(graph)
Once you have detected different communities in your network, you are well on the way of the projection of your graph, but two important steps should be implemented before. First, you have to attribute some colors to each community. These colors will be used for your nodes and edges when you will project your graph with ggraph
. The function community_colors
of the networkflow
package allow to do that. You just have to give it a palette (with as many colors as the number of communities for a better visualisation).^[If two connected nodes are in the same community, their edge will take the same color. If they are in different communities, their edge will have a mix of the two communities colors.]
# loading a palette with many colors palette <- c("#1969B3","#01A5D8","#DA3E61","#3CB95F","#E0AF0C","#E25920","#6C7FC9","#DE9493","#CD242E","#6F4288","#B2EEF8","#7FF6FD","#FDB8D6","#8BF9A9","#FEF34A","#FEC57D","#DAEFFB","#FEE3E1","#FBB2A7","#EFD7F2","#5CAADA","#37D4F5","#F5779B","#62E186","#FBDA28","#FB8F4A","#A4B9EA","#FAC2C0","#EB6466","#AD87BC","#0B3074","#00517C","#871B2A","#1A6029","#7C4B05","#8A260E","#2E3679","#793F3F","#840F14","#401C56","#003C65","#741A09","#602A2A","#34134A","#114A1B","#27DDD1","#27DD8D","#4ADD27","#D3DD27","#DDA427","#DF2935","#DD27BC","#BA27DD","#3227DD","#2761DD","#27DDD1") # creating again the graph graph <- tbl_main_component(nodes = Nodes_coupling, edges = Edges_coupling, directed = FALSE, node_key = "ItemID_Ref", nb_components = 1) # finding communities graph <- leiden_workflow(graph) # attributing colors graph <- community_colors(graph, palette, community_column = "Com_ID") print(graph)
What you want to do next is to give a name automatically to your community. The community_names()
function allows you to do that: it gives to the community the label of the node, within the community, which has the highest score in the statistics you choose. In the next exemple, we will calculate the degree of each node, and each community will take as a name the label of its highest degree node.
library(magrittr) library(dplyr) library(tidygraph) # calculating the degree of nodes graph <- graph %>% activate(nodes) %>% mutate(degree = centrality_degree()) # giving names to communities graph <- community_names(graph, ordering_column = "degree", naming = "Author_date", community_column = "Com_ID") print(graph)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.