add_clusters: Detect and Add Clusters to Graphs


add_clusters    R Documentation

Detect and Add Clusters to Graphs

Description

[Experimental]

This function takes as input a tibble graph (from tidygraph) or a list of tibble graphs, and runs one of several cluster detection algorithms, depending on the method chosen by the user (see Details for information on the different methods). The function associates each node with its corresponding cluster identifier. It also creates a cluster attribute for edges: each edge is given the cluster identifier of the two nodes it connects if both nodes belong to the same cluster. If the nodes belong to different clusters, the edge takes "00" as its cluster attribute.
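The edge rule described above can be sketched in base R. This is a minimal illustration, not the package's own implementation: the helper `assign_edge_clusters()` is hypothetical, and the sketch assumes, as in tidygraph, that the `from` and `to` columns of the edge table hold row indices into the node table.

```r
# Hypothetical helper, not part of networkflow: give each edge the cluster
# of its two endpoints, or "00" when the endpoints are in different clusters.
assign_edge_clusters <- function(nodes, edges, col = "cluster_leiden") {
  # Assumption: from/to are row indices into `nodes`, as in tidygraph
  from_cl <- nodes[[col]][edges$from]
  to_cl <- nodes[[col]][edges$to]
  edges[[paste0(col, "_from")]] <- from_cl
  edges[[paste0(col, "_to")]] <- to_cl
  # Same cluster on both ends: keep it; otherwise mark the edge as "00"
  edges[[col]] <- ifelse(from_cl == to_cl, from_cl, "00")
  edges
}

nodes <- data.frame(name = c("a", "b", "c", "d"),
                    cluster_leiden = c("01", "01", "02", "02"),
                    stringsAsFactors = FALSE)
edges <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4))
result <- assign_edge_clusters(nodes, edges)
result$cluster_leiden
```

Here the edge between "b" (cluster "01") and "c" (cluster "02") is the only one marked "00"; the other two edges inherit the cluster shared by their endpoints.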

Usage

add_clusters(
  graphs,
  weights = NULL,
  clustering_method = c("leiden", "louvain", "fast_greedy", "infomap", "walktrap"),
  objective_function = c("modularity", "CPM"),
  resolution = 1,
  n_iterations = 1000,
  n_groups = NULL,
  node_weights = NULL,
  trials = 10,
  steps = 4,
  verbose = TRUE,
  seed = NA
)

Arguments

graphs

A tibble graph from tidygraph, a list of tibble graphs or a data frame.

weights

The weights of the edges. It must be a positive numeric vector, NULL or NA. If it is NULL and the input graph has a ‘weight’ edge attribute, then that attribute will be used. If NULL and no such attribute is present, then the edges will have equal weights. Set this to NA if the graph has a ‘weight’ edge attribute but you don't want to use it for community detection. For the algorithms implemented here, larger weights are interpreted as stronger connections, not as distances.

clustering_method

The clustering algorithm to use among those implemented in the function (see Details). Which of the other parameters apply depends on the clustering method chosen.

objective_function

The objective function to maximize with the Leiden algorithm: either the Constant Potts Model ("CPM") or "modularity" (see igraph::cluster_leiden()). CPM is used by default.

resolution

The resolution parameter for the Leiden algorithm (see igraph::cluster_leiden()). Higher resolutions lead to more, smaller communities, while lower resolutions lead to fewer, larger communities.

n_iterations

The number of iterations of the Leiden algorithm. Each iteration may further improve the partition (see igraph::cluster_leiden()).

n_groups

May be used with the fast greedy or the walktrap algorithm. Integer scalar, the desired number of communities. If it is too low or too high, an error message is given.

node_weights

May be used with the Leiden or infomap algorithms. For Leiden, if node weights are not provided, they are determined automatically on the basis of objective_function (see igraph::cluster_leiden()). For infomap, if they are not provided, all vertices are considered to have the same weight. A larger vertex weight means a larger probability that the random surfer jumps to that vertex (see igraph::cluster_infomap()).
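To see how resolution and node_weights interact in the Leiden case, the quality function can be written as below. This follows the general formulation given in the igraph::cluster_leiden() documentation, not anything specific to add_clusters(). Here m is the total edge weight, A_{ij} the weight of the edge between i and j, γ the resolution, n_i the weight of node i, σ_i the cluster of node i, and δ(x, y) equals 1 if x = y and 0 otherwise. With unit node weights (n_i = 1) this is the CPM; setting n_i = k_i (the degree of node i) and dividing γ by 2m yields modularity:

```latex
% General quality function (CPM when n_i = 1)
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, n_i n_j \right) \delta(\sigma_i, \sigma_j)

% Modularity: n_i = k_i and \gamma replaced by \gamma / (2m)
Q_{\text{mod}} = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{\gamma}{2m} \, k_i k_j \right) \delta(\sigma_i, \sigma_j)
```

A larger γ increases the penalty term for grouping nodes together, which is why higher resolutions favour more, smaller communities.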

trials

The number of attempts to partition the network for the infomap algorithm (any integer equal to or greater than 1; see igraph::cluster_infomap()).

steps

The length of the random walks to perform for the walktrap algorithm (see igraph::cluster_walktrap()).

verbose

Set to FALSE if you don't want the function to display information messages while it runs.

seed

A number used to set the random seed within the function. Some algorithms rely on heuristics and random processes that may produce different clusters each time the function is run. Setting the seed makes the results reproducible: the same clusters are found each time the function is run on the same graphs.
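A minimal sketch of how a seed argument that defaults to NA can be honoured, assuming the function calls set.seed() only when a value is supplied. The helper `run_with_seed()` is hypothetical, and sample() merely stands in for a randomized clustering heuristic.

```r
# Hypothetical sketch: only touch the random number generator when the
# caller actually supplies a seed (the default NA means "don't set one").
run_with_seed <- function(x, seed = NA) {
  if (!is.na(seed)) set.seed(seed)
  # sample() stands in for a randomized clustering step
  sample(x)
}

a <- run_with_seed(1:10, seed = 123)
b <- run_with_seed(1:10, seed = 123)
identical(a, b)
```

Note that set.seed() changes the global random number generator state of the R session, so later random draws outside the function are affected too.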

Details

The function can be run either on a single tidygraph object or on a list of tidygraph objects, as created by build_dynamic_networks().

The function implements five different algorithms. Four of them exist in igraph and are used in this package through their tidygraph implementations (see group_graph()). The function also implements the Leiden algorithm (Traag et al. 2019), which is in igraph but not yet in tidygraph (see igraph::cluster_leiden()).

The newly created columns holding the cluster identifiers for nodes and edges are named after the method used. For instance, with the Leiden algorithm, the function creates a column called cluster_leiden for nodes, and three columns for edges, called cluster_leiden_from, cluster_leiden_to and cluster_leiden.

The function also calculates the percentage of all nodes that belong to each cluster and stores it in the size_com column.
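The cluster-size calculation can be sketched in base R as follows. The helper `cluster_share()` is hypothetical; it returns a share between 0 and 1, and whether size_com stores such a share or a 0 to 100 percentage is an assumption not confirmed here (multiply by 100 for the latter).

```r
# Hypothetical sketch: for each node, the share of all nodes that fall
# in the same cluster as that node.
cluster_share <- function(cluster) {
  counts <- table(cluster)
  # Look up each node's cluster size by name, then divide by the node count
  as.numeric(counts[cluster]) / length(cluster)
}

cl <- c("01", "01", "02", "01")
cluster_share(cl)
```

Three of the four nodes are in cluster "01", so each of them gets 0.75, while the single node in cluster "02" gets 0.25.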

To make plotting easier later, one-digit cluster identifiers are padded with a leading zero (cluster 5 becomes "05"; cluster 10 stays "10"). Attributing a cluster identifier to edges makes it possible to give an edge the same color as the nodes it connects when both nodes belong to the same cluster, or a color different from both nodes when they belong to different clusters.
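The zero-padding can be reproduced with sprintf(). This is a sketch with a hypothetical helper `pad_cluster_id()`, not necessarily how the package does it. One plausible benefit is that padded identifiers sort in numeric order as strings (up to 99). Assuming clusters are numbered from 1, the "00" used for edges between clusters cannot collide with a real cluster identifier.

```r
# Hypothetical helper: pad cluster numbers to at least two digits.
pad_cluster_id <- function(id) sprintf("%02d", as.integer(id))

padded <- pad_cluster_id(c(5, 10, 123))
padded

# As strings, padded identifiers sort in numeric order; raw ones do not
sort(pad_cluster_id(c(10, 5)))
sort(as.character(c(10, 5)))
```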

Value

The same tidygraph graph or list of tidygraph graphs as the input, with, for nodes, a new cluster column and a column giving the size of each cluster, and, for edges, three new cluster columns (see Details).

References

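Traag VA, Waltman L, van Eck NJ (2019). “From Louvain to Leiden: guaranteeing well-connected communities.” Scientific Reports, 9, 5233.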

Examples

library(networkflow)

nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

temporal_networks <- build_dynamic_networks(
  nodes = nodes,
  directed_edges = references,
  source_id = "ID_Art",
  target_id = "ItemID_Ref",
  time_variable = "Year",
  cooccurrence_method = "coupling_similarity",
  time_window = 20,
  edges_threshold = 1,
  overlapping_window = TRUE,
  filter_components = TRUE
)

temporal_networks <- add_clusters(
  temporal_networks,
  objective_function = "modularity",
  clustering_method = "leiden"
)

temporal_networks[[1]]



agoutsmedt/networkflow documentation built on March 15, 2023, 11:51 p.m.