merge_dynamic_clusters: Find Similar Clusters across Multiple Temporal Networks
In agoutsmedt/networkflow: Functions For A Workflow To Manipulate Networks

merge_dynamic_clusters

R Documentation

Description

This function creates a new column, `dynamic_{cluster_id}`, in each network of a list of temporal networks to identify similar clusters across time. It gives the same name to two clusters from two successive temporal networks if they meet the conditions defined by the user through cluster_id, threshold_similarity and similarity_type.

Usage

merge_dynamic_clusters(
  list_graph,
  cluster_id,
  node_id,
  threshold_similarity = 0.5001,
  similarity_type = c("complete", "partial")
)

Arguments

`list_graph`	A list of tibble graphs (from tidygraph). The list is expected to be ordered sequentially, from the oldest to the most recent network.
`cluster_id`	The column with the identifier of the cluster. If you have used add_clusters(), it is of the form `cluster_{clustering_method}`.
`node_id`	The column with the unique identifier of each node.
`threshold_similarity`	Defines how sensitive the function is when giving the same name to two clusters. A higher threshold leads to more communities, because fewer clusters are matched across time. For example, suppose you have two temporal networks with two communities each: communities A and B in the older network, and communities A' and B' in the more recent network. A threshold of 0.51 with a "complete" similarity_type means that community A' is given the name A if 51% of the nodes of A' in the more recent network originate from A in the older network, and 51% of the nodes of A in the older network end up in A' in the more recent network.
`similarity_type`	Choose the similarity type that the threshold is compared to:
- "complete" similarity computes the share of nodes going from an older community to a more recent community over all the nodes in both networks.
- "partial" similarity computes the same share only over the nodes that exist in both networks.
Complete similarity is particularly suited to networks whose number of nodes is relatively stable over time, as the threshold then captures the share of all nodes moving between clusters. Partial similarity can be particularly useful when the number of nodes in your networks increases rapidly; the threshold then captures the share of nodes existing in both networks that move between clusters.
For example, take a complete similarity threshold of 0.51 and suppose that (1) all nodes from community A in network t-1 go into community A' in network t, (2) all nodes of community A' that were present in network t-1 originate from community A, but (3) A' is more than twice the size of A because of new nodes that did not exist in t-1. A' will then never meet the threshold required to be named A, despite the strong similarity between the two clusters. Conceptually, this may be the desired behavior: one might consider that A' is too different from A to be the same cluster, since its composition has been changed by the new nodes. In that case, complete similarity is the right choice. However, if one considers that A and A' are very similar because all the nodes that exist in both networks are identified as part of the same community, then partial similarity is more desirable.
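
The difference between the two similarity types can be sketched in base R on toy data. This is only an illustration of the rule as described above, not the package's internal implementation: the helper `cluster_shares()`, the membership vectors and the strict `>` comparison against the threshold are all assumptions made for the example.

```r
# Toy cluster memberships: names are node IDs, values are cluster labels.
# A' keeps every node of A but also gains four nodes that are new in `new_net`.
old_net <- c(n1 = "A", n2 = "A", n3 = "A", n4 = "B", n5 = "B")
new_net <- c(n1 = "A'", n2 = "A'", n3 = "A'",
             n6 = "A'", n7 = "A'", n8 = "A'", n9 = "A'",
             n4 = "B'", n5 = "B'")

# Hypothetical helper (not part of networkflow): share of the old cluster that
# ends up in the new cluster, and share of the new cluster that comes from the
# old one.
cluster_shares <- function(old_net, new_net, old_cluster, new_cluster,
                           similarity_type = c("complete", "partial")) {
  similarity_type <- match.arg(similarity_type)
  old_nodes <- names(old_net)[old_net == old_cluster]
  new_nodes <- names(new_net)[new_net == new_cluster]
  if (similarity_type == "partial") {
    # Partial similarity only counts nodes that exist in both networks.
    common <- intersect(names(old_net), names(new_net))
    old_nodes <- intersect(old_nodes, common)
    new_nodes <- intersect(new_nodes, common)
  }
  shared <- length(intersect(old_nodes, new_nodes))
  c(share_of_old = shared / length(old_nodes),
    share_of_new = shared / length(new_nodes))
}

threshold <- 0.51
complete <- cluster_shares(old_net, new_net, "A", "A'", "complete")
partial <- cluster_shares(old_net, new_net, "A", "A'", "partial")

# Assumption: both shares must exceed the threshold for A' to be named A.
all(complete > threshold)  # FALSE: only 3 of the 7 nodes of A' come from A
all(partial > threshold)   # TRUE: among shared nodes, A and A' are identical
```

With these toy vectors, A' is more than twice the size of A, so the two clusters are matched under partial similarity but not under complete similarity, which mirrors the case discussed for `similarity_type`.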

Value

The function returns the same list of networks used as input in list_graph, but with a new column `dynamic_{cluster_id}` (i.e., the name of the new column depends on the column that served as input). The column is the result of the inter-graph grouping of the original clusters of the cluster_id. The dynamic clusters are also merged with the different cluster_id columns of the edges data.
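
As a rough sketch of how the returned column might be read, the snippet below builds the column name from cluster_id and tabulates it in every network of a list. The list `toy_output`, its window names, the helper `make_toy()` and the cluster labels are hand-made stand-ins for illustration, not real output of merge_dynamic_clusters().

```r
library(tidygraph)

# Hypothetical helper: a tiny tbl_graph with a hand-written dynamic cluster
# column, standing in for one element of the list returned by the function.
make_toy <- function(ids, clusters) {
  tbl_graph(
    nodes = data.frame(ID_Art = ids, dynamic_cluster_leiden = clusters),
    edges = data.frame(from = 1L, to = 2L),
    directed = FALSE
  )
}

toy_output <- list(
  window_1 = make_toy(c("a", "b", "c"), c("cl_1", "cl_1", "cl_2")),
  window_2 = make_toy(c("b", "c", "d"), c("cl_1", "cl_2", "cl_2"))
)

# The new column is named after the input column: dynamic_{cluster_id}.
cluster_id <- "cluster_leiden"
dynamic_column <- paste0("dynamic_", cluster_id)

# Number of nodes per dynamic cluster in each network of the list.
sizes <- lapply(toy_output, function(graph) {
  node_table <- as_tibble(activate(graph, nodes))
  table(node_table[[dynamic_column]])
})
sizes
```

Because a matched cluster keeps the same name from one network to the next, comparing these tables across the list gives a quick view of how each dynamic cluster grows or shrinks over time.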

Examples

library(networkflow)

# Prepare the node and edge tables so that both share the ID_Art identifier
nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

# Build one network per overlapping 10-year window
temporal_networks <- build_dynamic_networks(
  nodes = nodes,
  directed_edges = references,
  source_id = "ID_Art",
  target_id = "ItemID_Ref",
  time_variable = "Year",
  cooccurrence_method = "coupling_similarity",
  time_window = 10,
  edges_threshold = 1,
  overlapping_window = TRUE,
  filter_components = TRUE
)

# Detect clusters in each network; this creates the cluster_leiden column
temporal_networks <- add_clusters(
  temporal_networks,
  objective_function = "modularity",
  clustering_method = "leiden"
)

# Give the same name to similar clusters in successive networks
temporal_networks <- merge_dynamic_clusters(
  temporal_networks,
  cluster_id = "cluster_leiden",
  node_id = "ID_Art",
  threshold_similarity = 0.51,
  similarity_type = "partial"
)

temporal_networks[[1]]

agoutsmedt/networkflow documentation built on July 3, 2025, 8:54 p.m.