merge_dynamic_clusters: Find Similar Clusters across Multiple Temporal Networks

View source: R/merge_dynamic_clusters.R

merge_dynamic_clusters    R Documentation

Description

[Experimental]

This function creates a new column "intertemporal_name" for each network in a list of temporal networks, in order to identify similar clusters across time. It gives the same name to two clusters from two successive temporal networks if they meet the conditions defined by the user through threshold_similarity, cluster_id and similarity_type.

Usage

merge_dynamic_clusters(
  list_graph = NA,
  cluster_id = NA,
  node_id = NA,
  threshold_similarity = 0.5001,
  similarity_type = c("complete", "partial")
)

Arguments

list_graph

A list of tibble graphs (from tidygraph). The list is expected to be ordered sequentially, from the oldest to the most recent network.

cluster_id

The column with the identifier of the cluster. If you have used add_clusters(), it is of the form cluster_{clustering_method}.

node_id

The column with the unique identifier of each node.

threshold_similarity

Defines how readily the function gives the same name to two clusters. A higher threshold makes a match harder to reach, so fewer clusters inherit an existing name and the result contains more communities.

For example, suppose you have two temporal networks with two communities each: communities A and B in the older network, and communities A' and B' in the more recent network. A threshold of 0.51 with a "complete" similarity_type means that community A' will be given the name A if 51% of the nodes of A' in the more recent network originate from A in the older network, and 51% of the nodes of A in the older network end up in A' in the more recent network.
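The two-way check described above can be sketched in base R. This is only an illustration on made-up data, not the package's implementation: the helper `complete_shares()`, the node names and the use of `>=` for the comparison are all assumptions.

```r
# Hypothetical toy data: cluster membership of each node in two successive
# networks (names are node identifiers, values are cluster labels).
old <- c(n1 = "A", n2 = "A", n3 = "A", n4 = "B", n5 = "B")
new <- c(n1 = "A'", n2 = "A'", n3 = "B'", n4 = "B'", n5 = "B'")

# Hypothetical helper (not part of networkflow): shares of nodes moving
# between an older cluster `from` and a more recent cluster `to`.
complete_shares <- function(old, new, from, to) {
  old_members <- names(old)[old == from]
  new_members <- names(new)[new == to]
  shared <- intersect(old_members, new_members)
  c(
    share_of_old = length(shared) / length(old_members), # part of A that ends up in A'
    share_of_new = length(shared) / length(new_members)  # part of A' that comes from A
  )
}

shares <- complete_shares(old, new, from = "A", to = "A'")

# Assumption: both shares must reach the threshold for A' to inherit the name A.
keeps_name <- all(shares >= 0.51)
```

Here two of the three nodes of A end up in A', and both nodes of A' come from A, so both shares are above 0.51 and A' would keep the name A under these assumptions.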

similarity_type

Choose a similarity type to compare the threshold to:

  • "complete" similarity computes the share of nodes going from an older community to a more recent community over all the nodes in both networks

  • "partial" similarity computes the share of nodes going from an older community to a more recent community only over the nodes that exist in both networks

Complete similarity is particularly suited to networks whose number of nodes is relatively stable over time, as the threshold captures the share of all nodes moving between clusters. Partial similarity can be particularly useful when the number of nodes in your networks increases rapidly: the threshold then captures the share of nodes existing in both networks that move between clusters.

For example, with a complete similarity threshold of 0.51, if (1) all nodes from community A in network t-1 go into community A' in network t+1, and (2) all nodes in community A' that were present in network t-1 originate from community A, but (3) A' has more than twice as many nodes as A because of new nodes that did not exist in t-1, then A' will never meet the threshold required to be named A, despite the strong similarity between the two clusters. Conceptually, this might be the desired behavior: one might consider that A' is too different from A to be the same cluster, since its composition has been changed by the new nodes. In that case, complete similarity is the right choice. However, if one considers that A and A' are very similar because all the nodes that exist in both networks are identified as part of the same community, then partial similarity is more desirable.
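The scenario above can be reproduced on a small made-up example. The sketch below is an illustration of the two definitions as described in this page, not the code used by the package: `similarity_shares()`, the node names and the `>=` comparison are assumptions.

```r
# Hypothetical toy data: A' contains every node of A plus four nodes that did
# not exist in the older network, so A' is more than twice the size of A.
old <- c(a1 = "A", a2 = "A", a3 = "A", b1 = "B")
new <- c(a1 = "A'", a2 = "A'", a3 = "A'",
         x1 = "A'", x2 = "A'", x3 = "A'", x4 = "A'",
         b1 = "B'")

# Hypothetical helper (not part of networkflow).
similarity_shares <- function(old, new, from, to,
                              type = c("complete", "partial")) {
  type <- match.arg(type)
  if (type == "partial") {
    # Partial similarity: only keep nodes that exist in both networks.
    common <- intersect(names(old), names(new))
    old <- old[common]
    new <- new[common]
  }
  old_members <- names(old)[old == from]
  new_members <- names(new)[new == to]
  shared <- intersect(old_members, new_members)
  c(
    share_of_old = length(shared) / length(old_members),
    share_of_new = length(shared) / length(new_members)
  )
}

complete <- similarity_shares(old, new, "A", "A'", type = "complete")
partial  <- similarity_shares(old, new, "A", "A'", type = "partial")

# Assumption: a match requires both shares to reach the threshold.
match_complete <- all(complete >= 0.51) # FALSE: only 3 of the 7 nodes of A' come from A
match_partial  <- all(partial >= 0.51)  # TRUE: new nodes are ignored
```

With complete similarity the four new nodes dilute A' and the match fails. With partial similarity they are left out of the comparison and A' inherits the name A.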

Value

The function returns the same list of networks used as input in list_graph, but with a new column dynamic_{cluster_id} (i.e., the name of the new column depends on the column that served as input). This column results from grouping the original clusters of cluster_id across graphs. The dynamic clusters are also merged with the different cluster_id columns of the edges data.

Examples

library(networkflow)

# Keep the nodes of type "Stagflation" and rename their identifier column
nodes <- Nodes_stagflation |>
  dplyr::rename(ID_Art = ItemID_Ref) |>
  dplyr::filter(Type == "Stagflation")

# Rename the citing identifier so that it matches the nodes table
references <- Ref_stagflation |>
  dplyr::rename(ID_Art = Citing_ItemID_Ref)

# Build one network per time window
temporal_networks <- build_dynamic_networks(
  nodes = nodes,
  directed_edges = references,
  source_id = "ID_Art",
  target_id = "ItemID_Ref",
  time_variable = "Year",
  cooccurrence_method = "coupling_similarity",
  time_window = 10,
  edges_threshold = 1,
  overlapping_window = TRUE,
  filter_components = TRUE
)

# Detect clusters in each network
temporal_networks <- add_clusters(
  temporal_networks,
  objective_function = "modularity",
  clustering_method = "leiden"
)

# Give the same name to similar clusters in successive networks
temporal_networks <- merge_dynamic_clusters(
  temporal_networks,
  cluster_id = "cluster_leiden",
  node_id = "ID_Art",
  threshold_similarity = 0.51,
  similarity_type = "partial"
)

temporal_networks[[1]]


agoutsmedt/networkflow documentation built on March 15, 2023, 11:51 p.m.