Positional analysis groups together nodes that have similar relational characteristics, rather than nodes that share individual attributes. There are many approaches to clustering in social networks, including those based on modularity maximization (e.g., Louvain, SLM, hierarchical clustering) or on principles of information theory (e.g., Infomap). ideanet's role_analysis function currently offers workflows for two common methods of positional analysis: CONCOR and hierarchical clustering.
To illustrate how to use the role_analysis function, we'll use a multirelational network of business and marriage relationships between families in Renaissance-era Florence. This network is frequently used to demonstrate role detection methods, and is included natively in ideanet.
library(ideanet)
head(florentine_nodes)
head(florentine_edges)
knitr::kable(head(florentine_nodes))
knitr::kable(head(florentine_edges))
The first step in our positional analysis workflow is to process this network using the netwrite function, as one generally does when using ideanet to work with sociocentric data:
nw_flor <- netwrite(nodelist = florentine_nodes,
                    node_id = "id",
                    i_elements = florentine_edges$source,
                    j_elements = florentine_edges$target,
                    type = florentine_edges$type,
                    directed = FALSE,
                    net_name = "florentine")
We'll pass the resulting igraph_list and node_measures objects to the role_analysis function.
As with all other tools in ideanet, the role_analysis function asks users to specify several arguments ahead of execution. Some of these arguments are specific to the positional analysis method being used and are only required when the user selects that method:
General Arguments
graph: An igraph object generated by netwrite. If the network in question is multirelational (as is the one in this example), the object passed to graph should be the igraph_list object generated by netwrite.
nodes: A nodelist data frame generated by netwrite.
directed: Specify if the edges should be interpreted as directed or undirected. Expects TRUE or FALSE logical.
method: Method of role inference. Current valid options are "cluster" for hierarchical clustering and "concor" for CONCOR.
min_partitions: A numeric value indicating the minimum number of clusters or partitions to assign to nodes in the network. When using hierarchical clustering, this value reflects the minimum number of clusters produced by analysis. When using CONCOR, this value reflects the minimum number of partitions produced in analysis, such that a value of 1 results in a partitioning of two groups, a value of 2 results in four groups, and so on.
max_partitions: A numeric value indicating the maximum number of clusters or partitions to assign to nodes in the network. The value given here is applied in the same way as min_partitions.
min_partition_size: A numeric value indicating the minimum number of nodes required for inclusion in a cluster. If an inferred cluster or partition contains fewer nodes than the number assigned to min_partition_size, nodes in this cluster/partition will be labeled as members of a parent cluster/partition.
backbone: A numeric value ranging from 0 to 1 indicating which edges in the similarity/correlation matrix should be kept when calculating modularity of cluster/partition assignments. When calculating optimal modularity, it helps to backbone the similarity/correlation matrix according to the nth percentile. Larger networks benefit from higher backbone values, while lower values generally benefit smaller networks.
viz: Output summary visualizations. Expects TRUE or FALSE logical.
Arguments Specific to Hierarchical Clustering
retain_variables: Output a dataframe of variables used in clustering. Expects TRUE or FALSE logical.
cluster_summaries: Output a dataframe containing mean values of clustering variables within each cluster. Expects TRUE or FALSE logical.
dendro_names: If viz is set to TRUE, a logical value indicating whether the cluster dendrogram visualization produced should display node labels rather than numeric ID numbers.
fast_triad: A logical value indicating whether to use a faster method for counting individual nodes' positions in different types of triads. Set to TRUE by default. NOTE: This faster method may lead to memory issues and should be avoided when working with larger networks.
Arguments Specific to CONCOR
self_ties: A logical value indicating whether to include self-loops in CONCOR calculation.
cutoff: A numeric value ranging from 0 to 1 that indicates the correlation cutoff for detecting convergence in CONCOR calculation.
max_iter: A numeric value indicating the maximum number of iterations allowed for CONCOR calculation.
For our first example, let's look at how to identify role positions using the hierarchical clustering method. Although role_analysis takes the many arguments listed above, in practice we only need to specify a fraction of them:
flor_cluster <- role_analysis(method = "cluster",
                              graph = nw_flor$igraph_list,
                              nodes = nw_flor$node_measures,
                              directed = FALSE,
                              min_partitions = 2,
                              max_partitions = 7,
                              viz = TRUE,
                              cluster_summaries = TRUE,
                              fast_triad = TRUE)
Note that we've set fast_triad to TRUE here to speed up counting the triad positions, or motifs, that each node occupies in the network. This is acceptable for the current network given its small size; however, as stated earlier, setting fast_triad to TRUE may exhaust your computer's memory when the network is large. Should this occur, we recommend setting fast_triad to FALSE and trying again.
role_analysis is similar to netwrite in that it simultaneously creates several outputs stored in a single list object. In the following section, we'll examine each of the outputs within this list and what they contain.
Depending on the amount of partitioning applied during clustering, individual nodes may vary in terms of cluster membership. Users can inspect cluster membership of individual nodes at each level of partitioning using the cluster_assignments object:
head(flor_cluster$cluster_assignments)
knitr::kable(head(flor_cluster$cluster_assignments))
Here id contains each node's simplified identifier as it appears in the node_measures dataframe produced by netwrite. Columns beginning with the cut_ prefix indicate a specific level of partitioning. In most cases, we are interested in finding a single solution that best categorizes nodes into different types ("roles") according to their relational characteristics. role_analysis determines the optimal level of partitioning by taking the distance matrix used in the clustering process and converting it into a similarity matrix. This similarity matrix is then treated as a dense network whose modularity varies according to the membership of nodes within derived clusters. Finally, role_analysis designates the level of partitioning whose cluster assignments produce the highest modularity score as the best fit. In effect, this converts a multirelational role problem into a single-relation community detection problem in a dense network.
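The selection logic described above can be sketched in base R. This is a minimal illustration under stated assumptions, not ideanet's implementation: the toy measures, the Ward linkage, the 1 - d / max(d) similarity conversion, and the helper weighted_modularity() are all choices made here for demonstration.

```r
# Toy data (hypothetical): six nodes described by two relational measures.
x <- rbind(c(1, 1), c(1.2, 0.9), c(5, 5), c(5.1, 4.8), c(9, 1), c(9.2, 1.1))

d  <- dist(x)                        # distance matrix used for clustering
hc <- hclust(d, method = "ward.D2")  # linkage choice is an assumption

# Convert distances to similarities in [0, 1]; drop self-similarity.
sim <- 1 - as.matrix(d) / max(d)
diag(sim) <- 0

# Modularity of a weighted, undirected network for a given membership vector:
# Q = (1 / 2m) * sum_ij (w_ij - k_i * k_j / 2m) * [c_i == c_j]
weighted_modularity <- function(w, membership) {
  two_m <- sum(w)
  strength <- rowSums(w)
  same <- outer(membership, membership, "==")
  sum((w - outer(strength, strength) / two_m)[same]) / two_m
}

# Score each candidate level of partitioning and keep the best one.
cuts <- 2:5
mods <- sapply(cuts, function(n_clusters) {
  weighted_modularity(sim, cutree(hc, k = n_clusters))
})
best_k <- cuts[which.max(mods)]
best_k
```

Because the similarity network is dense, differences in modularity between levels can be small, which is one reason the backbone argument exists.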
Cluster assignments at this identified optimal level are stored in the max_mod column, and values in this column are generally those that users will want to use. However, if users require clusters to have a minimum size as specified by the min_partition_size argument, they will want smaller clusters identified in max_mod to be subsumed into a parent cluster. When this is the case, the best_fit column will contain the closest compromise between max_mod and the user's specifications.
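To make the relationship between the two columns concrete, here is a small sketch on a made-up data frame. The values are hypothetical and only mimic the max_mod and best_fit columns; how role_analysis actually relabels merged clusters may differ.

```r
# Hypothetical assignments for eight nodes (values invented for illustration).
assignments <- data.frame(
  id       = 1:8,
  max_mod  = c(1, 1, 2, 2, 3, 3, 4, 1),
  best_fit = c(1, 1, 2, 2, 3, 3, 1, 1)
)

# Number of clusters at the modularity-maximizing level.
n_clusters <- length(unique(assignments$max_mod))

# Clusters smaller than an assumed minimum size of 2.
cluster_sizes <- table(assignments$max_mod)
small <- names(cluster_sizes)[cluster_sizes < 2]

# Nodes whose assignment changes once small clusters are folded into a parent.
changed <- assignments$id[assignments$max_mod != assignments$best_fit]
```

With real output, the same comparison can be run on flor_cluster$cluster_assignments to see which nodes were reassigned.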
To determine the number of clusters produced at the optimal level of partitioning, you can simply identify the maximum value contained in max_mod. However, role_analysis generates two diagnostic visualizations that provide a faster way of interpreting clustering output. The cluster_dendrogram visualization illustrates the cluster membership of nodes at each level of partitioning while also indicating membership of nodes at the optimal partitioning level:
flor_cluster$cluster_dendrogram
While cluster_dendrogram shows where nodes fall at each level of partitioning, cluster_modularity shows how the modularity score of the similarity matrix changes at each level of partitioning:
flor_cluster$cluster_modularity
Note: this plot may not appear in R Markdown documents, but will appear in a plot window if called in the R console.
Looking at this plot and the dendrogram together, we see that nodes in the network have been assigned to one of seven different clusters (including one isolate node; isolates are assigned their own cluster in our approach), and that this partitioning produces the best fit as determined by modularity score. We also see that while most clusters contain about 2-4 nodes, node 8 appears to be unique enough in its relational position to constitute its own cluster.
We now know that nodes in this network fall into one of seven positions or "roles." A proper understanding of these results requires more, however. If clusters are supposed to represent different kinds of roles that nodes occupy in the network, we'll want to know why certain nodes are placed in one cluster over another and how these clusters differ from one another. The cluster_summaries dataframe provides a numerical overview of differences between inferred clusters, which helps answer both questions.
flor_cluster$cluster_summaries
knitr::kable(flor_cluster$cluster_summaries)
cluster_summaries provides both crude and standardized averages of the relational measures used to determine cluster membership. These include various measures of network centrality, as well as the frequency with which nodes occupy specific positions in different kinds of triads that appear in the network (motifs). Right away, we see that the single node in cluster 6 differs from its counterparts in other clusters. This node has considerably higher degree, betweenness, and closeness centrality scores, among others. We also see that our cluster of isolates (cluster 7) appears at the end of this data frame, with all of its values set to NA given isolates' lack of connection to other nodes in the network.
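For readers who want to see what crude and standardized cluster averages look like in practice, the following base R sketch computes both on invented data. The measures, values, and the use of z-scores for standardization are assumptions made for illustration and may not match how role_analysis builds cluster_summaries.

```r
# Hypothetical node-level measures with a cluster label (invented values).
measures <- data.frame(
  cluster     = c(1, 1, 2, 2, 3),
  degree      = c(2, 3, 1, 1, 6),
  betweenness = c(4, 6, 0, 0, 40)
)
vars <- c("degree", "betweenness")

# Crude (raw) means of each measure within each cluster.
raw_means <- aggregate(measures[vars],
                       by = list(cluster = measures$cluster),
                       FUN = mean)

# Standardize each measure across all nodes (z-scores), then average by cluster.
std <- measures
std[vars] <- scale(measures[vars])
std_means <- aggregate(std[vars],
                       by = list(cluster = std$cluster),
                       FUN = mean)
```

Standardized means put measures with very different scales, such as degree and betweenness, on a common footing, which makes a standout cluster easier to spot.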
These differences are also visualized in the cluster_summaries_cent object. Because the network examined here is multirelational, cluster_summaries_cent plots these differences for each unique relationship type in the network, as well as for the overall network:
flor_cluster$cluster_summaries_cent$marriage
flor_cluster$cluster_summaries_cent$business
flor_cluster$cluster_summaries_cent$summary_graph
Those familiar with positions and motifs in networks know that as many as 36 types of positions can exist in a network, which can be unwieldy to inspect alongside other measures. Consequently, differences in triad positions are visualized separately in cluster_summaries_triad:
flor_cluster$cluster_summaries_triad$marriage
flor_cluster$cluster_summaries_triad$business
flor_cluster$cluster_summaries_triad$summary_graph
Overall, the node in cluster 6 tends to have the highest values on most measures used to identify roles in the network. Those familiar with the substantive setting of this network will not be surprised to learn that this node represents the Medici family, which was known for its power and influence in Renaissance Florence. Additionally, nodes in cluster 2 tend to appear in more clustered parts of this network due to their business ties. If one is curious to see where the Medici and families in other role positions appear relative to one another in the network, one can quickly take the information contained in cluster_assignments and assign it as a node-level attribute in an igraph object for visualization:
igraph::V(nw_flor$florentine)$role <- flor_cluster$cluster_assignments$best_fit
plot(nw_flor$florentine,
     vertex.color = as.factor(igraph::V(nw_flor$florentine)$role),
     vertex.label = igraph::V(nw_flor$florentine)$family)
A final point of consideration in positional analysis involves knowing whether nodes in a particular role tend to form ties among themselves or with nodes in other roles. When using hierarchical clustering, role_analysis generates a series of heatmaps, contained in a list, to visualize the frequency of tie formation within and between clusters. Each heatmap measures connections across clusters using different measures, and the names of these measures are used to extract their corresponding plot from the list:
flor_cluster$cluster_relations_heatmaps$chisq # Chi-squared
flor_cluster$cluster_relations_heatmaps$density # Density
flor_cluster$cluster_relations_heatmaps$density_std # Density (Standardized)
flor_cluster$cluster_relations_heatmaps$density_centered # Density (Zero-floored)
Looking at the density-based heatmaps here, one finds a high level of connection between the Medici family and families belonging to cluster 4. One can also see that families in cluster 2 have a high propensity to be tied to families in cluster 5.
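The quantity behind a density heatmap can be sketched directly from an adjacency matrix and a membership vector. The toy network and the helper block_density() below are hypothetical and written for illustration; they are not part of ideanet, and the package may compute its densities differently.

```r
# Toy undirected network: six nodes, two clusters of three (invented data).
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(5, 6))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # mirror each tie so the matrix is symmetric
membership <- c(1, 1, 1, 2, 2, 2)

# Share of possible ties that are present within and between clusters.
# Note: a cluster with a single node has no possible within-cluster ties,
# so its diagonal cell is NaN.
block_density <- function(adj, membership) {
  groups <- sort(unique(membership))
  out <- matrix(NA_real_, length(groups), length(groups),
                dimnames = list(groups, groups))
  for (g in seq_along(groups)) {
    for (h in seq_along(groups)) {
      rows <- membership == groups[g]
      cols <- membership == groups[h]
      ties <- sum(adj[rows, cols])
      possible <- if (g == h) sum(rows) * (sum(rows) - 1) else sum(rows) * sum(cols)
      out[g, h] <- ties / possible
    }
  }
  out
}

dens <- block_density(adj, membership)
dens
```

High diagonal values indicate roles whose members mostly tie among themselves; high off-diagonal values indicate roles that are defined by ties to another role.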
Alongside hierarchical clustering, the CONvergence of iterated CORrelations (CONCOR) algorithm is a popular method for conducting positional analysis in networks. Those wishing to use this algorithm instead of hierarchical clustering can easily do so using the role_analysis function. Setup for CONCOR is similar to that for hierarchical clustering, with users only having to change a few arguments:
flor_concor <- role_analysis(method = "concor",
                             graph = nw_flor$igraph_list,
                             nodes = nw_flor$node_measures,
                             directed = FALSE,
                             min_partitions = 1,
                             max_partitions = 4,
                             viz = TRUE)
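For intuition about what a single CONCOR partition does, here is a minimal base R sketch of one split on a toy network. It is a simplified illustration, not the code role_analysis runs: the helper concor_split() is hypothetical, its cutoff and max_iter arguments only echo the arguments described earlier, and it ignores multirelational stacking and self-ties.

```r
# Toy undirected network: two four-node cliques joined by one bridging tie.
adj <- matrix(0, 8, 8)
edges <- rbind(t(combn(1:4, 2)), t(combn(5:8, 2)), c(4, 5))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# One CONCOR split: correlate columns repeatedly until entries approach
# +1 or -1, then divide nodes by the sign of their correlation with node 1.
concor_split <- function(adj, cutoff = 0.999, max_iter = 50) {
  m <- cor(adj)
  for (i in seq_len(max_iter)) {
    if (all(abs(m) >= cutoff)) break
    m <- cor(m)
  }
  ifelse(m[, 1] > 0, 1L, 2L)
}

blocks <- concor_split(adj)  # nodes 1-4 and nodes 5-8 land in separate blocks

# Each further partition splits every block in two, so k partitions
# yield at most 2^k blocks.
max_blocks <- 2^(1:4)
```

This is why min_partitions = 1 and max_partitions = 4 in the call above correspond to anywhere from two to sixteen possible blocks.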
Using CONCOR in role_analysis produces fewer outputs, but those that are produced resemble select items produced using hierarchical clustering. concor_assignments, for example, appends "block" assignments to the end of the node_measures data frame that the user feeds into the role_analysis function:
# Written without %>% so the chunk does not depend on a pipe being attached
dplyr::select(flor_concor$concor_assignments, id, family, dplyr::starts_with("block"), best_fit)
knitr::kable(dplyr::select(flor_concor$concor_assignments, id, family, dplyr::starts_with("block"), best_fit))
As with the hierarchical clustering method, the optimal level of partitioning for CONCOR is determined according to the maximization of modularity in a similarity matrix. One can inspect how modularity changes at different levels of partitioning using the concor_modularity visualization:
flor_concor$concor_modularity
Visualizing CONCOR assignments in a conventional network visualization entails a similar process to that used for hierarchical clustering.
igraph::V(nw_flor$florentine)$concor <- flor_concor$concor_assignments$best_fit
plot(nw_flor$florentine,
     vertex.color = as.factor(igraph::V(nw_flor$florentine)$concor),
     vertex.label = NA)
In lieu of a dendrogram, users can see how smaller partitions branch off of larger parents with the concor_block_tree visualization. Like cluster_dendrogram, this visualization allows users to quickly gauge the relative size of blocks inferred by CONCOR:
flor_concor$concor_block_tree
Finally, users can also assess the level of connection across CONCOR blocks using the concor_relations_heatmaps object:
flor_concor$concor_relations_heatmaps$chisq
flor_concor$concor_relations_heatmaps$density
flor_concor$concor_relations_heatmaps$density_std
flor_concor$concor_relations_heatmaps$density_centered
On the whole, using CONCOR tells us that nodes in the Florentine network fall into one of only two blocks (plus a third block for our isolate), and that nodes within these blocks tend to interact among themselves rather than with nodes in the other block. These simpler results are less informative than those produced by the hierarchical clustering method, but this is not to say that CONCOR is an inferior approach to positional analysis. Interpreting results from positional analysis often entails more subjectivity than other network analysis methods. Although two blocks may maximize modularity, users may find that a higher level of partitioning produces blocks with important substantive differences. Were we to accept four blocks as a more appropriate fit than two, we would see our inferred blocks start to resemble the groups we inferred using hierarchical clustering. Moreover, this resemblance comes with only a small drop in modularity:
igraph::V(nw_flor$florentine)$concor2 <- flor_concor$concor_assignments$block_2
plot(nw_flor$florentine,
     vertex.color = as.factor(igraph::V(nw_flor$florentine)$concor2),
     vertex.label = NA)
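Beyond comparing plots by eye, one simple way to gauge how closely two sets of assignments resemble each other is to cross-tabulate them. The sketch below uses invented membership vectors rather than the Florentine results, and the agreement score (the share of nodes falling in the best-matching block for their cluster) is just one of several reasonable summaries.

```r
# Hypothetical assignments for eight nodes under two methods (invented values).
hclust_roles  <- c(1, 1, 2, 2, 3, 3, 3, 4)
concor_blocks <- c(1, 1, 2, 2, 3, 3, 4, 4)

# Rows are hierarchical clusters, columns are CONCOR blocks.
overlap <- table(cluster = hclust_roles, block = concor_blocks)

# Share of nodes that sit in the modal block of their cluster.
agreement <- sum(apply(overlap, 1, max)) / length(hclust_roles)
```

With real output, the same table can be built from flor_cluster$cluster_assignments$best_fit and flor_concor$concor_assignments$block_2, provided both data frames list nodes in the same order.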
With this in mind, we encourage users to thoroughly consider how they treat their data when using role_analysis and to use their best judgment when interpreting its output.