Position and Equivalence

library(learnr)
library(manynet)
library(patchwork)
knitr::opts_chunk$set(echo = FALSE)
clear_glossary()
friends <- to_uniplex(ison_algebra, "friends")
social <- to_uniplex(ison_algebra, "social")
tasks <- to_uniplex(ison_algebra, "tasks")

Setting up

gif of rick and morty characters hanging out with themselves

For this session, we're going to use the "ison_algebra" dataset included in the {manynet} package. Do you remember how to call the data? Can you find out some more information about it via its help file?


# Let's call and load the 'ison_algebra' dataset
data("ison_algebra", package = "manynet")
# Or you can retrieve like this:
ison_algebra <- manynet::ison_algebra
# If you want to learn more about the 'ison_algebra' dataset, use the following function (below)
?manynet::ison_algebra
data("ison_algebra", package = "manynet")
?manynet::ison_algebra
# If you want to see the network object, you can run the name of the object
# ison_algebra
# or print the code with brackets at the front and end of the code
# (ison_algebra <- manynet::ison_algebra)

We can see that the dataset is multiplex, meaning that it contains several different types of ties: friendship (friends), social (social) and task interactions (tasks).

Separating multiplex networks

As a multiplex network, there are actually three different types of ties in this network. We can extract them and investigate them separately using to_uniplex(). Within the parentheses, put the multiplex object's name, and then as a second argument put the name of the tie attribute in quotation marks. Once you have extracted all three networks, graph them and add a descriptive title.


# Here's the basic idea/code syntax you will need to extract each type of network
# You will want to replace
____ <- to_uniplex(ison_algebra, _____)
graphr(friends) + ggtitle("Friendship")
friends <- to_uniplex(ison_algebra, "friends")
graphr(friends) + ggtitle("Friendship")

social <- to_uniplex(ison_algebra, "social")
graphr(social) + ggtitle("Social")

tasks <- to_uniplex(ison_algebra, "tasks")
graphr(tasks) + ggtitle("Task")

Note also that these are weighted networks. graphr() automatically recognises these different weights and plots them.

question("If we interpret ties with higher weights as strong ties, and lesser weights as weak ties, then, according to network theory, where would we expect novel information to come from?",
         answer("Weak ties",
                correct = TRUE,
                message = learnr::random_praise()),
         answer("Strong ties",
                message = learnr::random_encouragement()),
         answer("Isolates",
                message = learnr::random_encouragement()),
         answer("Highest degree nodes",
                message = learnr::random_encouragement()),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Structural Holes

Our first question for this network, is where innovation and creative ideas might be expected to appear. There are a number of theories that associate innovation or novelty with structural position.

question("Which network concepts are associated with innovation?",
         answer("Structural holes",
                correct = TRUE,
                message = "Being positioned in a structural hole is said to be a brokerage position that brings with it information arbitrage possibilities."),
         answer("Structural folds",
                correct = TRUE,
                message = "Being positioned in a structural fold, with in-group membership in multiple groups, can provide not only information arbitrage possibilities, but also the standing in the target group to introduce novelty successfully."),
         answer("Structural balance",
                message = learnr::random_encouragement()),
         answer("Structural equivalence",
                message = '<img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExdmY4ejhpMnZoZmh3OGUxbTdjbXQwMTcwcmg0ZHo5dmJ6NzI3c3FxNyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9dg/2E7VU4zGVm8H1mFSAz/giphy.gif" alt="gif of a morty assassinating another morty"/>'),
         answer("Structuralism",
                message = learnr::random_encouragement()),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Measuring structural holes

question("There are a number of measures that might be used to approximate the concept of structural holes. Select all that apply.",
         answer("Constraint",
                correct = TRUE,
                message = learnr::random_praise()),
         answer("Effective size",
                correct = TRUE),
         answer("Bridges",
                correct = TRUE),
         answer("Redundancy",
                correct = TRUE),
         answer("Efficiency",
                correct = TRUE),
         answer("Hierarchy",
                correct = TRUE),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Bridges

One common way of thinking about how innovations and ideas might flow across the network is to think about where the bottlenecks are. Ties that are the bottlenecks for information to flow from one part of the network to another are called bridges. Remember, this means that only ties can be bridges, though a node-based measure that tracks the count of bridges to which a node is adjacent is available. Unfortunately, if we take a closer look at the friends network, there are no bridges.

sum(tie_is_bridge(friends))
any(node_bridges(friends)>0)

Constraint

But some nodes do seem more deeply embedded in the network than others, Let's take a look at which actors are least constrained by their position in the task network. {manynet} makes this easy enough with the node_constraint() function.

alge <- to_named(ison_algebra)
friends <- to_uniplex(alge, "friends")
social <- to_uniplex(alge, "social")
tasks <- to_uniplex(alge, "tasks")

node_constraint(____)
# Don't forget we want to look at which actors are least constrained by their position in the 'tasks' network
node_constraint(tasks)

This function returns a vector of constraint scores that can range between 0 and 1. Let's graph the network again, sizing the nodes according to this score. We can also identify the node with the minimum constraint score using node_is_min().


tasks <- tasks %>% 
  mutate(constraint = node_constraint(____),
         low_constraint = node_is_min(node_constraint(____)))

# Don't forget, we are still looking at the 'tasks' network
# Now, let's graph the network
# Note 1: we are looking at the 'tasks' network
# Note 2: we are interested in the actors 'least constrained' by their position

graphr(____, node_color = "____")
graphr(tasks, node_size = "constraint", node_color = "low_constraint")
tasks <- tasks %>% 
  mutate(constraint = node_constraint(tasks), 
         low_constraint = node_is_min(node_constraint(tasks)))
graphr(tasks, node_size = "constraint", node_color = "low_constraint")

Why minimum? Because constraint measures how well connected each node's partners are, with the implication that having few partners that are already connected to each other puts a node in an advantageous position to identify and share novel solutions to problems. So what can we learn from this plot about where innovation might occur within this network?

Structural Equivalence

gif of rick multiplying

Next we might ask ourselves what (other) roles there are in the network? We want to know who plays what role in this algebra class. Let us begin with structural equivalence.

question("Structural equivalence means identifying classes of nodes with...",
         answer("same/similar tie partners.",
                correct = TRUE,
                message = learnr::random_praise()),
         answer("same/similar pattern of ties.",
                message = "This is the definition for regular equivalence."),
         answer("same/similar distance from all others.",
                message = "This is the definition for automorphic equivalence.")
)

We're going to identify structurally equivalent positions across all the data that we have, including 'task', 'social', and 'friend' ties. So that is, we are using the multiplex ison_algebra dataset again and not a uniplex subgraph thereof.

Finding structurally equivalent classes

In {manynet}, finding how the nodes of a network can be partitioned into structurally equivalent classes can be as easy as:

node_in_structural(ison_algebra)

ison_algebra %>% 
  mutate(se = node_in_structural(ison_algebra)) %>% 
  graphr(node_color = "se")

But actually, a lot is going on behind the scenes here that we can unpack. Understanding what is going on behind the scenes is important for understanding how these classes are identified and how to interpret them.

Step one: starting with a census

All equivalence classes are based on nodes' similarity across some profile of motifs. In {manynet}, we call these motif censuses. Any kind of census can be used, and {manynet} includes a few options, but node_in_structural() is based off of the census of all the nodes' ties, both outgoing and incoming ties, to characterise their relationships to tie partners.


# Let's use the node_by_tie() function
# The function accepts an object such as a dataset
# Hint: Which dataset are we using in this tutorial?
node_by_tie(____)
node_by_tie(ison_algebra)
# Now, let's get the dimensions of an object via the dim() function
dim(node_by_tie(ison_algebra))
node_by_tie(ison_algebra)
dim(node_by_tie(ison_algebra))

We can see that the result is a matrix of 16 rows and 96 columns, because we want to catalogue or take a census of all the different incoming/outgoing partners our 16 nodes might have across these three networks. Note also that the result is a weighted matrix; what would you do if you wanted it to be binary?


# we could convert the result using as.matrix, returning the ties 
as.matrix((node_by_tie(ison_algebra)>0)+0)
# But it's easier to simplify the network by removing the classification into different types of ties.
# Note that this also reduces the total number of possible paths between nodes
ison_algebra %>%
  select_ties(-type) %>%
  node_by_tie()

Note that node_by_tie() does not need to be passed to node_in_structural() --- this is done automatically! However, the more generic node_in_equivalence() is available and can be used with whichever tie census is desired. Feel free to explore using some of the other censuses available in {manynet}, though some common ones are already used in the other equivalence convenience functions, e.g. node_by_triad() in node_in_regular() and node_by_path() in node_in_automorphic().

Step two: growing a tree of similarity

The next part takes this census and creates a dendrogram based on distance or dissimilarity among the nodes' census profiles. This is all done internally within e.g. node_in_structural(), though there are two important parameters that can be set to obtain different results.

First, users can set the type of distance measure used. For enthusiasts, this is passed on to stats::dist(), so that help page should be consulted for more details. By default "euclidean" is used.

Second, we can also set the type of clustering algorithm employed. By default, {manynet}'s equivalence functions use hierarchical clustering, "hier", but for compatibility and enthusiasts, we also offer "concor", which implements a CONCOR (CONvergence of CORrelations) algorithm.

We can see the difference from varying the clustering algorithm and/or distance by plotting the dendrograms (hidden) in the output from node_in_structural():

alge <- to_named(ison_algebra) # fake names to make comparison clearer
plot(node_in_structural(alge, cluster = "hier", distance = "euclidean"))

# changing the type of distance used
plot(node_in_structural(alge, cluster = "hier", distance = "manhattan"))

# changing the clustering algorithm
plot(node_in_structural(alge, cluster = "concor", distance = "euclidean"))
question("Do you see any differences?",
         answer("Yes", correct = TRUE, message = learnr::random_praise()),
         answer("No"),
        allow_retry = TRUE)

So plotting a membership vector from {manynet} returns a dendrogram with the names of the nodes on the y-axis and the distance between them on the x-axis. Using the census as material, the distances between the nodes is used to create a dendrogram of (dis)similarity among the nodes. Basically, as we move to the right, we're allowing for more and more dissimilarity among those we cluster together. A fork or branching point indicates the level of dissimilarity at which those two or more nodes would be said to be equivalent. Where two nodes' branches join/fork represents the maximum distance among all their leaves, so more similar nodes' branches fork closer to the tree's canopy, and less similar (groups of) nodes don't join until they form basically the trunk.

Note that with the results using the hierarchical clustering algorithm, the distance directly affects the structure of the tree (and the results).

The CONCOR dendrogram operates a bit differently to hierarchical clustering though. Instead it represents how converging correlations repeatedly bifurcate the nodes into one of two partitions. As such the 'distance' is really just the (inverse) number of steps of bifurcations until nodes belong to the same class.

Step three: identifying the number of clusters

Another bit of information represented in the dendrogram is where the tree should be cut (the dashed red line) and how the nodes are assigned to the branches (clusters) present at that cut-point.

But where does this red line come from? Or, more technically, how do we identify the number of clusters into which to assign nodes?

{manynet} includes several different ways of establishing k, or the number of clusters. Remember, the further to the right the red line is (the lower on the tree the cut point is) the more dissimilar we're allowing nodes in the same cluster to be. We could set this ourselves by just passing k an integer.

plot(node_in_structural(alge, k = 2))

But we're really just guessing. Maybe 2 is not the best k? To establish what the best k is for this clustering exercise, we need to iterate through a number of potential k and consider their fitness by some metric. There are a couple of options here.

One is to consider, for each k, how correlated this partition is with the observed network. When there is one cluster for each vertex in the network, cell values will be identical to the observed correlation matrix, and when there is one cluster for the whole network, the values will all be equal to the average correlation across the observed matrix. So the correlations in each by-cluster matrix are correlated with the observed correlation matrix to see how well each by-cluster matrix fits the data.

Of course, the perfect partition would then be where all nodes are in their own cluster, which is hardly 'clustering' at all. Also, increasing k will always improve the correlation. But if one were to plot these correlations as a line graph, then we might expect there to be a relatively rapid increase in correlation as we move from, for example, 3 clusters to 4 clusters, but a relatively small increase from, for example, 13 clusters to 14 clusters. By identifying the inflection point in this line graph, {manynet} selects a number of clusters that represents a trade-off between fit and parsimony. This is the k = "elbow" method.

The other option is to evaluate a candidate for k based not on correlation but on a metric of how similar each node in a cluster is to others in its cluster and how dissimilar each node is to those in a neighbouring cluster. When averaged over all nodes and all clusters, this provides a 'silhouette coefficient' for a candidate of k. Choosing the number of clusters that maximizes this coefficient, which is what k = "silhouette" does, can return a somewhat different result to the elbow method. See what we have here, with all other arguments held the same:

plot(node_in_structural(alge, k = "elbow"))
plot(node_in_structural(alge, k = "silhouette"))

Ok, so it looks like the elbow method returns k == 3 as a good trade-off between fit and parsimony. The silhouette method, by contrast, sees k == 4 as maximising cluster similarity and dissimilarity. Either is probably fine here, and there is much debate around how to select the number of clusters anyway. However, the silhouette method seems to do a better job of identifying how unique the 16th node is. The silhouette method is also the default in {manynet}.

Note that there is a somewhat hidden parameter here, range. Since testing across all possible numbers of clusters can get computationally expensive (not to mention uninterpretable) for large networks, {manynet} only considers up to 8 clusters by default. This however can be modified to be higher or lower, e.g. range = 16.

Finally, one last option is k = "strict", which only assigns nodes to the same partition if there is zero distance between them. This is quick and rigorous solution, however oftentimes this misses the point in finding clusters of nodes that, despite some variation, can be considered as similar on some dimension.

plot(node_in_structural(alge, k = "strict"))

Here for example, no two nodes have precisely the same tie-profile, otherwise their branches would join/fork at a distance of 0. As such, k = "strict" partitions the network into 16 clusters. Where networks have a number of nodes with strictly the same profiles, such a k-selection method might be helpful to recognise nodes in exactly the same structural position, but here it essentially just reports nodes' identity.

Blockmodelling

gif of mortys in a block parade

Summarising profiles

Ok, so now we have a result from establishing nodes' membership in structurally equivalent classes. We can graph this of course, as above:

alge %>% 
  mutate(se = node_in_structural(alge)) %>% 
  graphr(node_color = "se")

While this plot adds the structurally equivalent classes information to our earlier graph, it doesn't really help us understand how the classes relate. That is, we might be less interested in how the individuals in the different classes relate, and more interested in how the different classes relate in aggregate.

gif of mortys learning about roles

One option that can be useful for characterising what the profile of ties (partners) is for each position/equivalence class is to use summary(). It summarises some census result by a partition (equivalence/membership) assignment. By default it takes the average of ties (values), but this can be tweaked by assigning some other summary statistic as FUN =.


# Let's wrap node_by_tie inside the summary() function
# and pass it a membership result
summary(node_by_tie(____),
        membership = ____)
summary(node_by_tie(alge),
        membership = node_in_structural(alge))

This node census produces 96 columns, $16 \text{nodes} * 2 \text{directions} * 3 \text{edge types}$, it takes a bit to look through what varies between the different classes as 'blocked'. But only four rows (the four structurally equivalent classes, according to the default).

Another way to do this is to plot the gloss("blockmodel") as a whole. Passing the plot() function an adjacency/incidence matrix along with a membership vector allows the matrix to be sorted and framed (without the membership vector, just the adjacency/incidence matrix is plotted):


# Let's plot the blockmodel using the plot() function we used for the dendrograms
# Instead of node_tie_census() let's us as_matrix()

plot(as_matrix(____),
     membership = ____)
# plot the blockmodel for the whole network
plot(as_matrix(alge),
     membership = node_in_structural(alge))

# plot the blockmodel for the friends, tasks, and social networks separately
plot(as_matrix(to_uniplex(alge, "friends")),
     membership = node_in_structural(alge))
plot(as_matrix(to_uniplex(alge, "tasks")),
     membership = node_in_structural(alge))
plot(as_matrix(to_uniplex(alge, "social")),
     membership = node_in_structural(alge))

By passing the membership argument our structural equivalence results, the matrix is re-sorted to cluster or 'block' nodes from the same class together. This can help us interpret the general relationships between classes. For example, when we plot the friends, tasks, and social networks using the structural equivalence results, we might characterise them like so:

Reduced graphs

Lastly, we can consider how classes of nodes relate to one another in a blockmodel. Let's use the 4-cluster solution on the valued network (though binary is possible too) to create a r gloss("reduced graph","reduced"). A reduced graph is a transformation of a network such that the nodes are no longer the individual nodes but the groups of one or more nodes as a class, and the ties between these blocked nodes can represent the sum or average tie between these classes. Of course, this means that there can be self-ties or loops, because even if the original network was simple (not complex), any within-class ties will end up becoming loops and thus the network will be complex.

(bm <- to_blocks(alge, node_in_structural(alge)))

bm <- bm %>% as_tidygraph %>% 
  mutate(name = c("Freaks", "Squares", "Nerds", "Geek"))
graphr(bm)

Free play

Now try to find the regularly equivalent classes in the ison_lawfirm dataset. As this is a multiplex network, you can make this a uniplex network first.


An extension can be to also explore automorphically equivalent classes.

Glossary

Here are some of the terms that we have covered in this module:

r print_glossary()



Try the manynet package in your browser

Any scripts or data that you put into this service are public.

manynet documentation built on June 23, 2025, 9:07 a.m.