library(learnr)
library(manynet)
library(network)
knitr::opts_chunk$set(echo = FALSE)
Most network analysts want to use network analysis to better understand empirical networks.
{manynet} offers various functions for importing or coercing networks into formats that you can use.
This tutorial covers two main ways to get data into the package:
calling data already included in {manynet}, {migraph}, or other packages,
and importing data from outside R.
One might also create or stochastically generate networks,
and {manynet} includes functions for this too,
but we cover these functions in later tutorials on topology and diffusion.
This tutorial also covers ways to work with this data until it is in the class or format you require for further analysis.
As many R packages do, {manynet} includes a number of datasets used for
teaching and testing the functions contained in the package.
These are sometimes classical network datasets,
such as the Southern Women dataset
or Zachary's Karateka dataset,
and sometimes new data with neat themes, features, or attributes that make
them exemplary teaching or testing data.
To see what data is in the package, you can explore the documentation available on the website (see here) or use a function in R to list the data available in the package.
One function, available in base R (with no added packages), is data(package = "manynet").
Type this into the box below to see what datasets are available in the package.
There are buttons to start over, receive any hints/solutions available,
as well as to run the code you have entered to discover its effects.
Try it out now!
data(package = "_____")
data(package = "manynet")
On the left of the output are the names of the objects (here all starting with ison_*).
On the right is a brief description of the networks and their canonical source.
Do you recognise any of the datasets?
{manynet} also includes its own way of identifying network data in a package.
table_data() returns a table of the network datasets in a package,
along with information about the number of nodes, ties, and various other features.
table_data()
Let's say that we are only interested in two-mode network data. How can we filter this table so that only those networks that are two-mode are retained?
table_data() %>% dplyr::filter(twomode)
question("How many two-mode networks are available in the manynet package?", answer(2, correct = TRUE), answer(1), answer(3), answer(13), answer(23), random_answer_order = TRUE, allow_retry = TRUE )
question("How many two-mode networks are available in the migraph package?", answer(13, correct = TRUE), answer(1), answer(2), answer(3), answer(23), answer(6), random_answer_order = TRUE, allow_retry = TRUE )
Ok, so we can see that there are a number of very interesting datasets available in this package. How do we access and use this data?
The easiest way to call the data is just to make sure that the package
is loaded using the command library(manynet),
and then use the selected dataset as named above.
^[Alternatively, the data can be called directly out of the package like this:
example_name <- manynet::ison_adolescents, but since we think
you will probably want all of the other functions available in {manynet}
at your disposal, you may as well just load the package entirely.]
Let's try calling ison_adolescents by first loading the {manynet} 'library'
and then just typing ison_adolescents to see what happens.
library(______)
ison______
library(manynet)
ison_adolescents
See whether you can call up other datasets now too.
You won't need to load the {manynet} package again (it'll stay loaded),
but identify a network that interests you from table_data() and then
call/print it.
All of the network data available in {manynet} (and {migraph}) are in a
special tbl_graph format, from the {tidygraph} package,
that makes it compatible, flexible, and transparent.
When you call one of these data objects, some information about the type of network it is,
how many nodes and ties it has, and the first few examples of nodes and ties is given.
^[You may have noticed when the package was first loaded that it mentioned that the print_tbl_graph method from that package was overwritten.
That's so that we can make some different choices about what and how networks
are described.]
Let's see whether we can make sense of the main features of this network.
Run the following line and then answer the questions.
ison_adolescents
question("How many nodes does this network have?", answer(8, correct = TRUE), answer(10, message = "There are 10 ties."), answer(1, message = "There is one nodal variable ('name')."), answer(2, message = "There are two more nodes beyond what is listed."), answer(6, message = "There are six nodes/names shown, but the info underneath says there are 2 more."), random_answer_order = TRUE, allow_retry = TRUE )
question("How many ties does this network have?", answer(4, message = "There are four more ties beyond the six that are listed."), answer(10, correct = TRUE, message = "There are 10 ties."), answer(1, message = "There is one network/graph."), answer(2, message = "There are two tie variables listing the nodes the tie is sent from and to."), answer(6, message = "There are six ties shown, but the info underneath says there are 4 more."), random_answer_order = TRUE, allow_retry = TRUE )
question("What kind of network is this? Choose all that apply.", answer("Undirected", correct = TRUE, message = learnr::random_praise()), answer("Labelled", correct = TRUE), answer("Two-mode"), answer("Complex"), answer("Signed"), answer("Weighted"), random_answer_order = TRUE, allow_retry = TRUE )
You can now describe the main dimensions and type of network. In another tutorial, we will see how we can describe such networks visually.
We can ask other questions of this data too.
{manynet} uses a simple function naming convention so that
you always know to what it relates.
- net_*() functions usually return one value for the network or graph, whether that be a string like Evelyn or some number like 3 or -0.003 ^[There are a few exceptions to this.]
- node_*() functions always return a vector of values for the network as long as the number of nodes or vertices in the network (of any mode)
- tie_*() functions always return a vector of values for the network as long as the number of ties or edges in the network (of any sign or type)

To find out how many nodes are in the network, use net_nodes().
To find out how many nodes are in each mode, use net_dims().
To find out the names of those nodes, use node_names().
Use such functions to find out:
a) how many nodes are in the ison_southern_women network
b) how many nodes are in each mode
c) how many ties are in the network
d) what nodal attributes there are in the network
e) what the names of the nodes are
net_nodes(ison_southern_women)
net_dims(ison_southern_women)
net_ties(ison_southern_women)
net_node_attributes(ison_southern_women)
node_names(ison_southern_women)
There are a bunch of logical checks for many common properties or features of networks.
For example, one can check whether a network is_twomode(), is_directed(), or is_labelled().
Remember, all is_*() functions work on any compatible class.
We can describe and work with networks from other R packages too,
not just those in the tbl_graph format.
For example, another commonly used package in network analysis is {network},
which includes a few example datasets of its own.
Can you remember how to find out which data are available in this package,
and call the last one in the list?
# You may need to call the data out of this package directly,
# such as: data(flo, package = "network")
data(package = "_____")
library(_____)
data(_____)
_____
data(package = "network")
data(flo, package = "network")
flo
This data uses quite a different class to what we encountered above.
It prints out the full adjacency matrix of the Florentine network,
but as a network-class object (i.e. from the {network} package).
This is no problem for {manynet} (or {migraph}),
since every included function works the same on any of the compatible classes,
but in case you would like to work with a network in a particular class,
or it needs to be in a particular format for further work (e.g. for use with {ergm}),
then {manynet} has you covered for that too.
Coercing networks between different classes of objects uses the as_*() functions.
These functions will do their best to coerce data from the current class of the object
to the class named in the function.
Some classes have 'slots' or recognition for some kinds of information
that others don't.
For example, coercing a tbl_graph into an edgelist will sacrifice all the information
about nodal attributes.
Still, we aim for these functions to be as lossless as possible and welcome feedback
that highlights how these translations can be improved.
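To make the point about lossy coercion concrete, here is a minimal base R sketch. It does not use {manynet} at all; the node names and the age attribute are invented for illustration. It only shows why an edgelist on its own cannot carry nodal attributes or nodes without ties.

```r
# Hypothetical nodelist: three nodes, one of which ("Cleo") has no ties
nodes <- data.frame(name = c("Ana", "Ben", "Cleo"),
                    age = c(31, 27, 45),
                    stringsAsFactors = FALSE)

# Hypothetical edgelist: a single tie from Ana to Ben
edges <- data.frame(from = "Ana", to = "Ben",
                    stringsAsFactors = FALSE)

# Nodes that can be recovered from the edgelist alone
recovered <- unique(c(edges$from, edges$to))
recovered

# Nodes that have no place in the edgelist (isolates are lost)
setdiff(nodes$name, recovered)

# The edgelist has no column that could hold the nodal attribute 'age'
"age" %in% names(edges)
```

An edgelist only records ties, so anything attached to nodes rather than ties has to travel in a separate nodelist.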
Let's see whether we can coerce our 'flo' network into a tbl_graph ('tidygraph') class object.
_____ <- as_t_____(_____)
flo <- as_tidygraph(_____)
flo <- as_tidygraph(flo)
flo
question("What are there 16 of in this network?", answer("Nodes", correct = TRUE), answer("Ties"), answer("Reciprocated ties"), answer("Loops"), answer("Components"), random_answer_order = TRUE, allow_retry = TRUE )
Other packages that contain network data include
David Schoch's descriptively named {networkdata} package.
The data in this package are {igraph}-class objects.
Can you coerce one of the datasets in this package
into a tidygraph format? Into a network format?
Into a matrix? Into an edgelist?
Researchers will regularly find themselves needing to import and work with network data from outside of R. There are a great number of network datasets and data resources available online. ^[Here we keep a necessarily partial list, but we are happy to update it with additional suggestions.] See for example:
Yet these resources contain data in a range of different formats,
some that are specifically made to work with certain software,
others that rely on open standards,
and yet others that keep data in a very standard edgelist (and perhaps nodelist) format
in .csv files or similar.
Fortunately, {manynet} has functions to help with importing data from such formats too.
One format most users are long familiar with is Excel.
In Excel, users are typically collecting network data as edgelists, nodelists, or both.
Recall that edgelists tabulate senders/from and receivers/to of each tie in the first two columns and any other edge- or tie-related attributes as additional columns.
There may optionally also be a nodelist that tabulates
Edgelists are typically the main object to be imported,
and we can import them from an Excel file or a .csv file.^[Note that if you import from a .csv file, please specify whether the separation value should be commas (sv = "comma") or semi-colons (sv = "semi-colon"). The function expects comma separated values by default.]
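If the expected layout is unclear, the following base R sketch may help. It builds a small, entirely hypothetical edgelist and nodelist, writes the edgelist to a temporary comma-separated file, and reads it back with base R's read.csv() rather than with any {manynet} function. The names, weights, and ages are made up for illustration.

```r
# Hypothetical edgelist: the first two columns are sender and receiver,
# any further columns are tie attributes
edges <- data.frame(from = c("Ana", "Ana", "Ben"),
                    to = c("Ben", "Cleo", "Cleo"),
                    weight = c(2, 1, 3),
                    stringsAsFactors = FALSE)

# Hypothetical nodelist: one row per node, plus any nodal attributes
nodes <- data.frame(name = c("Ana", "Ben", "Cleo"),
                    age = c(31, 27, 45),
                    stringsAsFactors = FALSE)

# Write the edgelist to a temporary .csv file and read it back in
path <- tempfile(fileext = ".csv")
write.csv(edges, path, row.names = FALSE)
reread <- read.csv(path, stringsAsFactors = FALSE)
reread

# Sanity check: every tie endpoint should appear in the nodelist
all(c(reread$from, reread$to) %in% nodes$name)
```

Checking that all tie endpoints appear in the nodelist before importing is a cheap way to catch misspelled names in hand-collected data.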
For the sake of this exercise, we'll import some data, adols.csv, that I've pre-saved within the package in the data/ folder of this tutorial.
Try the following code chunk.
adolties <- read_edgelist("data/adols.csv")
flonodes <- read_nodelist("data/flonode.csv")
If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.
In some cases, users will be faced with having to collect data themselves,
or wish to first manipulate the data in Excel before importing it,
but may be uncertain about the expected format of an edgelist.
Here it may be useful to try exporting one of the built-in datasets
in {manynet} to see how complete network data looks.
If this is potentially complex,
calling write_edgelist() without any arguments will export a test
file with a barebones structure that you can overwrite with your own data.
write______(ison_marvel_relationships, "_____/marvedges.xlsx")
write______(ison_marvel_relationships, "_____/marvnodes.xlsx")
write_edgelist(ison_marvel_relationships, "data/marvedges.xlsx")
write_nodelist(ison_marvel_relationships, "data/marvnodes.xlsx")
Since network data can be complex, edgelists (and nodelists) may not be sufficient to structure all the information necessary to represent the network. For this reason, a variety of other external formats have been proposed and used. As such, you may find network data of interest that is in another format. Here are some examples:
- read_pajek() and write_pajek() for importing and exporting .net or .paj files
- read_ucinet() and write_ucinet() for importing and exporting .##h files (.##d files are automatically imported alongside them)
- read_graphml() and write_graphml() for importing and exporting .graphml files
- read_dynetml() for importing .dynetml files

For more information on any of these functions, you can ask for help by typing ?read_pajek in the console.
Whereas read_edgelist() and read_nodelist() will import into a tibble/data frame class,
read_pajek() and read_ucinet() will import the network into a tidygraph format (see above).
Of course, any network data that is imported can be easily coerced into any other compatible class.
Let's say we want to import the adolescents edgelist back in, but we want it in an igraph format.
There are three ways you might do this:
# 1. Separate steps
adols <- read_edgelist("data/adols.csv")
adolsigraph1 <- as_igraph(adols)
adolsigraph1
# 2. Nested steps
adolsigraph2 <- as_igraph(read_edgelist("data/adols.csv"))
adolsigraph2
# 3. Chained steps
adolsigraph3 <- read_edgelist("data/adols.csv") %>% as_igraph()
adolsigraph3
How does it compare to the original?
question("Which versions are labelled?", answer("ison_adolescents (the original)", correct = TRUE), answer("adolsigraph1", correct = TRUE), answer("adolsigraph2", correct = TRUE), answer("adolsigraph3", correct = TRUE), random_answer_order = TRUE, allow_retry = TRUE )
But how can we change the network to be unnamed (or anything else)?
As mentioned above, {manynet} attempts to retain as much information as possible when converting objects between different classes.
The presumption is that users should explicitly decide to reduce or simplify their data.
{manynet} includes functions for reformatting, transforming (or removing) certain properties of network objects.
Here we will introduce a few functions used for 'reformatting' networks.
We call functions 'reformatting functions' if they change the type but not the order (number of nodes) in the network.
A good example is to_undirected().
The astute among you may have noticed that when we imported the adolescents network,
it returned a directed network instead of the original undirected network.
This was a consequence of a heuristic used during the import,
but gives us a good occasion to try out to_undirected().
Reimport the data/adols.csv file, make it an igraph-class object, and then make it undirected.
read_edgelist("data/adols.csv") %>% as_igraph() %>% to_undirected()
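The idea behind making a directed network undirected can be sketched on a plain adjacency matrix in base R. This is only an illustration of the concept on a made-up three-node network, under the assumption that a tie is kept whenever an arc exists in either direction; it is not a description of how to_undirected() is implemented internally.

```r
# Hypothetical directed adjacency matrix: rows send ties, columns receive them
m <- matrix(c(0, 1, 0,
              0, 0, 0,
              1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Keep a tie between i and j if an arc exists in either direction
undirected <- (m | t(m)) * 1
undirected

# An undirected network has a symmetric adjacency matrix
isSymmetric(undirected)
```

Note that the number of nodes is unchanged: only the type of tie differs, which is what makes this a reformatting rather than a transformation.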
Try this out with other compatible classes of objects, and reformatting other aspects of the network. For example:
- to_unnamed() removes/anonymises all vertex/node labels
- to_named() adds some random (U.S.) children's names, which can be useful for identifying particular nodes
- to_undirected() replaces directed ties with an undirected tie (if an arc in either direction is present)
- to_redirected() replaces undirected ties with directed ties (arcs) or, if already directed, swaps arcs' direction
- to_unweighted() binarises or dichotomises a network around a particular threshold (by default 1)
- to_unsigned() returns just the "positive" or "negative" ties from a signed network, respectively
- to_uniplex() reduces a multigraph or multiplex network to one with a single set of edges or ties
- to_simplex() removes all loops or self-ties from a complex network

Transforming functions are similar to the reformatting functions, and are also named to_*(),
but their operation always changes the network's 'order' (number of nodes).
Good examples of this are to_mode1() and to_mode2() for transforming a two-mode network
into one of its one-mode projections.
to_mode1() will transform (project) the network to a one-mode network of shared ties among its first set of nodes,
while to_mode2() will project the original network to a network of shared ties among its second set of nodes.
For more information on projection,
see for example Knoke et al. (2021).
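For readers who like to see the arithmetic, one common way to compute such projections is to multiply the two-mode incidence matrix by its transpose. The base R sketch below uses a small made-up incidence matrix (three women, four events); it illustrates the general technique and is not necessarily how to_mode1() and to_mode2() work internally.

```r
# Hypothetical two-mode incidence matrix: 3 women (rows) by 4 events (columns)
inc <- matrix(c(1, 1, 0, 0,
                1, 1, 1, 0,
                0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("W1", "W2", "W3"),
                              c("E1", "E2", "E3", "E4")))

# Project onto the first mode: cell [i, j] counts events shared by women i and j
women <- inc %*% t(inc)
diag(women) <- 0  # drop self-ties (each woman's own attendance count)
women

# Project onto the second mode: cell [i, j] counts women shared by events i and j
events <- t(inc) %*% inc
diag(events) <- 0
events
```

The two projections have three and four nodes respectively, so both differ in order from the original seven-node two-mode network.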
Let's try this out on a classic two-mode network, ison_southern_women.
Assign and name the transformed networks something sensible using e.g. women <- to_mode...
so that we can continue working with this data afterwards.
To assign and immediately print the result, wrap the line in parentheses.
ison_southern_women
(s_women <- to_mode1(ison_southern_women))
(s_events <- to_mode2(ison_southern_women))
question("The first mode of the two-mode network consists of...", answer("women named Evelyn, Laura, Theresa, etc", correct = TRUE), answer("events called E1, E2, E3, etc"), answer("southern cities named Dunedin, Invercargill, Bluff, etc"), answer("events called Christmas Ball, End-of-year, Mardi Gras, etc"), answer("women named Gertrude, Sally, Agnes, etc"), random_answer_order = TRUE, allow_retry = TRUE )
Now use functions introduced above on the two projections you have created to find out:
a. how many nodes there are in each of these networks, b. what the names of the nodes are, and c. what tie attributes there are in the networks.
s_women <- to_mode1(ison_southern_women)
s_events <- to_mode2(ison_southern_women)
net_nodes(s_women)
net_nodes(s_events)
node_names(s_women)
node_names(s_events)
net_tie_attributes(s_women)
net_tie_attributes(s_events)
So we can see that the to_mode*() functions have created a network of only one of the modes in the network.
The ties in these projected networks,
representing shared connections to nodes of the other mode, are weighted.
This shows up when listing the network's tie attributes,
or could be retrieved using tie_weights(),
but can also be checked with the simple logical check is_weighted().
Retrieve the tie weights from your women projection. Find the average (mean) of this vector of tie weights, and then the average (mean) weight across all cells of the adjacency matrix, including pairs with no tie.
tie_weights(_____)
mean(tie_weights(_____))
mean(as_matrix(_____))
tie_weights(s_women)
mean(tie_weights(s_women))
mean(as_matrix(s_women))
question("2.316547 is...", answer("the average frequency of events shared by women who shared any events.", correct = TRUE), answer("the average frequency of events shared by women."), answer("how many shared events there are in original two-mode network."), answer("how many women there are in the projected network."), answer("how many shared events there are in the projected network."), random_answer_order = TRUE, allow_retry = TRUE )
Ok, so now we know that projection transforms an (unweighted) two-mode network
into a weighted one-mode network and what these weights represent.
Note though that counting the frequency of shared ties to nodes in the other mode
is just one (albeit the default) option for how ties in the projection are weighted.
Other options included in {manynet} include the Jaccard index, Rand simple matching coefficient, Pearson coefficient, and Yule's Q.
These may be of interest if, for example, overlap should be weighted by participation.
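As an illustration of how such an alternative weighting differs from a simple count, the base R sketch below computes the standard Jaccard index (shared events divided by events attended by either woman) on a made-up incidence matrix. It shows the general formula only; the argument names and exact behaviour of the corresponding {manynet} option are not shown here and may differ.

```r
# Hypothetical two-mode incidence matrix: 3 women (rows) by 4 events (columns)
inc <- matrix(c(1, 1, 0, 0,
                1, 1, 1, 0,
                0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("W1", "W2", "W3"),
                              c("E1", "E2", "E3", "E4")))

shared <- inc %*% t(inc)   # events attended by both women
attended <- rowSums(inc)   # events attended by each woman

# Events attended by either woman: |A| + |B| - |A and B|
either <- outer(attended, attended, "+") - shared

# Jaccard index: overlap relative to combined participation
jaccard <- shared / either
diag(jaccard) <- 0
round(jaccard, 2)
```

Here W1 and W2 share two events out of three attended by either, whereas W2 and W3 share one out of four, so the Jaccard weights discount overlap between more active participants.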
Other transforming functions include:
- to_giant() identifies and returns only the main component of a network.
- to_no_isolates() identifies and returns a network including only nodes with at least one tie.
- to_subgraph() returns only a subgraph of the network based on e.g. some nodal attribute.
- to_ties() returns a network where the ties in the original network become the nodes, and the ties are shared adjacencies to nodes.
- to_matching() returns a network in which each node is only tied to one of its previously existing ties such that the network's cardinality is maximised. In other words, the algorithm tries to match nodes as best as possible so that each node has a partner. Note that this is not always possible.

Remember, all these to_*() functions work on any compatible class;
the to_*() functions will also attempt to return that same class of object,
making it even easier to manipulate networks into shape for analysis.
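To see why such functions count as transformations, consider what removing isolates does to an adjacency matrix. The following base R sketch uses a made-up four-node network and is only meant to illustrate the idea, not the internals of to_no_isolates().

```r
# Hypothetical undirected network of four nodes, where D has no ties
m <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
m["A", "B"] <- m["B", "A"] <- 1
m["B", "C"] <- m["C", "B"] <- 1

# Keep only nodes that send or receive at least one tie
keep <- (rowSums(m) + colSums(m)) > 0
reduced <- m[keep, keep, drop = FALSE]

# The order (number of nodes) has changed from 4 to 3
dim(reduced)
rownames(reduced)
```

Because the isolate is dropped, the resulting network has fewer nodes than the original, unlike the reformatting functions introduced earlier.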
After choosing and/or importing some network data, and alongside other modifications, you may need to add or delete particular nodes or ties, or add or delete specific nodal or tie attributes.
Sometimes you may wish to add particular nodes or ties. This can be achieved easily enough with the following functions. Note that one (1) additional node is added via the first command, and one (1) additional tie (between the nodes indexed 1 and 3) is added using the second.
add_nodes(ison_adolescents, 1)
add_ties(ison_adolescents, list(1,3))
Other times you may wish to delete particular nodes or ties,
and there the syntax is similar.
Note here though that when a single value is used in the second argument,
this is interpreted as an index,
whereas one can also use the node's name for delete_nodes(),
or name the tie(s) to be deleted using | as a separator.
delete_nodes(ison_adolescents, 1)
delete_nodes(ison_adolescents, "Sue")
delete_ties(ison_adolescents, 1)
delete_ties(ison_adolescents, "Carol|Tina")
Adding nodal attributes to a given network is relatively straightforward.
{manynet} offers a more {igraph}-like syntax, e.g. add_node_attribute(),
as well as a more {dplyr}-like syntax, e.g. mutate(),
for those already familiar with these tools in R:
ison_adolescents %>%
  mutate_nodes(color = "red", degree = 1:8) %>%
  mutate_ties(weight = 1:10)
One can also delete nodal attributes in a similar way,
by assigning NULL to a particular attribute.
ison_southern_women
ison_southern_women %>% mutate_nodes(Surname = NULL, Title = NULL)
Note that to use {dplyr}-like functions like mutate(), rename(), filter(), select(), or join() on network ties,
you will need to append _ties to the function name.