Data

library(learnr)
library(manynet)
library(network)
knitr::opts_chunk$set(echo = FALSE)

Introduction

gif of man trashing own computer

Most network analysts want to use network analysis to better understand empirical networks. {manynet} offers various functions for importing or coercing networks into formats that you can use. This tutorial is going to cover two main ways to get data into the package:

One might also create or stochastically generate networks, and {manynet} includes functions for this too, but we cover these functions in later tutorials on topology and diffusion.

This tutorial also covers ways to work with this data until it is in the class or format you require for further analysis.

Packaged data

As many R packages do, {manynet} includes a number of datasets used for teaching and testing the functions contained in the package. These are sometimes classical network datasets, such as the Southern Women dataset or Zachary's Karateka dataset, and sometimes new data with neat themes, features, or attributes that make them exemplar teaching or testing data.

Finding the data

To see what data is in the package, you can explore the documentation available on the website (see here) or use a function in R to list the data available in the package.

One function, available in base R (with no added packages), is data(package = "manynet"). Type this in to the box below to see what datasets are available in the package. There are buttons to start over, receive any hints/solutions available, as well as to run the code you have entered to discover its effects. Try it out now!


data(package = "_____")
data(package = "manynet")

On the left of the output are the names of the objects (here all starting with ison_*). On the right is a brief description of the networks and their canonical source. Do you recognise any of the datasets?

{manynet} also includes its own way of identifying network data in a package. table_data() returns a table of the network datasets in a package, along with information about the number of nodes, ties, and various other features.

table_data()

Let's say that we are only interested in two-mode network data. How can we filter this table so that only those networks that are two-mode are retained?

table_data() %>% dplyr::filter(twomode)
question("How many two-mode networks are available in the manynet package?",
  answer(2,
         correct = TRUE),
  answer(1),
  answer(3),
  answer(13),
  answer(23),
  random_answer_order = TRUE,
  allow_retry = TRUE
)
question("How many two-mode networks are available in the migraph package?",
  answer(13,
         correct = TRUE),
  answer(1),
  answer(2),
  answer(3),
  answer(23),
  answer(6),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Calling the data

Ok, so we can see that there are a number of very interesting datasets available in this package. How do we access and use this data?

The easiest way to call the data is just to make sure that the package is loaded using the command library(manynet), and then use the selected dataset as named above. ^[Alternatively, the data can be called directly out of the package like this: example_name <- manynet::ison_adolescents, but since we think you will probably want all of the other functions available in {manynet} at your disposal, you may as well just load the package entirely.] Let's try calling ison_adolescents by first loading the {manynet} 'library' and then just typing ison_adolescents to see what happens.


library(______)
ison______
library(manynet)
ison_adolescents

Free play

See whether you can call up other datasets now too. You won't need to load the {manynet} package again (it'll stay loaded), but identify a network that interests you from table_data() and then call/print it.


Describing networks

Reading prints

All of the network data available in {manynet} (and {migraph}) are in a special tbl_graph format, from the {tidygraph} package, that makes it compatible, flexible, and transparent. When you call one of these data objects, some information about the type of network it is, how many nodes and ties it has, and the first few examples of nodes and ties is given. ^[You may have noticed when the package was first loaded that it mentioned that the print_tbl_graph method from that package was overwritten. That's so that we can make some different choices about what and how networks are described.] Let's see whether we can make sense of the main features of this network. Run the following line and then answer the questions.

ison_adolescents
question("How many nodes does this network have?",
  answer(8,
         correct = TRUE),
  answer(10,
         message = "There are 10 ties."),
  answer(1,
         message = "There is one nodal variable ('name')."),
  answer(2,
         message = "There are two more nodes beyond what is listed."),
  answer(6,
         message = "There are six nodes/names shown, but the info underneath says there are 2 more."),
  random_answer_order = TRUE,
  allow_retry = TRUE
)
question("How many ties does this network have?",
  answer(4,
         message = "There are four more ties beyond the six that are listed."),
  answer(10,
         correct = TRUE,
         message = "There are 10 ties."),
  answer(1,
         message = "There is one network/graph."),
  answer(2,
         message = "There are two tie variables listing the nodes the tie is sent from and to."),
  answer(6,
         message = "There are six ties shown, but the info underneath says there are 4 more."),
  random_answer_order = TRUE,
  allow_retry = TRUE
)
question("What kind of network is this? Choose all that apply.",
  answer("Undirected",
         correct = TRUE,
         message = learnr::random_praise()),
  answer("Labelled",
         correct = TRUE),
  answer("Two-mode"),
  answer("Complex"),
  answer("Signed"),
  answer("Weighted"),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

You can now describe the main dimensions and type of network. In another tutorial, we will see how we can describe such networks visually.

Grabbing details

We can ask other questions of this data too. {manynet} uses a simple function naming convention so that you always know to what it relates.

To find out how many nodes are in the network, use net_nodes(). To find out how many nodes are in each mode, use net_dims(). To find out the names of those nodes, use node_names(). Use such functions to find out:

a) how many nodes are in the ison_southern_women network b) how many nodes are in each mode b) how many ties are in the network d) what nodal attributes there are in the network c) what the names of the nodes are


net_nodes(ison_southern_women)
net_dims(ison_southern_women)
net_ties(ison_southern_women)
net_node_attributes(ison_southern_women)
node_names(ison_southern_women)

There are a bunch of logical checks for many common properties or features of networks. For example, one can check whether a network is_twomode(), is_directed(), or is_labelled(). Remember, all is_*() functions work on any compatible class.

From class to class

Network class objects

We can describe and work with networks from other R packages too, not just those in the tbl_graph format. For example, another commonly used package in network analysis is {network}, which includes a few example datasets of its own. Can you remember how to find out which data are available in this package, and call the last one in the list?


# You may need to call the data out of this package directly,
# such as:
data(flo, package = "network")
data(package = "_____")
library(_____)
data(_____)
_____
data(package = "network")
data(flo, package = "network")
flo

This data uses quite a different class to what we encountered above. It prints out the full adjacency matrix of the Florentine network, but as a network-class object (i.e. from the {network} package). This is no problem for {manynet} (or {migraph}), since every included function works the same on any of the compatible classes, but in case you would like to work with a network in a particular class, or it needs to be in a particular format for further work (e.g. for use with {ergm}), then {manynet} has you covered for that too.

Class coercion

Coercing networks between different classes of objects uses the as_*() functions. These functions will do their best to coerce data from the current class of the object to the class named in the function. Some classes have 'slots' or recognition for some kinds of information that others don't. For example, coercing a tbl_graph into an edgelist will sacrifice all the information about nodal attributes. Still, we aim for these functions to be as lossless as possible and welcome feedback that highlights how these translations can be improved. Let's see whether we can coerce our 'flo' network into a tbl_graph ('tidygraph') class object.


_____ <- as_t_____(_____)
flo <- as_tidygraph(_____)
flo <- as_tidygraph(flo)
flo
question("What are there 16 of in this network?",
  answer("Nodes",
         correct = TRUE),
  answer("Ties"),
  answer("Reciprocated ties"),
  answer("Loops"),
  answer("Components"),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Other packages that include network data include David Schoch's descriptively named {networkdata} package. The data in this package are {igraph}-class objects. Can you coerce one of the datasets in this package into a tidygraph format? Into a network format? Into a matrix? Into an edgelist?

External data

Finding data

Researchers will regularly find themselves needing to import and work with network data from outside of R. There are a great number of networks datasets and data resources available online. ^[Here we keep a necessarily partial list, but we are happy to update it with additional suggestions.] See for example:

Yet these resources contain data in a range of different formats, some that are specifically made to work with certain software, others that rely on open standards, and yet others that keep data in a very standard edgelist (and perhaps nodelist) format in .csv files or similar. Fortunately, {manynet} has functions to help with importing data from such formats too.

Importing edgelists

One format most users are long familiar with is Excel. In Excel, users are typically collecting network data as edgelists, nodelists, or both. Recall that edgelists tabulate senders/from and receivers/to of each tie in the first two columns and any other edge- or tie-related attributes as additional columns. There may optionally also be a nodelist that tabulates Edgelists are typically the main object to be imported, and we can import them from an Excel file or a .csv file.^[Note that if you import from a .csv file, please specify whether the separation value should be commas (sv = "comma") or semi-colons (sv = "semi-colon"). The function expects comma separated values by default.] For the sake of this exercise, we'll import some data, adols.csv, that I've pre-saved within the package in the data/ folder of this tutorial. Try the following code chunk.

adolties <- read_edgelist("data/adols.csv")
flonodes <- read_edgelist("data/flonode.csv")

If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.

Exporting edgelists

In some cases, users will be faced with having to collect data themselves, or wish to first manipulate the data in Excel before importing it, but may be uncertain about the expected format of an edgelist. Here it may be useful to try exporting one of the built-in datasets in {manynet} to see how complete network data looks. If this is potentially complex, calling write_edgelist() without any arguments will export a test file with a barebones structure that you can overwrite with your own data.


write______(ison_marvel_relationships, "_____/marvedges.xlsx")
write______(ison_marvel_relationships, "_____/marvnodes.xlsx")
write_edgelist(ison_marvel_relationships, "data/marvedges.xlsx")
write_nodelist(ison_marvel_relationships, "data/marvnodes.xlsx")

Importing other formats

Since network data can be complex, edgelists (and nodelists) may not be sufficient to structure all the information necessary to represent the network. For this reason, a variety of other external formats have been proposed and used. As such, you may find network data of interest that is in another format. Here are some examples:

For more information on any of these functions, you can ask for help by typing ?read_pajek in the console. Whereas read_edgelist() and read_nodelist() will import into a tibble/data frame class, read_pajek() and read_ucinet() will import the network into a tidygraph format (see above). Of course, any network data that is imported can be easily coerced into any other compatible class. Let's say we want to import the adolescents edgelist back in, but we want it in an igraph format. There are three ways you might do this:

# 1. Separate steps
adols <- read_edgelist("data/adols.csv")
adolsigraph1 <- as_igraph(adols)
adolsigraph1
# 2. Nested steps
adolsigraph2 <- as_igraph(read_edgelist("data/adols.csv"))
adolsigraph2
# 3. Chained steps
adolsigraph3 <- read_edgelist("data/adols.csv") %>% as_igraph()
adolsigraph3

How does it compare to the original?

question("Which versions are labelled",
  answer("ison_adolescents (the original)",
         correct = TRUE),
  answer("adolsigraph1",
         correct = TRUE),
  answer("adolsigraph2",
         correct = TRUE),
  answer("adolsigraph3",
         correct = TRUE),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

But how can we change the network to be unnamed (or anything else)?

Reformatting networks

As mentioned above, {manynet} attempts to retain as much information as possible when converting objects between different classes. The presumption is that users should explicitly decide to reduce or simplify their data. {manynet} includes functions for reformatting, transforming (or removing) certain properties of network objects. Here will introduce a few functions used for 'reformatting' networks. We call functions 'reformatting functions' if they change the type but not the order (number of nodes) in the network.

A good example is to_undirected(). The astute among you may have noticed that when we imported the adolescents network, it returned a directed network instead of the original undirected network. This was a consequence of a heuristic used during the import, but gives us a good occasion to try out to_undirected(). Reimport the data/adols.csv file, make it an igraph-class object, and then make it undirected.


read_edgelist("data/adols.csv") %>% as_igraph() %>% to_undirected()

Free play

Try this out with other compatible classes of objects, and reformatting other aspects of the network. For example:


Transforming networks

These functions are similar to the reformatting functions, and are also named to_*(), but their operation always changes the network's 'order' (number of nodes).

Projections

Good examples of this are to_mode1() and to_mode2() for transforming a two-mode network into one of its one-mode projections. to_mode1() will transform (project) the network to a one-mode network of shared ties among its first set of nodes, while to_mode2() will project the original network to a network of shared ties among its second set of nodes. For more information on projection, see for example Knoke et al. (2021).

Let's try this out on a classic two-mode network, ison_southern_women. Assign and name the transformed networks something sensible using e.g. women <- to_mode... so that we can continue working with this data afterwards. To assign and immediately print the result, wrap the line in parentheses.


ison_southern_women
(s_women <- to_mode1(ison_southern_women))
(s_events <- to_mode2(ison_southern_women))
question("The first mode of the two-mode network consists of...",
  answer("women named Evelyn, Laura, Theresa, etc",
         correct = TRUE),
  answer("events called E1, E2, E3, etc"),
  answer("southern cities named Dunedin, Invercargill, Bluff, etc"),
  answer("events called Christmas Ball, End-of-year, Mardi Gras, etc"),
  answer("women named Gertrude, Sally, Agnes, etc"),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Now use functions introduced above on the two projections you have created to find out:

a. how many nodes there are in each of these networks, b. what the names of the nodes are, and c. what tie attributes there are in the networks.

s_women <- to_mode1(ison_southern_women)
s_events <- to_mode2(ison_southern_women)

net_nodes(s_women)
net_nodes(s_events)
node_names(s_women)
node_names(s_events)
net_tie_attributes(s_women)
net_tie_attributes(s_events)

So we can see that the to_mode*() functions have created a network of only one of the modes in the network. The ties in these projected networks, representing shared connections to nodes of the other mode, are weighted. This shows up when listing the network's tie attributes, or could be retrieved using tie_weights(), but can also be checked with the simple logical check is_weighted().

Retrieve the tie weights from your women projection. Find the average (mean) of this vector of tie weights, and the average (mean) tie weight overall.


tie_weights(_____)
mean(tie_weights(_____))
mean(as_matrix(_____))
tie_weights(s_women)
mean(tie_weights(s_women))
mean(as_matrix(s_women))
question("2.316547 is...",
  answer("the average frequency of events shared by women who shared any events.",
         correct = TRUE),
  answer("the average frequency of events shared by women."),
  answer("how many shared events there are in original two-mode network."),
  answer("how many women there are in the projected network."),
  answer("how many shared events there are in the projected network."),
  random_answer_order = TRUE,
  allow_retry = TRUE
)

Ok, so now we know that projection transforms an (unweighted) two-mode network into a weighted one-mode network and what these weights represent. Note though that counting the frequency of shared ties to nodes in the other mode is just one (albeit the default) option for how ties in the projection are weighted. Other options included in {manynet} include the Jaccard index, Rand simple matching coefficient, Pearson coefficient, and Yule's Q. These may be of interest if, for example, overlap should be weighted by participation.

Other transformations

Other transforming functions include:

Remember, all these to_*() functions work on any compatible class; the to_*() functions will also attempt to return that same class of object, making it even easier to manipulate networks into shape for analysis.

Free play


Modifying data

After choosing and/or importing some network data, and alongside other modifications, you may need to add or delete particular nodes or ties, or add or delete specific nodal or tie attributes.

Adding/deleting nodes and ties

Sometimes you may wish to add particular nodes or ties. This can be achieved easily enough with the following functions. Note that one (1) additional node is added via the first command, and one (1) additional tie (between the nodes indexed 1 and 3) is added using the second.

add_nodes(ison_adolescents, 1)
add_ties(ison_adolescents, list(1,3))

Other times you may wish to delete particular nodes or ties, and there the syntax is similar. Note here though that when a single value is used in the second argument, this is interpreted as an index, whereas one can also use the node's name for delete_nodes(), or name the tie(s) to be deleted using | as a separator.

delete_nodes(ison_adolescents, 1)
delete_nodes(ison_adolescents, "Sue")
delete_ties(ison_adolescents, 1)
delete_ties(ison_adolescents, "Carol|Tina")

Adding/deleting attributes

Adding nodal attributes to a given network is relatively straightforward. {manynet} offers a more {igraph}-like syntax, e.g. add_node_attribute(), as well as a more {dplyr}-like syntax, e.g. mutate(), for those already familiar with these tools in R:

ison_adolescents %>% 
  mutate_nodes(color = "red",
               degree = 1:8) %>% 
  mutate_ties(weight = 1:10)

One can also delete nodal attributes in a similar way, by assigning NULL to a particular attribute.

ison_southern_women
ison_southern_women %>% 
  mutate_nodes(Surname = NULL,
               Title = NULL)

Note that to use {dplyr}-like functions like mutate(), rename(), filter(), select(), or join() on network ties, you will need to append the function name with _ties.



Try the manynet package in your browser

Any scripts or data that you put into this service are public.

manynet documentation built on June 23, 2025, 9:07 a.m.