How to use biblionetwork
In biblionetwork: Create Different Types of Bibliometric Networks

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/",
  out.width = "100%"
)

This vignette introduces you to the different functions of the package with the [data integrated][Incorporated data] in the package.

The basic coupling angle (or cosine) function

The biblio_coupling() function is the most general function of the package. This function takes as an input a direct citation data frame (entities, like articles, authors or institutions, citing references) and produces an edge list for bibliographic coupling network, with the number of references that different articles share together, as well as the coupling angle value of edges [@sen1983]. This is a standard way to build bibliographic coupling network using Salton's cosine measure: it divides the number of references that two articles share by the square root of the product of both articles bibliography lengths. It avoids giving too much importance to articles with a large bibliography. It looks like:

$$ \frac{R(A) \bullet R(B)}{\sqrt{L(A).L(B)}} $$

with $R(A)$ and $R(B)$ the references of documents A and B, $R(A) \bullet R(B)$ being the number of shared references by A and B, and $L(A)$ and $L(B)$ the length of the bibliographies of documents A and B.

The output is an edge list linking nodes together (see the from and to columns) with a weight for each edge being the coupling angle measure. If normalized_weight_only is set to be FALSE, another column displays the number of references shared by the two nodes.

This example use the [Ref_stagflation][Incorporated data] data frame.

library(biblionetwork)
biblio_coupling(Ref_stagflation, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                normalized_weight_only = FALSE, 
                weight_threshold = 1)

This function is a relatively general function that can also be used:

for co-citation, just by inverting the sourceand ref columns, but rather use the [biblio_cocitation()];
for title co-occurence networks (taking care of the length of the title thanks to the coupling angle measure);
for co-authorship networks (taking care of the number of co-authors an author has collaborated with on a period), but rather use the [coauth_network()].

The function just keeps the edges that have a non-normalized weight superior to the weight_threshold. In a large bibliographic coupling network, you can consider for instance that sharing only one reference is not sufficient/significant for two articles to be linked together. This parameter could also be modified to avoid creating intractable networks with too many edges.

biblio_coupling(Ref_stagflation, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                weight_threshold = 3)

As explained above, you can use the biblio_coupling() function for creating a co-citation network, you just have to put the references in the source column (they will be the nodes of your network) and the citing articles in ref. As it is likely to create some confusion, the package also integrates a biblio_cocitation() function, which has a similar structure to biblio_coupling(), but which is explicitly for co-citation: citing articles stay in source and references stay in ref. You can see in the next example that they produce the same results:

biblio_coupling(Ref_stagflation, 
                source = "ItemID_Ref", 
                ref = "Citing_ItemID_Ref")

biblio_cocitation(Ref_stagflation, 
                  source = "Citing_ItemID_Ref", 
                  ref = "ItemID_Ref")

Testing another method: the `coupling_strength()` function

This coupling_strength() calculates the coupling strength measure [following @vladutz1984 and @shen2019] from a direct citation data frame. It is a refinement of biblio_coupling(): it takes into account the frequency with which a reference shared by two articles has been cited in the whole corpus. In other words, the most cited references are less important in the links between two articles, than references that have been rarely cited. To a certain extent, it is similar to the tf-idf measure. It looks like:

$$ \frac{1}{L(A)}.\frac{1}{L(A)}\sum_{j}(log({\frac{N}{freq(R_{j})}})) $$

with $N$ the number of articles in the whole dataset and $freq(R_{j})$ the number of time the reference j (which is shared by documents A and B) is cited in the whole corpus.

coupling_strength(Ref_stagflation, 
                  source = "Citing_ItemID_Ref", 
                  ref = "ItemID_Ref", 
                  weight_threshold = 1)

Aggregating at the "entity" level

Rather than focusing on documents, you can want to study the relationships between authors, institutions/affiliations or journals. The coupling_entity() function allows you to do that. Coupling links are calculated using the coupling angle measure (like biblio_coupling()) or the coupling strength measure (like coupling_strength()). Coupling links are calculated depending of the number of references two authors share, taking into account the minimum number of times two authors are citing each reference. For instance, if two entities share a reference in common, the first one citing it twice (in other words, citing it in two different articles), the second one three times, the function takes two as the minimum value. In addition to the features of the [coupling strength measure][Testing another method: the coupling_strength() function] or the [coupling angle measure][The basic coupling angle (or cosine) function], it means that, if two entities share two references in common, the fact that the first reference is cited at least four times by the two entities, whereas the second reference is cited at least only once, the first reference contributes more to the edge weight than the second reference. This use of minimum shared reference for entities coupling comes from @zhao2008b. With the coupling strength measure, it looks like:

$$ \frac{1}{L(A)}.\frac{1}{L(A)}\sum_{j} Min(C_{Aj},C_{Bj}).(log({\frac{N}{freq(R_{j})}})) $$

with $C_{Aj}$ and $C_{Bj}$ the number of time documents A and B cite the reference $j$.

This example use the [Ref_stagflation][Incorporated data] and the [Authors_stagflation][Incorporated data] data frames.

# merging the references data with the citing author information in Nodes_stagflation
entity_citations <- merge(Ref_stagflation, 
                          Authors_stagflation, 
                          by.x = "Citing_ItemID_Ref", 
                          by.y = "ItemID_Ref",
                          allow.cartesian = TRUE) 
# allow.cartesian is needed as we have several authors per article, thus the merge results 
# is longer than the longer merged data frame

coupling_entity(entity_citations, 
                source = "Citing_ItemID_Ref", 
                ref = "ItemID_Ref", 
                entity = "Author.y", 
                method = "coupling_angle")

A different world: building co-authorship network

Even if the weights of co-authorship links can be calculated using the biblio_coupling() function with authors as source and articles as ref, the method used is not necessarily the most appropriate for co-authorship networks. The coauth_network() function implements different types of methods for calculating the weights linking different authors:^[I take as example authors here, but the function could also be used for calculating a co-authorship network with institutions or countries as nodes.]

a "full counting" method;
a "fractional counting" method [see @perianes-rodriguez2016b for an interesting comparison between full counting and fractional counting results];
a "fractional counting refined" method, inspired by @leydesdorff2017.

In addition, it is possible to take into account the total number of collaborations of two linked authors, by fixing cosine_normalized to True.

This example use the [Authors_stagflation.rda][Incorporated data] file.

full_counting <- coauth_network(Authors_stagflation, 
                                authors = "Author", 
                                articles = "ItemID_Ref", 
                                method = "full_counting")
head(full_counting[order(Source)],10)

fractional_counting <- coauth_network(Authors_stagflation, 
                                      authors = "Author", 
                                      articles = "ItemID_Ref", 
                                      method = "fractional_counting")
head(fractional_counting[order(Source)],10)

fractional_counting_cosine <- coauth_network(Authors_stagflation,
                                             authors = "Author", 
                                             articles = "ItemID_Ref", 
                                             method = "fractional_counting", 
                                             cosine_normalized = TRUE)
head(fractional_counting_cosine[order(Source)],10)

Incorporated data

The biblionetwork package contains bibliometric data built by @goutsmedt2021a. These data gather the academic articles and books that endeavoured to explain the United States stagflation of the 1970s, published between 1975 and 2013. They also gather all the references cited by these articles and books on stagflation. The Nodes_stagflation file contains information about the academic articles and books on stagflation (the staflation documents), as well as about the references cited at least by two of these stagflation documents. The Ref_stagflation is a data frame of direct citations, with the identifiers of citing documents, and the identifiers of cited documents. The Authors_stagflation is a data frame with the list of documents explaining the US stagflation, and all the authors of these documents (Nodes_stagflation just takes the first author for each document).