knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(milestones)

This vignette shows how to run the main path analysis for a dataset containing bibliographic and citation data.

The first step is to search for the terms of interest on Web of Science. If you are conducting your own search, export the results as described in the vignette of the bibliometrix package (which is used under the hood to read the data). An example of such data ships with this package, so you can import it as shown below. This data corresponds to a search for "captopril + hypertension".

wos_records = system.file("extdata", "optogenetics.txt", package = "milestones", mustWork = TRUE)

M = read_biblio_data(wos_records)

Once the data is imported, the next step is to build the citation network, which is what the main path algorithm runs on. This step has many substeps and may fail. The main path algorithm requires a network without cycles (e.g. if paper A cites paper B, paper B cannot also cite paper A). The build_network function attempts to remove cycles when they occur, but success is not guaranteed: longer cycles may go undetected (e.g. A cites B, B cites C, C cites D, D cites A). The function warns you if it cannot remove all cycles; if it fails, there is not much that can be done at this time.

NET = build_network(M)
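
The acyclicity requirement described above can be illustrated independently of the package. The following is a minimal base R sketch, not the check that build_network performs: it repeatedly removes vertices with no incoming arcs (Kahn's algorithm) and reports whatever is left. The edge lists and the function name leftover_vertices are hypothetical and exist only for this example.

```r
# Hypothetical toy edge lists: each row means "from cites to".
acyclic <- data.frame(from = c("A", "B"), to = c("B", "C"),
                      stringsAsFactors = FALSE)
cyclic <- data.frame(from = c("A", "B", "C", "D", "E"),
                     to   = c("B", "C", "D", "A", "A"),
                     stringsAsFactors = FALSE)

# Repeatedly remove vertices that have no incoming arcs.
# Vertices that can never be removed lie on a cycle or downstream of one,
# so an empty result means the network is acyclic.
leftover_vertices <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  repeat {
    no_incoming <- setdiff(nodes, edges$to)
    if (length(no_incoming) == 0) break
    nodes <- setdiff(nodes, no_incoming)
    edges <- edges[edges$from %in% nodes & edges$to %in% nodes, , drop = FALSE]
  }
  nodes
}

leftover_vertices(acyclic)  # empty: no cycles
leftover_vertices(cyclic)   # the vertices involved in the A-B-C-D cycle
```

A non-empty result signals that Pajek's acyclic network commands would be expected to reject the network.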

If it succeeds, the preparation is done and the next step uses an external program, Pajek. To use it, first export the network in a format Pajek can read (a .net file).

save_network(NET, filename = "citation_network")
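
For orientation, a Pajek .net file is a plain-text file with a vertex list followed by an arc list. The base R sketch below writes a minimal file of that shape to a temporary location. The three papers and their arcs are invented for illustration, and the file produced by save_network may contain additional fields.

```r
# Hypothetical three-paper network; labels and arcs are made up.
labels <- c("Paper A", "Paper B", "Paper C")
arcs <- data.frame(from = c(1L, 2L), to = c(2L, 3L))

# Basic Pajek layout: "*Vertices n", one numbered and quoted label per
# vertex, then "*Arcs" with one "from to weight" line per arc.
net_lines <- c(
  sprintf("*Vertices %d", length(labels)),
  sprintf('%d "%s"', seq_along(labels), labels),
  "*Arcs",
  sprintf("%d %d 1", arcs$from, arcs$to)
)

net_file <- tempfile(fileext = ".net")
writeLines(net_lines, net_file)
cat(readLines(net_file), sep = "\n")
```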

In Pajek, the following steps will build and export the main path:

  1. Read Network (the button with the folder icon below the Networks label) → to pick the .net file saved in the previous step. Alternatively, you can drag and drop the file on Pajek.
  2. Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPLC). At this point, if the citation loops mentioned above have not been eliminated, Pajek will throw an error. If it succeeds, there will be a new entry "Citation Weights ..." under Vectors.
  3. Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Key-Route. Choose the Key Routes 1-10 as the parameter. This will create a new entry "Vertices on Key-Route ..." under Partitions.
  4. Click on the Save button for Networks and save a .net file with the main path (call it "mainpath1.net"). Make sure that the network selected when you save is the main path (something named like "Key-Route Global Main Path ...").
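
To give an intuition for the SPLC weights computed in step 2, here is a minimal base R sketch on a toy acyclic network. It assumes the usual definition, in which the weight of an arc is the number of paths that start at any vertex, pass through that arc, and end at a sink. The arcs, their orientation (earlier paper to later paper), and the function name splc_weights are assumptions made for this example; Pajek's implementation and any normalization it applies may differ.

```r
# Hypothetical toy network; each row is an arc from an earlier to a later paper.
arcs <- data.frame(
  from = c("A", "B", "B", "C", "D"),
  to   = c("B", "C", "D", "E", "E"),
  stringsAsFactors = FALSE
)

splc_weights <- function(arcs) {
  nodes <- unique(c(arcs$from, arcs$to))

  # Topological order via Kahn's algorithm; stops if a cycle is present.
  topo <- character(0)
  pending <- nodes
  remaining <- arcs
  while (length(pending) > 0) {
    ready <- setdiff(pending, remaining$to)
    if (length(ready) == 0) stop("network contains a cycle")
    topo <- c(topo, ready)
    pending <- setdiff(pending, ready)
    remaining <- remaining[remaining$from %in% pending, , drop = FALSE]
  }

  # Paths ending at each vertex, with every vertex counted as a possible origin.
  n_in <- setNames(rep(1, length(nodes)), nodes)
  for (v in topo) {
    preds <- arcs$from[arcs$to == v]
    n_in[v] <- 1 + sum(n_in[preds])
  }

  # Paths from each vertex to a sink (a sink counts as one path).
  n_out <- setNames(rep(1, length(nodes)), nodes)
  for (v in rev(topo)) {
    succs <- arcs$to[arcs$from == v]
    if (length(succs) > 0) n_out[v] <- sum(n_out[succs])
  }

  arcs$splc <- unname(n_in[arcs$from] * n_out[arcs$to])
  arcs
}

splc_weights(arcs)
```

In this toy network the arcs into E receive the highest weight, because paths starting at A, B, and the arc's own tail all pass through them. The main path search in step 3 then follows the arcs with the largest weights.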

Once you have the .net file with the main path, import this file back into R (for this example, the file already exists):

main_path_filename = system.file("extdata", "mainpath1.net", package = "milestones", mustWork = TRUE)

NET_MP1 = read_network(filename = main_path_filename, NET)

Now that the main path is identified, save the list of papers contained within it (it will be saved as a TSV file).

save_papers_mp(NET_MP1, filename = "mainpath1", M)

Alternatively, the save_papers_mp function accepts a list of networks and saves their papers in a single list. Example (not run):

save_papers_mp(list(NET_MP1, NET_MP2), filename = "mainpath1+2_papers.tsv")


KleberNeves/milestones documentation built on Dec. 18, 2021, 3:36 a.m.