In massimoaria/openalexR: Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

https://github.com/ropensci/openalexR{.uri}

Latest version: r packageVersion("openalexR"), r Sys.Date()

by Massimo Aria

Full Professor in Social Statistics

PhD in Computational Statistics

Laboratory and Research Group STAD Statistics, Technology, Data Analysis

Department of Economics and Statistics

University of Naples Federico II

email aria\@unina.it{.email}

https://massimoaria.com

An R-package to gather bibliographic data from OpenAlex

openalexR helps you interface with the OpenAlex API to retrieve bibliographic infomation about publications, authors, institutions, sources, funders, publishers, topics and concepts with 5 main functions:

oa_query(): generates a valid query, written following the OpenAlex API syntax, from a set of arguments provided by the user.
oa_request(): downloads a collection of entities matching the query created by oa_query() or manually written by the user, and returns a JSON object in a list format.
oa2df(): converts the JSON object in classical bibliographic tibble/data frame.
oa_fetch(): composes three functions above so the user can execute everything in one step, i.e., oa_query |> oa_request |> oa2df
oa_random(): to get random entity, e.g., oa_random("works") gives a different work each time you run it

library(openalexR)
library(dplyr)

Works (think papers, publications)

This paper:

Aria, M., & Cuccurullo, C. (2017). bibliometrix: 
An R-tool for comprehensive science mapping analysis. 
Journal of informetrics, 11(4), 959-975.

is associated to the OpenAlex-id W2755950973. If you know your paper's OpenAlex ID, all you need to do is passing identifier = <openalex id> as an argument in oa_fetch():

paper_id <- oa_fetch(
  identifier = "W2755950973",
  entity = "works",
  verbose = TRUE
)
dplyr::glimpse(paper_id)

oa_fetch() is a composition of functions: oa_query |> oa_request |> oa2df. As results, oa_query() returns the query string including the OpenAlex endpoint API server address (default). oa_request() downloads the bibliographic records matching the query. Finally, oa2df() converts the final result list to a tibble. The final result is a complicated tibble, but we can use show_works() to display a simplified version:

paper_id %>% 
  show_works() %>%
  knitr::kable()

External id formats

OpenAlex endpoint accepts OpenAlex IDs and other external IDs (e.g., DOI, ISSN) in several formats, including Digital Object Identifier (DOI) and Persistent Identifiers (PIDs).

oa_fetch(
  # identifier = "https://doi.org/10.1016/j.joi.2017.08.007", # would also work (PIDs)
  identifier = "doi:10.1016/j.joi.2017.08.007",
  entity = "works"
) %>% 
  show_works() %>%
  knitr::kable()

More than one publications/authors

https://api.openalex.org/authors/https://orcid.org/

If you know the OpenAlex IDs of these entities, you can also feed them into the identifier argument.

oa_fetch(
  identifier = c("W2741809807", "W2755950973"),
  # identifier = c("https://doi.org/10.1016/j.joi.2017.08.007", "https://doi.org/10.1016/j.joi.2017.08.007"), # TODO
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()

However, if you only know their external identifies, say, DOIs, you would need to use doi as a filter (either the canonical form with https://doi.org/ or without should work):

oa_fetch(
  # identifier = c("W2741809807", "W2755950973"),
  doi = c("10.1016/j.joi.2017.08.007", "https://doi.org/10.1093/bioinformatics/btab727"),
  entity = "works",
  verbose = TRUE
) %>% 
  show_works() %>%
  knitr::kable()

Filters

In most cases, we are interested in downloading a collection of items that meet one or more inclusion/exclusion criteria (filters). Supported filters for each entity are listed here.

Example: We want to download all works published by a set of authors. We can do this by filtering on the authorships.author.id/author.id or authorships.author.orcid/author.orcid attribute (see more on works attributes):

oa_fetch(
  entity = "works",
  author.id = c("A5048491430", "A5023888391"),
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()

orcids <- c("0000-0003-3737-6565", "0000-0002-8517-9411")
canonical_orcids <- paste0("https://orcid.org/", orcids)
oa_fetch(
  entity = "works",
  author.orcid = canonical_orcids,
  verbose = TRUE
) %>% 
  show_works() %>% 
  knitr::kable()

Example: We want to download all works that have been cited more than 50 times, published between 2020 and 2021, and include the strings "bibliometric analysis" or "science mapping" in the title. Maybe we also want the results to be sorted by total citations in a descending order.

Setting the argument count_only = TRUE, the function oa_request() returns the number of items matching the query without downloading the collection.

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = TRUE,
  verbose = TRUE
)

We can now download the records and transform it into a tibble/data frame by setting count_only = FALSE (also the default value):

oa_fetch(
  entity = "works",
  title.search = c("bibliometric analysis", "science mapping"),
  cited_by_count = ">50", 
  from_publication_date = "2020-01-01",
  to_publication_date = "2021-12-31",
  options = list(sort = "cited_by_count:desc"),
  count_only = FALSE
) %>%
  show_works() %>%
  knitr::kable()

Read on to see how we can shorten these two function calls.

Authors

Similarly to work, we can use identifier to pass in authors' OpenAlex ID.

Example: We want more information on authors with IDs A5069892096 and A5023888391.

oa_fetch(
  identifier = c("A5069892096", "A5023888391"),
  verbose = TRUE
) %>%
  show_authors() %>%
  knitr::kable()

Example: We want download all authors' records of scholars who work at the University of Naples Federico II (OpenAlex ID: I71267560) and who have published more than 499 works.

Let's first check how many records match the query, then set count_only = FALSE to download the entire collection. We can do this by first defining a list of arguments, then adding count_only (default FALSE) to this list:

my_arguments <- list(
  entity = "authors",
  last_known_institutions.id = "I71267560",
  works_count = ">499"
  )

do.call(oa_fetch, c(my_arguments, list(count_only = TRUE)))
do.call(oa_fetch, my_arguments) %>% 
  show_authors() %>%
  knitr::kable()

You can also use other filters such as display_name, has_orcid, and orcid:

oa_fetch(
  entity = "authors",
  display_name.search = "Massimo Aria",
  has_orcid = "true"
) %>%
  show_authors() %>%
  knitr::kable()

oa_fetch(
  entity = "authors",
  orcid = "0000-0002-8517-9411"
) %>%
  show_authors() %>%
  knitr::kable()

Institutions

Example: We want download all records regarding Italian institutions (country_code:it) that are classified as educational (type:education). Again, we check how many records match the query then download the collection:

italian_insts <- list(
  entity = "institutions",
  country_code = "it",
  type = "education",
  verbose = TRUE
)

do.call(oa_fetch, c(italian_insts, list(count_only = TRUE)))
dplyr::glimpse(do.call(oa_fetch, italian_insts))

Keywords

Example: We want to download the records of all the keywords that more than 1000 works were tagged with:

popular_keywords <- list(
  entity = "keywords",
  works_count = ">1000",
  verbose = TRUE
)

do.call(oa_fetch, c(popular_keywords, list(count_only = TRUE)))
dplyr::glimpse(do.call(oa_fetch, popular_keywords))

Other examples

Get all works citing a particular work

We can download all publications citing another publication by using the filter attribute cites.

For example, if we want to download all publications citing the article Aria and Cuccurullo (2017), we have just to set the argument filter as cites = "W2755950973" where "W2755950973" is the OA id for the article by Aria and Cuccurullo.

aria_count <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  count_only = TRUE,
  verbose = TRUE
) 
aria_count

This query will return a collection of r aria_count["count"] publications. Among these articles, let's download the ones published in the following year:

oa_fetch(
  entity = "works",
  cites = "W2755950973",
  publication_year = 2018,
  count_only = FALSE,
  verbose = TRUE
) %>% 
  dplyr::glimpse()

Convert an OpenAlex data frame to a bibliometrix object

The bibliometrix R-package (https://www.bibliometrix.org) provides a set of tools for quantitative research in bibliometrics and scientometrics. Today it represents one of the most used science mapping software in the world. In a recent survey on bibliometric analysis tools, Moral-Muñoz et al. (2020) wrote: "At this moment, maybe Bibliometrix and its Shiny platform contain the more extensive set of techniques implemented, and together with the easiness of its interface, could be a great software for practitioners".

The function oa2bibliometrix converts a bibliographic data frame of works into a bibliometrix object. This object can be used as input collection of a science mapping workflow.

bib_ls <- list(
  identifier = NULL,
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-03-31"
)

do.call(oa_fetch, c(bib_ls, list(count_only = TRUE)))
do.call(oa_fetch, bib_ls) %>% 
  oa2bibliometrix() %>% 
  dplyr::glimpse()

About OpenAlex

OpenAlex is a fully open catalog of the global research system. It's named after the ancient Library of Alexandria. The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. There are five types of entities:

Works are papers, books, datasets, etc; they cite other works
Authors are people who create works
Institutions are universities and other orgs that are affiliated with works (via authors)
Concepts tag Works with a topic

Acknowledgements

Package hex was made with Midjourney and thus inherits a CC BY-NC 4.0 license.

massimoaria/openalexR documentation built on June 9, 2025, 7:44 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

massimoaria/openalexR
Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

In massimoaria/openalexR: Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

An R-package to gather bibliographic data from OpenAlex

Works (think papers, publications)

External id formats

More than one publications/authors

Filters

Authors

Institutions

Keywords

Other examples

Convert an OpenAlex data frame to a bibliometrix object

About OpenAlex

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

massimoaria/openalexR Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

In massimoaria/openalexR: Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API

An R-package to gather bibliographic data from OpenAlex

Works (think papers, publications)

External id formats

More than one publications/authors

Filters

Authors

Institutions

Keywords

Other examples

Convert an OpenAlex data frame to a bibliometrix object

About OpenAlex

Acknowledgements

R Package Documentation

Browse R Packages

We want your feedback!

massimoaria/openalexR
Getting Bibliographic Records from 'OpenAlex' Database Using 'DSL' API