```r
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE,
  warning = FALSE,
  message = FALSE
)
```
This package gives access to INSPIRE HEP, a comprehensive source for high-energy physics literature.

API documentation: https://inspirehep.net/info/hep/api

No API registration is needed, and no rate limits are in place. Still, if you need to gather many records or run many queries, please be nice and consider the bulk download options instead: harvesting via OAI-PMH, or the JSON data dump (see the streaming example below for how to work with the JSON dump).
Get the development version from GitHub:

```r
install.packages("devtools")
devtools::install_github("njahn82/inspirehep")
```
Load inspirehep:

```r
library('inspirehep')
```
Use `hep_search` to search INSPIRE HEP, and `hep_details` to get detailed information at the record level.
If you are familiar with the INSPIRE HEP web search, discovering literature with `hep_search` is easy because the API supports all well-known search features. The SPIRES syntax supports not only metadata searches but also structured full-text queries. The INSPIRE HEP team provides search tips with a particular focus on the SPIRES syntax: https://inspirehep.net/info/hep/search-tips
`hep_search` parses the resulting MARC XML and returns key metadata as a `data.frame` with the following columns:
|Variable |Description |
|:----------------|:-----------------------------------------|
|id |INSPIRE HEP ID |
|title |title of work |
|author |first author |
|affiliation |first author affiliation (collapsed ";") |
|doi |Digital Object Identifier (DOI) |
|report_number |eprint id from the arXiv (collapsed ";") |
|date_reported |date of initial appearance of preprint |
|journal |journal title |
|volume |journal volume |
|issue |journal issue |
|keywords |controlled keywords (collapsed ";") |
|collection |collection information (collapsed ";") |
|license_url |re-use terms for full texts (e.g. CC) |
```r
library(dplyr) # for pipes and tbl_df class
hep_search("witten black hole", limit = 5) %>% tbl_df()
```
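To drill down into a single record, a minimal sketch assuming `hep_details()` accepts INSPIRE HEP IDs such as those returned in the `id` column of `hep_search` results:

```r
# Look up record-level details for the first search hit
# (assumption: hep_details() takes one or more INSPIRE HEP IDs)
my_hits <- hep_search("witten black hole", limit = 5)
hep_details(my_hits$id[1])
```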
INSPIRE HEP disambiguates author names. To search for an exact author, e.g. Dominik Schwarz, and list the journals that most frequently published his work:
```r
hep_search('exactauthor:D.J.Schwarz.1', limit = 250) %>%
  group_by(journal) %>%
  summarise(counts = n()) %>%
  arrange(desc(counts))
```
Search the full texts of arXiv eprints:

```r
hep_search('find ft "faster than light"', limit = 5) %>% tbl_df()
```
By default, 100 records are returned for each query. The `limit` parameter controls the number of records you wish to retrieve. To jump to a particular record, use the `jrec` parameter. For example, if you want records 20 to 29, tell `hep_search` to start from record 20 and limit the search results to 10 records.
```r
hep_search('witten black hole', jrec = 20, limit = 10) %>% tbl_df()
```
Last but not least, you can use `batch_size` to control the size of the result pages fetched from the API. By default, `batch_size` groups 10 records into a single page; the maximum is 250. Please note that large pages can cause longer response times. Consider the bulk download options via OAI-PMH or the JSON data dump if you need to work with a large set of INSPIRE records.
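For instance, larger pages mean fewer HTTP requests at the cost of bigger responses; a sketch of such a call, under the defaults described above:

```r
# Retrieve 100 records in pages of 50 instead of the default 10;
# fewer requests, but each response is larger and may take longer.
hep_search('witten black hole', limit = 100, batch_size = 50) %>% tbl_df()
```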
INSPIRE HEP offers data dumps to support working with large numbers of INSPIRE HEP records. To prevent memory problems, load the JSON dump incrementally with the `jsonlite::stream_in` function. The function supports custom handlers, so you can apply your own function to the incoming stream.

Suppose we want to retrieve all cited HEP publications from 2015. In the example of the INSPIRE HEP dump, load 500 records per iteration from the connection, select the columns `recid`, `citations` and `creation_date`, and keep only records created in 2015. The resulting `data.frame` is saved to a temporary file, which can then be loaded back into R.
```r
library(dplyr)
library(curl)
library(jsonlite)

con <- gzcon(curl("https://inspirehep.net/hep_records.json.gz"))
output <- file(tmp <- tempfile(), open = "wb")
stream_in(con, function(df) {
  df <- select(df, recid, citations, creation_date)
  df <- filter(df, grepl('2015', creation_date))
  stream_out(df, output, verbose = FALSE)
})
close(output)
mydata <- stream_in(file(tmp))
tbl_df(mydata)
```
This strategy originates from a talk by Jeroen Ooms on jsonlite and MongoDB.
to be added
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Licence: MIT (c) Najko Jahn
For bug reports or feature requests please use the issue tracker.