rplos

knitr::opts_chunk$set(
  fig.path = "man/figures/",
  warning = FALSE,
  message = FALSE,
  collapse = TRUE,
  comment = "#>",
  fig.cap = ""
)

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Install

You can get this package from CRAN, or install it within R with

install.packages("rplos")

Or install the development version from GitHub

remotes::install_github("ropensci/rplos")
library("rplos")

What is this?

rplos is a package for accessing full text articles from the Public Library of Science journals using their API.

Information

You used to need an API key to use rplos. As of 2015-01-13 (v0.4.5.999) you no longer do.

rplos vignettes: https://docs.ropensci.org/rplos/

PLOS API documentation: http://api.plos.org/

The PLOS Solr schema is at https://gist.github.com/openAccess/9e76aa7fa6135be419968b1372c86957. It was already about 1.5 years old when this note was written, so it may not be up to date.

Crossref API documentation can be found at https://github.com/CrossRef/rest-api-doc. See also rcrossref (on CRAN) with a much fuller implementation of R functions for all Crossref endpoints.

Throttling

Beware: PLOS has started throttling requests. You may see error messages like "(503) Service Unavailable - The server cannot process the request due to a high load", which means you have made too many requests in a given time period. Here is what PLOS says on the matter:

Please limit your API requests to 7200 requests a day, 300 per hour, 10 per minute and allow 5 seconds for your search to return results. If you exceed this threshold, we will lock out your IP address. If you're a high-volume user of the PLOS Search API and need more API requests a day, please contact us at api@plos.org to discuss your options. We currently limit API users to no more than five concurrent connections from a single IP address.
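As a rough sketch of what those limits imply, the snippet below works out the spacing between requests that keeps a long-running job under every quoted limit, and wraps it in a small pacing loop. The helper `paced_lapply()` is hypothetical and not part of rplos; it only uses base R, and the commented usage line assumes rplos is loaded.

```r
# Limits quoted above: requests allowed per time window (in seconds)
window_seconds <- c(day = 24 * 60 * 60, hour = 60 * 60, minute = 60)
max_requests   <- c(day = 7200, hour = 300, minute = 10)

# Seconds per request implied by each limit; the largest value is the
# spacing that satisfies all three limits for a sustained job
seconds_per_request <- window_seconds / max_requests
min_delay <- max(seconds_per_request)

# Hypothetical helper (not part of rplos): apply `fun` to each element of `x`,
# pausing `delay` seconds between consecutive calls
paced_lapply <- function(x, fun, delay = min_delay, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    if (i > 1) Sys.sleep(delay)
    out[i] <- list(fun(x[[i]], ...))  # list() keeps NULL results in place
  }
  out
}

# Usage (needs network access and library("rplos")):
# paced_lapply(c("ecology", "genetics"),
#              function(term) searchplos(term, "id", limit = 5))
```

The daily and hourly limits both work out to the same spacing, so short bursts allowed by the per-minute limit still need to be averaged out over the hour.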

Quick start

Search

Search for the term ecology, and return id (DOI) and publication date, limiting to 5 items

searchplos('ecology', 'id,publication_date', limit = 5)

Get DOIs for full articles in PLoS ONE

searchplos(q="*:*", fl='id', fq=list('journal_key:PLoSONE',
   'doc_type:full'), limit=5)

Query some PLOS article-level metrics. Notice the difference between the two outputs: the second is sorted by counter_total_all in descending order

out <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'), fq='doc_type:full')
out_sorted <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'),
   fq='doc_type:full', sort='counter_total_all desc')
head(out$data)
head(out_sorted$data)

A list of articles about social networks that are popular on a social network

searchplos(q="*:*",fl=c('id','alm_twitterCount'),
   fq=list('doc_type:full','subject:"Social networks"','alm_twitterCount:[100 TO 10000]'),
   sort='counter_total_month desc')

Show all articles that have these two words less than about 15 words apart

searchplos(q='everything:"sports alcohol"~15', fl='title', fq='doc_type:full', limit=3)

Narrow results to 7 words apart, changing the ~15 to ~7

searchplos(q='everything:"sports alcohol"~7', fl='title', fq='doc_type:full', limit=3)
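The two queries above differ only in the number after the tilde. A minimal sketch of building such proximity query strings programmatically follows; `proximity_query()` is a hypothetical helper, not an rplos function, and it assumes the `field:"phrase"~N` form used in the examples above.

```r
# Hypothetical helper (not part of rplos): build a proximity query string
# of the form field:"term1 term2"~N, as used in the examples above
proximity_query <- function(terms, distance, field = "everything") {
  stopifnot(is.character(terms), length(terms) >= 2, distance >= 0)
  phrase <- paste(terms, collapse = " ")
  sprintf('%s:"%s"~%d', field, phrase, as.integer(distance))
}

q15 <- proximity_query(c("sports", "alcohol"), 15)
q7  <- proximity_query(c("sports", "alcohol"), 7)

# Usage (needs network access and library("rplos")):
# searchplos(q = q7, fl = "title", fq = "doc_type:full", limit = 3)
```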

Exclude annotations (i.e., corrections) and Viewpoints articles from the results

searchplos(q='*:*', fl=c('id','article_type'),
   fq=list('-article_type:correction','-article_type:viewpoints'), limit=5)

Faceted search

Facet on multiple fields

facetplos(q='alcohol', facet.field=c('journal','subject'), facet.limit=5)

Range faceting

facetplos(q='*:*', facet.range='counter_total_all',
 facet.range.start=5, facet.range.end=100, facet.range.gap=10)
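To see which buckets a start, end and gap combination should produce, the sketch below lists them in base R. It assumes Solr's documented default of facet.range.hardend = false, where the last bucket keeps the full gap width even if that pushes its upper bound past the end value. `range_buckets()` is a hypothetical helper, not part of rplos, and the actual response from the PLOS server may differ.

```r
# Hypothetical helper (not part of rplos): list the buckets a range facet
# would cover, assuming Solr's default facet.range.hardend = false
range_buckets <- function(start, end, gap) {
  stopifnot(gap > 0, end > start)
  lower <- seq(start, end, by = gap)
  lower <- lower[lower < end]  # a bucket is only opened below `end`
  data.frame(lower = lower, upper = lower + gap)
}

# Same settings as the facetplos() call above
buckets <- range_buckets(start = 5, end = 100, gap = 10)
buckets
```

With these settings the gap does not divide the range evenly, so the last bucket starts at 95 and runs past 100.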

Highlight searches

Search for and highlight the term alcohol in the abstract field only

(out <- highplos(q='alcohol', hl.fl = 'abstract', rows=3))

And you can browse the results in your default browser

highbrow(out)


Full text URLs

Simple function to get full text URLs for a DOI

full_text_urls(doi='10.1371/journal.pone.0086169')

Full text XML for a given DOI

(out <- plos_fulltext(doi='10.1371/journal.pone.0086169'))

Then parse the XML any way you like, here getting the abstract

library("XML")
xpathSApply(xmlParse(out$`10.1371/journal.pone.0086169`), "//abstract", xmlValue)

Search within a field

There is a series of convenience functions for searching within sections of articles.

For example:

plossubject(q='marine ecology',  fl = c('id','journal'), limit = 10)

However, you can always do this directly in searchplos(), for example searchplos(q = "subject:science"). See also the fq parameter. The convenience functions above are simply wrappers around searchplos, so they take all the same parameters.

Search by article views

Search with term marine ecology, by field subject, and limit to 5 results

plosviews(search='marine ecology', byfield='subject', limit=5)

Visualize

Visualize word use across articles

plosword(list('monkey','Helianthus','sunflower','protein','whale'), vis = TRUE)

Progress bars

res <- searchplos(q='*:*', limit = 2000, progress = httr::progress())
#>  |=====================================| 100%
#>  |=====================================| 100%
#>  |=====================================| 100%
#>  |=====================================| 100%

Meta


This package is part of a richer suite called fulltext, along with several other packages, that provides the ability to search for and retrieve full text of open access scholarly articles. We recommend using fulltext as the primary R interface to rplos unless your needs are limited to this single source.




ropensci/rplos documentation built on Sept. 12, 2022, 2:10 p.m.