knitr::opts_chunk$set( comment = "#>", collapse = TRUE, warning = FALSE, message = FALSE, cache.path = "inst/cache/" )
cat(paste(" -", paste(getNamespaceExports("pubchunks"), collapse = "\n - ")))
The main workhorse function is pub_chunks(). It lets you pull sections out of articles from many different publishers (see the next section) without having to know how to parse or navigate XML. XML has a steep learning curve, and it can take quite a bit of searching to work out how to reach different parts of an XML document.
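To make the point about XML concrete, here is a minimal sketch of manual extraction with the xml2 package. The inline document is a hypothetical, heavily simplified JATS-like article (not a real publisher file), and the XPath expressions are assumptions that fit only this toy input.

```r
library(xml2)

# Hypothetical, simplified article XML used only for illustration
doc <- read_xml(paste0(
  "<article><front><article-meta>",
  "<title-group><article-title>An example title</article-title></title-group>",
  "<abstract><p>An example abstract.</p></abstract>",
  "</article-meta></front></article>"
))

# Every section needs its own XPath, and real publishers differ in
# element names, nesting, and namespaces
title <- xml_text(xml_find_first(doc, "//title-group/article-title"))
abstract <- xml_text(xml_find_first(doc, "//abstract"))

title
abstract
```

Real publisher files tend to need a different set of paths for each source, which is the bookkeeping pub_chunks() is meant to hide.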
The other main function is pub_tabularize(), which takes the output of pub_chunks() and coerces it into a data.frame for easier downstream processing.
If you know of other publishers or sources that provide XML, let us know by opening an issue.
We'll continue adding publishers.
Stable version
install.packages("pubchunks")
Development version from GitHub
remotes::install_github("ropensci/pubchunks")
Load library
library('pubchunks')
x <- system.file("examples/10_1016_0021_8928_59_90156_x.xml", package = "pubchunks")
pub_chunks(x, "abstract")
pub_chunks(x, "title")
pub_chunks(x, "authors")
pub_chunks(x, c("title", "refs"))
The output of pub_chunks() is a list with an S3 class pub_chunks to make internal work in the package easier. You can easily see the list structure by using unclass().
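As a rough illustration of why unclass() helps, the base R sketch below builds a toy classed list. The class name demo_chunks and its print method are hypothetical stand-ins, not the package's own objects. A custom print method shows only a summary, and unclass() exposes the plain list underneath.

```r
# Toy stand-in for a classed list; "demo_chunks" is a hypothetical class
res <- structure(
  list(title = "An example title", doi = "10.0000/example"),
  class = "demo_chunks"
)

# A custom print method shows a summary and hides the underlying list
print.demo_chunks <- function(x, ...) {
  cat("<demo chunks>\n  sections:", paste(names(x), collapse = ", "), "\n")
  invisible(x)
}

print(res)             # dispatches to print.demo_chunks()

plain <- unclass(res)  # drops the class attribute, leaving a plain named list
str(plain)
```

The same pattern should apply to a real result, for example unclass(pub_chunks(x, "title")).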
xml <- paste0(readLines(x), collapse = "")
pub_chunks(xml, "title")
xml <- paste0(readLines(x), collapse = "")
xml <- xml2::read_xml(xml)
pub_chunks(xml, "title")
install.packages("fulltext")
library("fulltext")
x <- fulltext::ft_get('10.1371/journal.pone.0086169')
pub_chunks(fulltext::ft_collect(x), sections = "authors")
x <- system.file("examples/elife_1.xml", package = "pubchunks")
res <- pub_chunks(x, c("doi", "title", "keywords"))
pub_tabularize(res)
library(rcrossref)
library(dplyr)
res <- cr_works(filter = list(
  full_text_type = "application/xml",
  license_url = "http://creativecommons.org/licenses/by/4.0/"))
links <- bind_rows(res$data$link) %>% filter(content.type == "application/xml")
download.file(links$URL[1], (i <- tempfile(fileext = ".xml")))
pub_chunks(i)
download.file(links$URL[13], (j <- tempfile(fileext = ".xml")))
pub_chunks(j)
download.file(links$URL[20], (k <- tempfile(fileext = ".xml")))
pub_chunks(k)
unlink(i)
unlink(j)
unlink(k)
To get citation information for pubchunks, run citation(package = 'pubchunks') in R.