knitr::opts_chunk$set(
    comment = "#>",
    collapse = TRUE,
    warning = FALSE,
    message = FALSE,
    fig.path = "figure/"
)

Introduction to rdpla

rdpla: R client for Digital Public Library of America

The Digital Public Library of America (DPLA) brings together metadata from libraries, archives, and museums in the US and makes it freely available through its web portal and an API. The portal and API do not serve the items themselves from contributing institutions; they provide links that make the items easy to find. DPLA holds metadata for images of works held in museums, photographs from various photographic collections, texts, sounds, and moving images.

DPLA has a great API with good documentation - a rare thing in this world. The API documentation also covers the available search fields and gives examples of queries, and DPLA publishes its metadata schema separately.

The DPLA API has two main services, items and collections, which rdpla exposes through dpla_items() and dpla_collections().

rdpla also has an interface (dpla_bulk) to download bulk and compressed JSON data.

Note that you can only run examples/vignette/tests if you have an API key. See below for an example of how to get an API key.

Installation

Install from CRAN

install.packages("rdpla")

Development version

if (!requireNamespace("devtools")) {
  install.packages("devtools")
}
devtools::install_github("ropensci/rdpla")

Load rdpla

library("rdpla")

API key

If you already have a DPLA API key, make sure it's in your .Renviron or .Rprofile file.
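A quick way to confirm that the key is visible to R before running any searches is to look it up directly. The sketch below assumes the key is stored in an environment variable named DPLA_API_KEY; that name and the helper has_dpla_key() are assumptions made for illustration, so check the package documentation for the name rdpla actually reads.

```r
# Hypothetical helper: reports whether a DPLA key is visible to this R session.
# ASSUMPTION: the key lives in an environment variable called DPLA_API_KEY.
has_dpla_key <- function(var = "DPLA_API_KEY") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (!has_dpla_key()) {
  message("No DPLA key found; add it to .Renviron and restart R.")
}
```

Values in .Renviron are read at startup, so a newly added key only becomes visible after restarting the R session.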

If you don't have a DPLA API key, use the dpla_get_key() function to get a key. You only need a valid email address to get a key, for example:

dpla_get_key(email = "foo@bar.com")
#> API key created and sent via email. Be sure to check your Spam folder, too.

Search - items

Note: the examples below limit the fields returned, for brevity.

Basic search

dpla_items(q="fruit", page_size=5, fields=c("provider","creator"))

Limit fields returned

dpla_items(q="fruit", page_size = 10, fields=c("publisher","format"))

Limit records returned

dpla_items(q="fruit", page_size=2, fields=c("provider","title"))

Search by date

dpla_items(q="science", date_before=1900, page_size=10, fields=c("id","date"))

Search on specific fields

dpla_items(description="obituaries", page_size=2, fields="description")
dpla_items(subject="yodeling", page_size=2, fields="subject")
dpla_items(provider="HathiTrust", page_size=2, fields="provider")

Spatial search, across all spatial fields

dpla_items(sp='Boston', page_size=2, fields=c("id","provider"))

Spatial search, by states

dpla_items(sp_state='Massachusetts OR Hawaii', page_size=2, fields=c("id","provider"))
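The state query above is a plain string with the names joined by OR. When the states come from a vector, the string can be assembled with paste(). This is a minimal sketch, and the helper name or_query() is hypothetical rather than part of rdpla.

```r
# Hypothetical helper: join several terms into one "A OR B" query string.
or_query <- function(terms) {
  paste(terms, collapse = " OR ")
}

states <- c("Massachusetts", "Hawaii")
state_query <- or_query(states)
print(state_query)
#> [1] "Massachusetts OR Hawaii"

# The result could then be passed on, e.g. dpla_items(sp_state = state_query, ...)
```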

Faceted search

dpla_items(facets=c("sourceResource.spatial.state","sourceResource.spatial.country"),
  page_size=0, facet_size=5)

Search - collections

Search for collections containing the words "university of texas"

dpla_collections(q="university of texas", page_size=2)

You can also search in the title and description fields

dpla_collections(description="east")

Visualize

Visualize metadata from the DPLA - bar chart of the number of records per state (includes states outside the US)

out <- dpla_items(facets="sourceResource.spatial.state", page_size=0, facet_size=25)
library("ggplot2")
library("scales")
ggplot(out$facets$sourceResource.spatial.state$data, aes(reorder(term, count), count)) +
  geom_bar(stat="identity") +
  coord_flip() +
  theme_grey(base_size = 16) +
  scale_y_continuous(labels = comma) +
  labs(x="State", y="Records")
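The plot relies on the facet table having a term column and a count column, as used in the aes() call above. If count happens to arrive as text rather than numbers, reorder() and the continuous y scale may not behave as intended, so it can help to coerce and sort the table first. The sketch below uses a small made-up data frame in place of a real API response; the counts are invented and prep_facet() is a hypothetical helper, not part of rdpla.

```r
# Hypothetical helper: make sure `count` is numeric and sort by it, largest first.
prep_facet <- function(df) {
  df$count <- as.numeric(df$count)
  df[order(df$count, decreasing = TRUE), , drop = FALSE]
}

# Made-up stand-in for out$facets$sourceResource.spatial.state$data
# (invented counts, stored as text on purpose).
toy <- data.frame(
  term = c("Hawaii", "Texas", "Ohio"),
  count = c("20", "300", "100"),
  stringsAsFactors = FALSE
)

ready <- prep_facet(toy)
print(ready)
```

The same helper could be applied to the real facet table before handing it to ggplot(), assuming it has the term and count columns shown above.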



rdpla documentation built on May 2, 2019, 2:31 p.m.