fetch: Querying the BacDive API


fetch {BacDive}    R Documentation

Querying the BacDive API

Description

This package uses ‘bacdive_access’ objects for managing access to the BacDive API. Once such an object has been created with open_bacdive, the API can be queried. The data are subject to the BacDive copyright, which is liberal; see the BacDive web site.

Usage

  fetch(object, ...)

  ## S3 method for class 'bacdive_access'
  fetch(object, ids, ...)

  request(object, ...)

  ## S3 method for class 'bacdive_access'
  request(object, query,
    search = c("taxon", "deposit", "16S", "genome"),
    page = 0L, ...)

  ## S3 method for class 'bacdive_access'
  retrieve(object, query,
    search = "taxon", ...)

  ## S3 method for class 'bacdive_access'
  upgrade(object, previous,
    keep = TRUE, ...)

Arguments

object

Object of class ‘bacdive_access’.

ids

Numeric vector or list containing such vectors. If empty, ... must contain at least one ID.

query

Atomic vector or list containing such vectors or lists. If empty, ... must yield a non-empty query. The conversion of query depends on the search argument.

search

Character vector of length 1 determining which search method to apply to the query, i.e. which API endpoint to use; each possible value corresponds to exactly one endpoint. Processed by match.arg. retrieve accepts the same values and passes its search argument on to request.

page

Integer vector of length 1 selecting the page of results to return. Needed because the results of request are paginated. The first page has the number 0.

previous

Object of class ‘bacdive_result’.

keep

Logical vector of length 1 that determines the return value of upgrade if there is no next page to download.

...

For fetch, additional objects of the same kind as ids. These are mandatory if and only if ids is empty.

For request and retrieve, additional arguments to be added to query. These are mandatory if and only if query is empty. When given, they must be named if advanced search is chosen. In the case of flexible search, unnamed queries can be used but may silently return nothing. Also note that handler and sleep can be passed as arguments to retrieve (see the parent method).

For upgrade, optional arguments (currently ignored).
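Because the results of request are paginated and the first page has the number 0, the page argument can also be used to step through a result directly. The following is a minimal sketch, not part of the package: the helper request_all_pages and its max_pages and sleep parameters are hypothetical, and it assumes that each page carries a 'next' component (mentioned in the examples) that is empty on the last page.

```r
# Hypothetical helper (not part of BacDive): collect all pages of a
# paginated request() result by incrementing the 'page' argument.
# ASSUMPTION: each page is a list whose 'next' component is empty
# (NULL or zero-length) on the last page.
request_all_pages <- function(object, query, search = "taxon",
    max_pages = 100L, sleep = 0.1) {
  pages <- list()
  page_no <- 0L  # the first page has the number 0
  repeat {
    res <- request(object, query, search, page = page_no)
    pages[[length(pages) + 1L]] <- res
    # 'next' is a reserved word in R, hence [["next"]] instead of $next;
    # stop on the last page or when the safety cap is reached
    if (!length(res[["next"]]) || length(pages) >= max_pages)
      break
    page_no <- page_no + 1L
    Sys.sleep(sleep)  # pause between API calls
  }
  pages
}
```

Given an access object created as in the examples, request_all_pages(bacdive, "Bacillus", "taxon") would return a list of ‘bacdive_result’ pages, each containing BacDive IDs only.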

Details

The usage of ‘bacdive_access’ objects is demonstrated in the examples, which query the BacDive API. Querying is only possible for a user with a registered account. See open_bacdive for details.
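Credentials are passed to open_bacdive. The examples read them from the environment variables DSMZ_API_USER and DSMZ_API_PASSWORD and note that they could also be read from a file. A minimal sketch combining both options follows; the helper bacdive_credentials and the file layout (username on the first line, password on the second) are assumptions made for illustration, not part of the package.

```r
# Hypothetical helper (not part of BacDive): return c(user, password),
# taken from DSMZ_API_USER / DSMZ_API_PASSWORD if both are set, otherwise
# from a plain-text file whose first two lines are username and password
# (this file layout is an assumption).
bacdive_credentials <- function(path = NULL) {
  creds <- Sys.getenv(c("DSMZ_API_USER", "DSMZ_API_PASSWORD"))
  if (all(nzchar(creds)))
    return(unname(creds))
  if (!is.null(path) && file.exists(path)) {
    lines <- readLines(path, n = 2L, warn = FALSE)
    if (length(lines) == 2L && all(nzchar(lines)))
      return(lines)
  }
  stop("BacDive credentials not found in environment or file")
}
```

For example, creds <- bacdive_credentials("~/.bacdive") followed by open_bacdive(creds[[1L]], creds[[2L]]), where the file name is likewise hypothetical. Such a file should not be readable by other users.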

A more detailed description of how to use advanced search and flexible search is given on the BacDive API web site. These search facilities may be augmented in the future without the need for changes to this client.

Forthcoming changes to the BacDive API are announced on the BacDive mailing list. Regular users of the API are advised to subscribe to this list.

Value

The methods for fetch, request and upgrade return a ‘bacdive_result’ object. In the case of request, this object contains BacDive IDs, each of which the API uses as a unique identifier.

In contrast, fetch yields the full data entries for the given BacDive IDs.

upgrade yields the next chunk of a paginated result. If there is no next chunk, previous is returned with a warning if keep is TRUE; otherwise NULL is returned.

For retrieve, see the documentation of the parent method. Note in particular that handler and sleep can be used as arguments.
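In example (c), the handler simply collects all chunks in memory. One alternative is to write each chunk to disk as soon as it arrives. This is a sketch assuming, as that example suggests, that the handler is called once per downloaded chunk with that chunk as its only argument; make_rds_handler and read_rds_chunks are hypothetical helpers, not part of the package.

```r
# Hypothetical handler factory (not part of BacDive): each call of the
# returned function saves one chunk to a numbered RDS file in 'dir'.
make_rds_handler <- function(dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  i <- 0L
  function(x) {
    i <<- i + 1L  # counter lives in the factory's environment
    # zero-padded numbers keep files in order (up to 9999 chunks)
    saveRDS(x, file.path(dir, sprintf("chunk_%04d.rds", i)))
    invisible(NULL)
  }
}

# Reload all chunks in the order in which they were written.
read_rds_chunks <- function(dir) {
  files <- sort(list.files(dir, pattern = "^chunk_[0-9]+\\.rds$",
    full.names = TRUE))
  lapply(files, readRDS)
}
```

It could be used as retrieve(object = bacdive, query = "Bacillus", search = "taxon", sleep = 0.1, handler = make_rds_handler("bacillus_chunks")), with the directory name chosen freely.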

By using request, fetch and upgrade, users can build their own loops to download and process paginated results, as an alternative to retrieve.
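A sketch of such a loop follows. The helper fetch_all is hypothetical; it assumes that request stores the BacDive IDs of a page in the 'results' component of the ‘bacdive_result’ object (this component name is an assumption), and it relies on upgrade returning NULL after the last page when keep is FALSE.

```r
# Hypothetical helper (not part of BacDive): download all entries matching
# a query page by page, using request(), fetch() and upgrade().
# ASSUMPTION: the BacDive IDs of each page are in its 'results' component.
fetch_all <- function(object, query, search = "taxon", sleep = 0.1) {
  entries <- list()
  page <- request(object, query, search)
  while (!is.null(page)) {
    ids <- page[["results"]]
    if (length(ids)) {
      # full data entries for the IDs found on this page
      entries[[length(entries) + 1L]] <- fetch(object, ids)
    }
    # next page, or NULL once there is none (keep = FALSE)
    page <- upgrade(object, page, keep = FALSE)
    if (!is.null(page))
      Sys.sleep(sleep)  # pause between API calls
  }
  entries
}
```

Unlike retrieve, this returns a plain list with one fetch result per page; the pages are not merged into a single ‘records’ object.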

References

https://api.bacdive.dsmz.de/

https://bacdive.dsmz.de/about

https://bacdive.dsmz.de/mailinglist/subscribe

See Also

summary.dsmz_result, retrieve, print.dsmz_result, as.data.frame

Other query.functions: open_bacdive

Examples

## Registration for BacDive is required but free and easy to accomplish.
## In real applications username and password could of course also be stored
## within the R code itself, or read from a file.
credentials <- Sys.getenv(c("DSMZ_API_USER", "DSMZ_API_PASSWORD"))

if (all(nzchar(credentials))) {

## create the BacDive access object
bacdive <- open_bacdive(credentials[[1L]], credentials[[2L]])
print(bacdive)
# it would be frustrating if the object was already expired on creation
stopifnot(
  inherits(bacdive, "bacdive_access"),
  !summary(bacdive)[c("expired", "refresh_expired")]
)

## fetch data, given some BacDive IDs
# (1) each ID given as separate argument
id1 <- fetch(bacdive, 624, 6583, 24493)
print(id1)
stopifnot(
  inherits(id1, "bacdive_result"),
  summary(id1)[["count"]] == 3L
)
# conversion to data frame is possible
id1d <- as.data.frame(id1)
head(id1d)
stopifnot(is.data.frame(id1d), nrow(id1d) == 3L)

# (2) all IDs in vector
id2 <- fetch(bacdive, c(624, 6583, 24493))
stopifnot(identical(id1, id2))

# (3) as above, but simplifying a list
id3 <- fetch(bacdive, list(624, 6583, 24493))
stopifnot(identical(id1, id3))

## search for culture collection numbers
ccn1 <- request(bacdive, "DSM 26640", "deposit")
print(ccn1)
stopifnot(
  inherits(ccn1, "bacdive_result"),
  summary(ccn1)[["count"]] == 1L
)
# conversion to data frame is possible
ccn1d <- as.data.frame(ccn1)
head(ccn1d)
stopifnot(is.data.frame(ccn1d), nrow(ccn1d) == 1L)

## search for 16S accession numbers
ssu1 <- request(bacdive, "AF000162", "16S")
print(ssu1)
stopifnot(
  inherits(ssu1, "bacdive_result"),
  summary(ssu1)[["count"]] == 1L
)
# conversion to data frame is possible
ssu1d <- as.data.frame(ssu1)
head(ssu1d)
stopifnot(is.data.frame(ssu1d), nrow(ssu1d) == 1L)

## search for genome accession numbers
gen1 <- request(bacdive, "GCA_006094295", "genome")
print(gen1)
stopifnot(
  inherits(gen1, "bacdive_result"),
  summary(gen1)[["count"]] == 1L
)
# conversion to data frame is possible
gen1d <- as.data.frame(gen1)
head(gen1d)
stopifnot(is.data.frame(gen1d), nrow(gen1d) == 1L)

## search for taxon names
# (1) given as length-1 character vector
bac1 <- request(bacdive,
  "Bacillus subtilis subsp. subtilis", "taxon")
stopifnot(
  inherits(bac1, "bacdive_result"),
  summary(bac1)[["count"]] > 200L
)
# conversion to data frame is possible but does not yield all
# entries if the result has a non-empty 'next' component
bac1d <- as.data.frame(bac1)
head(bac1d)
stopifnot(is.data.frame(bac1d))

# (2) given separately in character vector
bac2 <- request(bacdive,
  c("Bacillus", "subtilis", "subtilis"), "taxon")
stopifnot(identical(bac2, bac1))

## run search + fetch in one step
# (a) simple example for taxon names
bg1 <- retrieve(object = bacdive,
  query = "Bacillus subtilis subsp. subtilis",
  search = "taxon", handler = NULL, sleep = 0.1)
stopifnot(
  inherits(bg1, "records"),
  summary(bac1)[["count"]] >= length(bg1)
)
# conversion to data frame, here supposed to contain
# all entries
bg1d <- as.data.frame(bg1)
head(bg1d)
stopifnot(is.data.frame(bg1d),
  length(bg1) == nrow(bg1d))

# (b) something big
bg2 <- retrieve(object = bacdive, query = "Bacillus",
  search = "taxon", sleep = 0.1)
stopifnot(
  inherits(bg2, "records"),
  length(bg2) > 1000L
)

# (c) try a handler
bg2h <- list()
retrieve(object = bacdive, query = "Bacillus",
  search = "taxon", sleep = 0.1,
  handler = function(x) bg2h <<- c(bg2h, x))
stopifnot(length(bg2h) == length(bg2))
# there are of course better ways to use a handler

# (d) if nothing is found
nil <- retrieve(object = bacdive,
  query = "Thiscannotbefound")
stopifnot(length(nil) == 0L,
  inherits(nil, "records"))
# conversion to data frame
nild <- as.data.frame(nil)
head(nild)
stopifnot(is.data.frame(nild),
  length(nil) == nrow(nild))

## and finally a refresh, whether needed or not
refresh(bacdive, TRUE)
# this is also done internally and automatically
# in some situations when apparently needed

} else {

warning("username or password missing, cannot run examples")

}

BacDive documentation built on April 29, 2022, 3 a.m.
