Specimen records constitute the core of the data served by the NBA. Museum specimens can represent a wide variety of objects, such as plants, animals or single parts thereof, DNA samples, fossils, rocks or meteorites. For detailed information on the data model, please refer to the official NBA documentation.
All specimen occurrence services are accessible through the methods of the SpecimenClient class. For a list of available endpoints, please refer to the class documentation (?SpecimenClient). Below, we give details about all services, grouped by category.
Querying for specimens is accomplished with the query method of the class SpecimenClient. For simple queries, query parameters can be passed as a named list via the parameter queryParams. For example, we can query specimens of the family Ebenaceae that were collected in Europe:
library('nbaR')

# instantiate specimen client
sc <- SpecimenClient$new()

# specify query params in named list
qp <- list(
  identifications.defaultClassification.family = "Ebenaceae",
  gatheringEvent.continent = "Europe"
)

# query
res <- sc$query(queryParams = qp)
If we now want to know all the countries that the specimens were collected in, we can access the Specimen objects in res$content$resultSet as follows:
sapply(res$content$resultSet, function(x)x$item$gatheringEvent$country)
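The sapply call above yields a plain character vector only if every specimen has a country; where the field is missing, the accessor returns NULL and sapply falls back to a list. A minimal sketch of a more defensive extraction follows. It assumes, for illustration only, that the result set can be treated as a nested list; the mock result_set and the helper get_country are hypothetical and not part of nbaR.

```r
# Mock of res$content$resultSet: plain lists stand in for the real objects
# (assumption: fields are reachable with `$` in the same way).
result_set <- list(
  list(item = list(gatheringEvent = list(country = "Netherlands"))),
  list(item = list(gatheringEvent = list(country = "Spain"))),
  list(item = list(gatheringEvent = list(country = NULL))),
  list(item = list(gatheringEvent = list(country = "Spain")))
)

# Hypothetical helper: return NA instead of NULL when the country is missing
get_country <- function(x) {
  country <- x$item$gatheringEvent$country
  if (is.null(country)) NA_character_ else country
}

# vapply guarantees a character vector of the same length as the result set
countries <- vapply(result_set, get_country, character(1))

# Frequency per country, keeping missing values visible
country_counts <- table(countries, useNA = "ifany")
```

With a real response, result_set would be replaced by res$content$resultSet.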
Note that passing query parameters as a named list only allows for limited queries; for example, the logical conjunction between parameters is always AND. More complex queries can be accomplished using the QuerySpec object:
# get all specimens with genus name starting with 'Hydro'
qc <- QueryCondition$new(field = "identifications.defaultClassification.genus",
                         operator = "STARTS_WITH",
                         value = "Hydro")
qs <- QuerySpec$new(conditions = list(qc))
res <- sc$query(qs)
The query function is limited to retrieving 50000 specimens at once (this limit is set by the parameter index.max_result_window, whose value is retrievable using the get_setting method described in the metadata section). To provide access to larger amounts of data, the download_query method takes the same arguments as query, but downloads the data as a gzip stream under the hood. Unlike query, download_query returns a list of specimen objects instead of a ResultSet. Example:
## get the first 100000 specimen objects (not possible with query method)
res <- sc$download_query(QuerySpec$new(size = 100000))
Several access methods offer the convenient retrieval of specimens matching a certain identifier or being part of a certain collection. Below we give examples of how to use the currently implemented data access services for specimen records:
Some of the specimens available via the NBA are categorised thematically into special collections, such as the Siebold, Dubois or Jongmans collections. The function get_named_collections lists all available special collections, and the identifiers of the specimens within a collection can be queried with get_ids_in_collection:
sc$get_named_collections()$content
sc$get_ids_in_collection("siebold")$content
For any given query (with QuerySpec or not), the count method returns the number of matches instead of specimen objects:
# Example with QuerySpec:
# how many specimens are there in the 'Botany' collection?
qc <- QueryCondition$new(field = 'collectionType',
                         operator = 'EQUALS',
                         value = 'Botany')
qs <- QuerySpec$new(conditions = list(qc))

# get the number of specimens
sc$count(qs)$content
Note that the count of matches for a given query is also returned by the query function. However, count is more lightweight as it returns an integer instead of a ResultSet containing Specimen objects.
Check if a record exists, based on its unitID:
# use SpecimenClient instantiated above
res <- sc$exists('ZMA.INS.1255440')

# content is boolean
res$content
The find method returns a single specimen given its identifier (note: the identifier of a specimen is different from its unitID):
id <- "RMNH.MAM.17209.B@CRS"
res <- sc$find(id)

# content is single specimen object
res$content
The find_by_ids method works like find, but takes multiple IDs, given as one comma-separated string:
ids <- "RMNH.INS.657083@CRS,L.1589244@BRAHMS"
res <- sc$find_by_ids(ids)
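Identifiers are often held in a character vector rather than in a single string. As a small sketch, the comma-separated form used in the example above can be built with base R; the variable ids_vec is only illustrative.

```r
# Identifiers kept as a character vector (values taken from the example above)
ids_vec <- c("RMNH.INS.657083@CRS", "L.1589244@BRAHMS")

# Collapse into the single comma-separated string used in the example
ids <- paste(ids_vec, collapse = ",")

# With a client at hand, the string would then be passed on:
# res <- sc$find_by_ids(ids)
```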
Find a specimen by its unitID:
unitID <- "RMNH.MAM.1513"
res <- sc$find_by_unit_id(unitID)
Aggregation services group available data according to different criteria.
The get_distinct_values method takes a specific field as an argument and returns all possible values of that field in the data, together with their frequencies. Below we get all possible values for the country in which a specimen was collected.
sc$get_distinct_values("gatheringEvent.country")
Note: By default, get_distinct_values lists only the first 10 hits. The above query thus does not reflect the distinct values in the whole dataset. This number can be increased, e.g. by setting the size parameter in a QuerySpec object passed to the method.
sc$get_distinct_values("gatheringEvent.country", querySpec = QuerySpec$new(size = 10000))
Instead of returning all distinct values for a given field, count_distinct_values returns only their number:
sc$count_distinct_values("gatheringEvent.country")
Suppose you want to add another filter to the above query of retrieving all distinct values for a given field, such as splitting the results by their respective source system:
sc$get_distinct_values_per_group("sourceSystem.name", "gatheringEvent.country")
If no return of the actual values is needed, a simple count per group is done as follows:
sc$count_distinct_values_per_group("sourceSystem.name", "gatheringEvent.country")$content
Specimen Metadata services include the same standard metadata services as for the other data types:
# get all paths for the Specimen datatype
sc$get_paths()$content

# get info e.g. for field collectionType
sc$get_field_info()$content$collectionType

# get all settings
sc$get_settings()

# get specific setting
sc$get_setting("index.max_result_window")$content

# check if operator is allowed
sc$is_operator_allowed("gatheringEvent.continent", "STARTS_WITH")$content
All fields can be retrieved with get_paths, and specific information on the fields, such as allowed operators, with get_field_info:
sc$get_paths()$content
sc$get_field_info()$content
Specimen-specific settings can be retrieved with get_settings and a specific setting with get_setting:
sc$get_settings()$content
sc$get_setting("index.max_result_window")$content
To test if a certain operator can be used for a specimen query:
sc$is_operator_allowed("identifications.defaultClassification.genus", "STARTS_WITH")$content
In addition to query services that return JSON-formatted data, the NBA also offers the export of Darwin Core Archive (DwCA) files. These files are zip archives by default; please refer to the official API documentation for more information.
Static download services offer the download of predefined datasets. The sets that are available for download can be queried with dwca_get_data_set_names:
sc$dwca_get_data_set_names()$content
A dataset can then be downloaded using dwca_get_data_set. A filename can be given as an argument; if none is given, the DwCA archive is written to download-YYYY-MM-DDThh:mm.zip in the current working directory.
# download dataset 'porifera' to temporary file
filename <- tempfile(fileext = ".zip")
sc$dwca_get_data_set('porifera', filename = filename)
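The default file name pattern mentioned above contains a colon, which is not a valid character in Windows file names, so passing an explicit filename can be the more portable choice. The sketch below shows one way to build a timestamped name in base R. It is not taken from nbaR itself; the fixed timestamp and the variable names are purely illustrative.

```r
# A fixed timestamp, used here only to make the example reproducible;
# in practice Sys.time() would be used instead.
stamp <- as.POSIXct("2019-05-17 14:30:00", tz = "UTC")

# A name following the documented default pattern download-YYYY-MM-DDThh:mm.zip
default_like <- format(stamp, "download-%Y-%m-%dT%H:%M.zip", tz = "UTC")

# A variant without ':' that is also usable on Windows
safe_name <- format(stamp, "download-%Y-%m-%dT%H%M.zip", tz = "UTC")

# Full path in the session's temporary directory
target <- file.path(tempdir(), safe_name)

# With a client at hand, the path would then be passed on:
# sc$dwca_get_data_set('porifera', filename = target)
```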
The dynamic download function dwca_query allows for the download of arbitrary sets, defined by the user's query. The arguments to this method are similar to those of query, plus the filename:
# download all specimens of genus 'Hydrochoerus'
filename <- tempfile(fileext = ".zip")
qs <- QuerySpec$new(conditions = list(
  QueryCondition$new(
    field = "identifications.defaultClassification.genus",
    operator = "EQUALS",
    value = "Hydrochoerus"
  )
))
sc$dwca_query(querySpec = qs, filename = filename)
The query method of the TaxonClient queries for taxon documents matching given search criteria. Example:
# query for taxa of genus 'Sedum' that are in the Nederlands Soortenregister (NSR)
tc <- TaxonClient$new()
qc <- QueryCondition$new(field = "acceptedName.genusOrMonomial",
                         operator = "EQUALS",
                         value = "Sedum")
qc2 <- QueryCondition$new(field = "sourceSystem.code",
                          operator = "EQUALS",
                          value = "NSR")
qs <- QuerySpec$new(conditions = list(qc, qc2))
tc$query(qs)
The query function is limited to retrieving 50000 taxa at once (this limit is set by the parameter index.max_result_window, whose value is retrievable using the get_setting method described in the metadata section). To provide access to larger amounts of data, the download_query method takes the same arguments as query, but downloads the data as a gzip stream under the hood. Unlike query, download_query returns a list of taxon objects instead of a ResultSet.
For a given query, the count method returns not the Taxon objects but merely their count. Example:
# get counts for taxa of genus 'Sedum' that are in the
# Nederlands Soortenregister (NSR)
qc <- QueryCondition$new(field = "acceptedName.genusOrMonomial",
                         operator = "EQUALS",
                         value = "Sedum")
qc2 <- QueryCondition$new(field = "sourceSystem.code",
                          operator = "EQUALS",
                          value = "NSR")
qs <- QuerySpec$new(conditions = list(qc, qc2))
tc$count(qs)
Returns a taxon object given its identifier:
tc$find("27706109@COL")$content
Given a string with comma-separated identifiers, returns a list of taxon objects:
ids <- "27706109@COL,27704140@COL,27706110@COL,27706111@COL,27706108@COL"
res <- tc$find_by_ids(ids)
This method takes a specific field as an argument and returns all possible values and the frequency for that field in the data. Example: get all data source systems for taxon objects:
tc$get_distinct_values("sourceSystem.name")$content
Taxon Metadata services include the same standard metadata services as for the other data types:
# get all paths for the Taxon datatype
tc$get_paths()$content

# get info e.g. for field synonyms.author
tc$get_field_info()$content$synonyms.author

# get all settings
tc$get_settings()

# get specific setting
tc$get_setting("index.max_result_window")$content

# check if operator is allowed
tc$is_operator_allowed("synonyms.author", "EQUALS")$content
All fields can be retrieved with get_paths, and specific information on the fields, such as allowed operators, with get_field_info:
tc$get_paths()$content
tc$get_field_info()$content
Taxon-specific settings can be retrieved with get_settings and a specific setting with get_setting:
tc$get_settings()$content
tc$get_setting("index.max_result_window")$content
To test if a certain operator can be used for a taxon query:
tc$is_operator_allowed("defaultClassification.genus", "STARTS_WITH")$content
The taxonomic information in the NBA is also available as Darwin Core Archive (DwCA) files.
Static download services offer the download of predefined datasets. The sets that are available for download can be queried with dwca_get_data_set_names:
tc$dwca_get_data_set_names()$content
A dataset can then be downloaded using dwca_get_data_set. A filename can be given as an argument; if none is given, the DwCA archive is written to download-YYYY-MM-DDThh:mm.zip in the current working directory.
# download dataset 'nsr' to temporary file
filename <- tempfile(fileext = ".zip")
tc$dwca_get_data_set('nsr', filename = filename)
The dynamic download function dwca_query allows for the download of arbitrary sets, defined by the user's query. The arguments to this method are similar to those of query, plus the filename:
# download all taxa for genus 'Clematis'
filename <- tempfile(fileext = ".zip")
qs <- QuerySpec$new(conditions = list(
  QueryCondition$new(
    field = "defaultClassification.genus",
    operator = "EQUALS",
    value = "Clematis"
  )
))
tc$dwca_query(querySpec = qs, filename = filename)
The GeoArea query service allows for detailed search within the fields of a GeoArea object. As for the other data types, query parameters can be given either as a list or as a QuerySpec object. Below, we make a simple query to get the GeoArea object for the Netherlands:
# instantiate client for geo areas
gc <- GeoClient$new()

# query for GeoArea of the Netherlands
qc <- QueryCondition$new(field = "locality",
                         operator = "EQUALS",
                         value = "Netherlands")
qs <- QuerySpec$new(conditions = list(qc))
res <- gc$query(qs)

# get item
res$content$resultSet[[1]]$item
The get_geo_json_for_locality method is a convenience function that directly extracts the GeoJSON object for a specific locality. GeoJSON is a popular format for storing geographical point and polygon data. For example, to extract the GeoJSON polygon representation for the Netherlands:
loc <- "Netherlands"
res <- gc$get_geo_json_for_locality(loc)
Results are returned as a list by default, but can be easily converted to a JSON string, e.g.
jsonlite::toJSON(res$content)
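One detail of the conversion is worth knowing: by default, jsonlite writes length-one vectors as one-element arrays, which turns a GeoJSON "type" field into ["Polygon"]. The sketch below uses a small hand-written polygon as a hypothetical stand-in for res$content (the actual shape of the returned list is an assumption here) and shows the effect of the auto_unbox argument.

```r
library(jsonlite)

# Hypothetical stand-in for res$content: a GeoJSON polygon as a nested list
geo <- list(
  type = "Polygon",
  coordinates = list(list(c(4, 52), c(5, 52), c(5, 53), c(4, 52)))
)

# auto_unbox = TRUE writes length-one vectors as scalars ("Polygon")
# instead of one-element arrays (["Polygon"])
json_str <- toJSON(geo, auto_unbox = TRUE)

# Round trip back to a nested list to check that nothing was lost
parsed <- fromJSON(json_str, simplifyVector = FALSE)
```

Coordinate pairs have length two and are therefore still written as arrays, as GeoJSON requires.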
For any given query (with QuerySpec or not), the count method returns the number of matches instead of GeoArea objects:
# return count of all GeoAreas
res <- gc$count()
res$content
This function returns all values present for a certain field, and their counts:
gc$get_distinct_values("areaType")$content
Geo Metadata services include the same standard metadata services as for the other data types:
# get all paths for the GeoArea datatype
gc$get_paths()$content

# get info e.g. for field 'areaType'
gc$get_field_info()$content$areaType

# get all settings
gc$get_settings()

# get specific setting
gc$get_setting("index.max_result_window")$content

# check if operator is allowed
gc$is_operator_allowed("locality", "STARTS_WITH")$content
Multimedia services are accessible with a MultimediaClient, instantiated as follows:
mc <- MultimediaClient$new()
As for the other data types, the query method enables simple and complex queries using a list or a QuerySpec object to specify query parameters.
# example of multimedia query passing parameters as a list
mc$query(queryParams = list(collectionType = 'Cnidaria'))$content

# example of multimedia query using QuerySpec: get the first 100
# multimedia items associated with a specimen with name starting with "Qu"
qc <- QueryCondition$new(field = "identifications.scientificName.fullScientificName",
                         operator = "STARTS_WITH",
                         value = "Qu")
qs <- QuerySpec$new(conditions = list(qc), size = 100)
res <- mc$query(qs)

# check if scientific names indeed start with 'Qu'
sapply(res$content$resultSet,
       function(x) x$item$identifications[[1]]$scientificName$fullScientificName)
As for the other data types, a count function returns counts instead of the actual objects:
# count all multimedia documents
mc$count()$content
This function returns all values present for a certain field, and their counts. Example: retrieve all different licenses and their counts:
mc$get_distinct_values("license")$content
All fields can be retrieved with get_paths, and specific information on the fields, such as allowed operators, with get_field_info:
mc$get_paths()$content
mc$get_field_info()$content
Multimedia-specific settings can be retrieved with get_settings and a specific setting with get_setting:
mc$get_settings()$content
mc$get_setting("index.max_result_window")$content
To test if a certain operator can be used for a multimedia query:
mc$is_operator_allowed("identifications.scientificName.fullScientificName", "STARTS_WITH")$content
Metadata services provide miscellaneous information about the data available via the NBA. Note that there is also type-specific metadata for each data type (e.g. Specimen), which can be retrieved with the specific client of that class. Here we show the available methods for the MetadataClient, which gives general, non-type-specific metadata.
The client is instantiated in the standard way:
mc <- MetadataClient$new()
The vocabularies for some fields are controlled by dictionaries with allowed values. For the sex of a museum specimen, for instance, only the terms male, female, mixed and hermaphrodite are allowed to be assigned to the specimen. The fields for which controlled lists are available can be retrieved as follows:
mc$get_controlled_lists()$content
and for each field that has a controlled vocabulary, there is a separate function to retrieve the allowed values:
mc$get_controlled_list_taxonomic_status()$content
mc$get_controlled_list_specimen_type_status()$content
mc$get_controlled_list_sex()$content
mc$get_controlled_list_phase_or_stage()$content
To maintain data integrity, dates have to be coded in specific formats in our systems. For instance, yyyy-MM-dd is a valid format. Allowed formats can be retrieved as follows:
mc$get_allowed_date_formats()$content
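Before sending a date to the NBA, it can be useful to check locally that a string matches one of the allowed formats. The sketch below does this for yyyy-MM-dd in base R, assuming that this pattern corresponds to "%Y-%m-%d" in R's notation; the helper is_valid_date is hypothetical and not part of nbaR.

```r
# Hypothetical helper: TRUE if x can be parsed with the given format.
# Assumption: the pattern yyyy-MM-dd maps to "%Y-%m-%d" in R.
is_valid_date <- function(x, format = "%Y-%m-%d") {
  !is.na(as.Date(x, format = format))
}

ok        <- is_valid_date("2018-03-21")  # well-formed date
wrong_sep <- is_valid_date("21/03/2018")  # different layout and separators
bad_month <- is_valid_date("2018-13-01")  # month out of range
```

Note that as.Date ignores trailing characters after a successful match, so this check is a first filter rather than a strict validation.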
The method get_rest_services returns a list of all services in the NBA as objects of type RestService.
# get the first REST service in the services list
mc$get_rest_services()$content[[1]]
Similar to the document-specific metadata services, we can get general settings with the MetadataClient:
# get all settings
mc$get_settings()$content

# get value for specific setting
mc$get_setting("operator.contains.min_term_length")$content