ArctosR is an R package for downloading data from Arctos, formatting these data so that users can more easily read and relate their contents, and saving the results in various formats. This vignette provides an overview of the basic usage of this package.
In order to use most of the functions in ArctosR, you will have to request an API key from Arctos. See https://handbook.arctosdb.org/documentation/api.html for instructions on how to do this.
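The examples in this vignette pass the key as api_key=YOUR_API_KEY, where YOUR_API_KEY stands for your own key. One way to keep the key out of shared scripts is to store it in an environment variable and read it at the start of each session. The minimal sketch below assumes a variable named ARCTOS_API_KEY and a helper called read_arctos_key(); both names are hypothetical, chosen for illustration, and are not part of ArctosR.

```r
# Hypothetical helper: read the Arctos API key from an environment variable.
# "ARCTOS_API_KEY" is a name chosen for this example, not one ArctosR requires.
read_arctos_key <- function(var = "ARCTOS_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Set the variable once, e.g. by adding a line such as
#   ARCTOS_API_KEY=your-key-here
# to your ~/.Renviron file, restart R, and then run:
# YOUR_API_KEY <- read_arctos_key()
```

Stopping with an error, rather than returning an empty string, avoids sending requests to Arctos without a key.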
Arctos contains specimen records and diverse types of data associated with them (e.g., measurements, locality coordinates and descriptions, and available tissue samples). For a full description of Arctos and its data, visit its website.
ArctosR abstracts interaction with the Arctos API through objects. These objects can be manipulated with a set of user-friendly functions for most tasks, or created and manipulated directly using builder functions.
A query in ArctosR is a collection of searches made by a user of the package for a specific task. This can be: (1) a simple search, such as a one-off search for specimens of a species held by a given museum; (2) a search that requires concatenating multiple responses, such as one that requests more records than Arctos can provide in a single response; or (3) a complex search that sends a request for one set of search terms and then uses the responses as part of another request.
A response, then, is an object that stores the contents of a single response from Arctos to ArctosR. Each response has associated metadata, such as the search terms and a timestamp, and content, such as a table of records matching those search terms.
Detailed metadata about each response is also saved in the user's query for the purposes of documentation and reproducibility. This metadata is saved along with downloaded records as a JSON file.
Make sure to load the package:
library(ArctosR)
ArctosR provides a single function, get_records, to search for records in the Arctos database. To start building a search, we first need to find out which query parameters are available. These can be found with the get_query_parameters function, which returns a dataframe of all query parameters used by Arctos. The names in the obj_name column are the ones used as parameters to the get_records function.
# run the function and store results in an object
query_params <- get_query_parameters()
# checking the dataframe obtained (showing only 6 rows and 3 columns)
query_params[1:6, 1:3]
#>             display                obj_name category
#> 1     Verbatim Date           verbatim_date    event
#> 2 Collecting Method       collecting_method    event
#> 3 Collecting Source       collecting_source    event
#> 4        Ended Date              ended_date    event
#> 5  Event Attributes evtAttributeSearchTable    event
#> 6           Habitat                 habitat    event
For this basic query, we can use guid_prefix, whose description can be found by listing its row in the query parameter dataframe:
# checking row 37 in the dataframe
query_params[37, 1:5]
#>       display    obj_name   category subcategory
#> 37 Collection guid_prefix identifier       basic
#>                                                                 description
#> 37 Collection responsible for the record. Turning this off will break most
#>    forms.
We will also use genus and species, whose descriptions can be found at rows 28 and 23, respectively:
query_params[28, 1:5]
#>    display obj_name       category subcategory
#> 28   Genus    genus identification  curatorial
#>                                               description
#> 28 Genus as provided in collection's preferred Source(s).
query_params[23, 1:5]
#>    display obj_name       category subcategory
#> 23 Species  species identification  curatorial
#>                                                            description
#> 23 Species (binomial) as provided in collection's preferred Source(s).
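Looking parameters up by row number works here, but row positions depend on the table Arctos returns and could change between versions. A sketch of a lookup by name instead follows; it uses a small stand-in data frame (demo_params, a hypothetical name) built from rows shown above so that it runs offline. In a real session you would filter query_params directly.

```r
# Stand-in for get_query_parameters() output, built from rows shown above
demo_params <- data.frame(
  display  = c("Verbatim Date", "Collection", "Genus", "Species"),
  obj_name = c("verbatim_date", "guid_prefix", "genus", "species"),
  category = c("event", "identifier", "identification", "identification"),
  stringsAsFactors = FALSE
)

# Select the parameters we need by obj_name rather than by row position
wanted <- c("guid_prefix", "genus", "species")
demo_params[demo_params$obj_name %in% wanted, ]

# With the real table, the same idea would be:
# query_params[query_params$obj_name %in% wanted, 1:5]
```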
Now that we have a set of parameters to use, we can pass them to the get_records function and send our request to Arctos. This will return a query, which bundles our search parameters with the returned data from Arctos.
query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus", api_key=YOUR_API_KEY)
This simple search returned the default (core) columns provided by Arctos. Much more information can be requested from Arctos; an example of how to do this is given below.
By default, Arctos returns all columns in the core category. These columns can be listed as follows:
result_params <- get_result_parameters()
result_params[result_params$category == "core", 1:2]
#>                      display                      obj_name
#> 1  GUID (DarwinCore Triplet)                          guid
#> 7              Identified As               scientific_name
#> 43          Asserted Country                       country
#> 44   Asserted State/Province                    state_prov
#> 51         Specific Locality                 spec_locality
#> 57             Verbatim Date                 verbatim_date
#> 69          Decimal Latitude                       dec_lat
#> 70         Decimal Longitude                      dec_long
#> 71      Coordinate Error (m) coordinateuncertaintyinmeters
Additional columns can be requested by passing a list of result parameters (values from the obj_name column of get_result_parameters()) to the columns parameter of get_records, like so:
# making a list of additional columns to get (see get_result_parameters())
add_cols <- list("guid", "scientific_name", "relatedcatalogeditems", "collectors",
                 "state_prov", "spec_locality", "dec_lat", "dec_long",
                 "verbatim_date", "examined_for", "detected", "not_detected")
# getting records with additional columns
query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus",
                     columns = add_cols, api_key = YOUR_API_KEY)
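The add_cols list above repeats several core names (guid, scientific_name, dec_lat, and others), which suggests that columns replaces the default set rather than adding to it; we have not confirmed this against the package documentation. Under that assumption, here is a sketch that builds the list from the core obj_name values plus a few extras, using a stand-in for get_result_parameters() that holds only the core rows listed above (demo_result_params and my_columns are hypothetical names).

```r
# Stand-in for get_result_parameters(), limited to the core rows shown above
demo_result_params <- data.frame(
  obj_name = c("guid", "scientific_name", "country", "state_prov",
               "spec_locality", "verbatim_date", "dec_lat", "dec_long",
               "coordinateuncertaintyinmeters"),
  category = "core",
  stringsAsFactors = FALSE
)

# Core columns, taken from the table rather than typed by hand
core_cols <- demo_result_params$obj_name[demo_result_params$category == "core"]

# Extra columns to request (names taken from add_cols above)
extra_cols <- c("collectors", "examined_for", "detected", "not_detected")

# unique() drops any name present in both sets; the examples above pass a list
my_columns <- as.list(unique(c(core_cols, extra_cols)))

# query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus",
#                      columns = my_columns, api_key = YOUR_API_KEY)
```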
Certain result parameters (columns) in Arctos are entire tables associated with a single specimen record. For instance, partdetail links to the attributes of each part listed in the parts column of a specimen. These can be requested just like any other result parameter. The information in these complex columns is returned in JSON format, but can be expanded into dataframes of their own with the function expand_column.
# defining the columns to be obtained
some_cols <- list("guid", "parts", "partdetail")
# performing the query
query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus",
                     columns = some_cols, api_key = YOUR_API_KEY)
An example of expanding these columns is given in the section "Expanding columns".
By default, get_records does not request all records for a query unless asked to. By passing the parameter all_records = TRUE to get_records, the user can have ArctosR make multiple requests until all records for a given query are downloaded.
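# check how many records match the search before downloading them all
get_record_count(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus")
#> [1] 1694
# request every matching record, using as many requests as needed
query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus",
                     all_records = TRUE, api_key = YOUR_API_KEY)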
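ArctosR handles this for you, and the sketch below is not how the package pages internally. It only illustrates the idea of concatenating several responses (point 2 in the description of queries at the start of this vignette) in base R, using a hypothetical fake_fetch() in place of a request to Arctos and a hypothetical page size of 100 records; neither reflects Arctos's actual response limit.

```r
# Hypothetical page fetcher standing in for one request to Arctos:
# it returns rows start..(start + size - 1) of a toy table with `total` rows
fake_fetch <- function(start, size, total) {
  if (start > total) return(data.frame(row_id = integer(0)))
  end <- min(start + size - 1, total)
  data.frame(row_id = start:end)
}

# Keep requesting pages and stacking them until a short page signals the end
fetch_all <- function(total, page_size) {
  pages <- list()
  start <- 1
  repeat {
    page <- fake_fetch(start, page_size, total)
    pages[[length(pages) + 1]] <- page
    if (nrow(page) < page_size) break
    start <- start + page_size
  }
  do.call(rbind, pages)
}

all_rows <- fetch_all(total = 1694, page_size = 100)
nrow(all_rows)
```

With these toy numbers the loop makes 17 requests; the last one returns fewer than 100 rows, which ends the loop.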
We have gone through the basic functionality of ArctosR. Below you can find an example of using the package to get, process, explore, and save data from Arctos.
# a list of columns to download with the query
my_cols <- list("guid", "scientific_name", "parts", "collectors", "state_prov",
                "spec_locality", "dec_lat", "dec_long", "verbatim_date", "partdetail")
# download records
query <- get_records(guid_prefix = "MSB:Mamm", genus = "Canis", species = "lupus",
                     columns = my_cols, api_key = YOUR_API_KEY)
# getting only the dataframe of data
msb_wolves <- response_data(query)
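Before mapping or summarizing msb_wolves, you may want to keep only georeferenced records. The sketch below uses a toy stand-in data frame (msb_demo, a hypothetical name) with placeholder values, not real Arctos records. It does not assume the column types response_data() returns: dec_lat and dec_long are converted to numbers first, so text and numeric columns are handled the same way.

```r
# Toy stand-in for msb_wolves <- response_data(query); guids and coordinates
# are placeholders, not real Arctos records
msb_demo <- data.frame(
  guid     = c("rec-1", "rec-2", "rec-3"),
  dec_lat  = c("35.1", "", NA),
  dec_long = c("-106.6", "", NA),
  stringsAsFactors = FALSE
)

# Convert to numbers; as.character() first guards against factor columns,
# and empty strings or NA become NA (coercion warnings are silenced)
to_num <- function(x) suppressWarnings(as.numeric(as.character(x)))
lat  <- to_num(msb_demo$dec_lat)
long <- to_num(msb_demo$dec_long)

# Keep only records that have both a latitude and a longitude
georeferenced <- msb_demo[!is.na(lat) & !is.na(long), ]
nrow(georeferenced)
```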
You can filter records by the presence or absence of certain data. For example, to find rodents in the MSB:Mamm collection in which Orthohantavirus was detected, you can use the filter_by argument to get_records. filter_by takes a named list of record attributes and attribute values. In this example, the record attribute is "detected", and the attribute value is "Orthohantavirus".
orthohantavirus_MSB <- get_records(guid_prefix = "MSB:Mamm", taxon_name = "Rodentia",
                                   filter_by = list("detected" = "Orthohantavirus"),
                                   api_key = YOUR_API_KEY)
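filter_by applies the filter when records are requested. If you have already downloaded records that include the detected column (requested through columns, as in the add_cols example), a similar filter can be applied locally. This is a sketch on a toy stand-in (records_demo, a hypothetical name) with placeholder values; we assume several detections could share one text field, which is why it searches with grepl() rather than testing for equality.

```r
# Toy stand-in for downloaded records with a "detected" column;
# guids and values are placeholders, not real Arctos records
records_demo <- data.frame(
  guid     = c("rec-1", "rec-2", "rec-3"),
  detected = c("Orthohantavirus", "", NA),
  stringsAsFactors = FALSE
)

# fixed = TRUE matches the literal text; missing values count as no match
has_virus <- !is.na(records_demo$detected) &
  grepl("Orthohantavirus", records_demo$detected, fixed = TRUE)
records_demo[has_virus, ]
```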
The column partdetail contains nested dataframes associated with each record. To explore these data fully, we can process the information and view it as a table:
# process the information in partdetail into sub-dataframes
expand_column(query, "partdetail")
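expand_column does this processing for you, and we do not assume what its output looks like. To show the general idea of turning a nested column into a table, here is a base R sketch that stacks per-record sub-tables into one long table keyed by guid. The structure and values (part_tables, part_name, detail) are hypothetical placeholders; the real contents of partdetail may differ.

```r
# Toy stand-in for a nested column: one small data frame per record
guids <- c("rec-1", "rec-2")
part_tables <- list(
  data.frame(part_name = c("skull", "skin"), detail = c("a", "b"),
             stringsAsFactors = FALSE),
  data.frame(part_name = "tissue", detail = "c", stringsAsFactors = FALSE)
)

# Prefix each record's sub-table with its guid, then stack them into one table
long_parts <- do.call(rbind, unname(Map(function(g, tbl) cbind(guid = g, tbl),
                                        guids, part_tables)))
rownames(long_parts) <- NULL
long_parts
```

Keeping the guid on every row lets the long table be joined back to the main records with merge().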
ArctosR offers multiple options to save the data obtained from Arctos. See below for examples of how to do it.
save_response_csv(query, "msb_wolves.csv")
save_response_csv(query, "msb_wolves.csv", expanded = TRUE)
save_response_rds(query, "msb_wolves.rds")
save_response_csv(query, "msb_wolves2.csv", with_metadata = TRUE)