In avucoh/Sourcery: Export HIRN resources

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

The HIRN Resource Browser has an API for both retrieving and submitting resources programmatically: https://resourcebrowser.hirnetwork.org/swagger/ui/index. For those who want to use R, this package is an R client that offers convenience functions, standardizes some steps, and provides examples of using the API for both purposes. This vignette covers submitting resources. For retrieving resources, see the "Retrieving Resources" vignette.

Submitting resources is more complicated than retrieving them because the API expects more from the caller. In fact, some resources require interaction with the dkNET SciCrunch API in addition to the HIRN RB API. For those who want to submit many resources but do not want to spend time figuring out either API, this R package helps abstract some of that away. Sourcery is also more than an API client for two different APIs: it is meant to formalize an ETL (extract, transform, load) process, as the examples will make clear.

Note that submitting resources requires an API key, so first acquire one by following the steps below.

TO-DO: include steps for getting an API key.

Set up

Start by loading the package and providing your API key. We've stored our key as an environment variable (more info).

# library(Sourcery)
# apikey <- Sys.getenv("HIRN_RB_apikey")
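A minimal sketch of reading the key defensively, using only base R. The helper name `get_apikey` is hypothetical and not part of Sourcery; the variable name `HIRN_RB_apikey` is the one used above. `Sys.getenv()` returns an empty string when a variable is unset, so a check like this fails early rather than at submission time.

```r
# Hypothetical helper (not part of Sourcery): read the API key from an
# environment variable and stop early if it is missing.
get_apikey <- function(var = "HIRN_RB_apikey") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set; ",
         "add it to your .Renviron file and restart R.")
  }
  key
}
```

With the variable set (for example in `~/.Renviron`), `apikey <- get_apikey()` can stand in for the plain `Sys.getenv()` call.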

Submitting resources

The user is assumed to be a working curator or researcher who is comfortable with R and wants to submit certain types of resources to the Resource Browser using the utilities in this package. These utilities handle the extract, transform, load process through a helper app powered by API-interfacing functions. They are best suited to batch submissions. For only one or two records, the Resource Browser UI is probably still the better choice.

Examples are added as the API interface is tested and becomes stable enough to recommend for general use, so not all resource types may be covered in this version. Examples also vary by resource type because the required inputs and steps differ: some involve a graphical helper app, and others involve running only a single line of code. Resources with analogous inputs and steps (such as Dataset and Technology) are grouped together, and what is shown generally applies to the others in the group.

The examples are ordered from least to most complex.

Protocol resources

TO-DO: Need to explain input file. Only require 1-2 lines of code as long as input file is correct.

Dataset and Technology resources

TO-DO: Need to explain input file. Only require 1-2 lines of code as long as input file is correct.

Cell/Tissue resources

ModelOrg resources

Antibody resources

The antibody inputs start with information that a researcher should already have in their notebook at minimum. If the antibodies don't at least include a reference catalog number, submission usually cannot proceed. A catalog number is required to search/register antibodies with AntibodyRegistry.org (more on this below).
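Because a missing catalog number usually blocks submission, it can help to check the input before opening the helper app. The following is a minimal sketch in base R. The column names (`name`, `vendor`, `catalog_number`) and the sample rows are assumptions made up for illustration; they are not the documented layout of the example files.

```r
# Hypothetical input: column names and values are placeholders for
# illustration, not the documented layout of the example file.
input <- read.delim(
  text = paste(
    "name\tvendor\tcatalog_number",
    "antibody_1\tvendor_x\tCAT-0001",
    "antibody_2\tvendor_y\t",
    sep = "\n"
  ),
  stringsAsFactors = FALSE,
  na.strings = ""
)

# Flag rows whose catalog number is missing or blank.
missing_catalog <- is.na(input$catalog_number) |
  !nzchar(trimws(input$catalog_number))

# Rows that would need a catalog number before submission.
input[missing_catalog, ]
```

To apply the same check to a real file, replace the `text =` argument with the file path and adjust the column name to match.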

# Content of example file without RRIDs
cat(readLines("ab_ex_no_RRID.txt"), sep = "\n")

These RRIDs are the identifiers assigned by AntibodyRegistry.org. Most antibody resources from a commercial vendor already have an RRID, but the curator/researcher must register any novel antibodies encountered/produced. Why do you need an RRID? We build upon dkNET’s framework for resource tracking, which makes use of Research Resource Identifiers (RRIDs). From https://dknet.org/about/rrid (where you can also learn more):

“RRIDs are persistent and unique identifiers for referencing a research resource and are used for promoting research resource identification and tracking. Catalog numbers change, disappear or can be reused for another resource, but RRIDs always resolve to the same research resource and endure beyond the existence of the research resource itself. It’s an easy and practical method for improving reproducibility, transparency and tracking.”

Importantly, RRIDs help us measure your resource impact. So ideally, we would have an RRID-complete file like this:

# Content of example file with RRIDs
cat(readLines("ab_ex_w_RRID.txt", n = 5), sep = "\n")
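To see which entries still need an RRID, a simple pattern check can be run over the identifier column. This is a minimal sketch assuming antibody RRIDs take the form `AB_` followed by digits, optionally prefixed with `RRID:`. The helper name `is_antibody_rrid` is hypothetical and not part of Sourcery, and the sample identifiers are made-up placeholders.

```r
# Hypothetical helper (not part of Sourcery): TRUE when a string looks like
# an antibody RRID, with or without the "RRID:" prefix.
# Assumption: antibody RRIDs are "AB_" followed by one or more digits.
is_antibody_rrid <- function(x) {
  grepl("^(RRID:)?AB_[0-9]+$", trimws(x))
}

# Made-up placeholder values: two well-formed IDs, one blank, one catalog number.
rrids <- c("RRID:AB_000001", "AB_000002", "", "CAT-0001")
needs_rrid <- !is_antibody_rrid(rrids)
```

A pattern match only checks the shape of the identifier. Whether the RRID actually exists is still confirmed by the SciCrunch lookup in the helper app.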

The input files are minimal, and the actual Resource Browser records will include 10+ other pieces of metadata. This is where the Sourcery Extract/Transform process comes in. Aside from the Resource Browser API, Sourcery is powered by the SciCrunch API, which we use to extract and validate other pieces of resource data. The Sourcery helper app also guides the user through contributing other metadata, such as ratings data.

First, open up the helper app.

modAntibodyApp()

The app should then launch in your browser.

The working curator/researcher would either paste the content of the input file into the text area or use the upload. Here, we are using the ab_ex_w_RRID.txt file mentioned above. The app will query SciCrunch and return results with expanded data (not all of which are displayed).

Go through and select the records that match those from SciCrunch's database. (If you have not registered your novel antibody, it will not be in the results!) After all results have been reviewed, go to the "Transform" tab.

Here we show the original input data, now transformed/augmented with the "Properties" data from the SciCrunch database. The user is allowed to fill in or correct certain values.

The "Usage" tab is where we gather more unique data for the resource. Certain values like dilution are recognized and filled in automatically if available. The user may specify whether they used this antibody to target a cell type using Cell Ontology classification. Once done, clicking on "Load" will export a file. TO DO: The tab should actually be changed to "Export".


Next come several lines of code that convert the export to JSON and post it as the request body to the Resource Browser submission API.

# the complete JSON includes contributor information to the resource
payload <- ab2json("example_export.json")

# apikey is your user apikey
postValidateResource(apikey = apikey, body = payload)
# returns TRUE or FALSE
postLoadResource(apikey = apikey, body = payload)
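One way to chain the two calls is to load only when validation passes. The sketch below is not part of Sourcery. The wrapper name `submit_if_valid` and its `validate_fn` and `load_fn` arguments are hypothetical, and it assumes the validation function returns a single `TRUE` or `FALSE`. The two functions are passed in as arguments so the control flow can be tried without network access.

```r
# Hypothetical wrapper (not part of Sourcery): call the load function only
# if the validation function returns TRUE.
submit_if_valid <- function(payload, apikey, validate_fn, load_fn) {
  ok <- isTRUE(validate_fn(apikey = apikey, body = payload))
  if (!ok) {
    warning("Validation failed; resource was not loaded.")
    return(invisible(FALSE))
  }
  load_fn(apikey = apikey, body = payload)
}
```

Under those assumptions, the call would look like `submit_if_valid(payload, apikey, postValidateResource, postLoadResource)`.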