knitr::opts_chunk$set ( collapse = TRUE, warning = TRUE, message = TRUE, width = 120, comment = "#>", fig.retina = 2, fig.path = "README-" )
"deposits" is an R package which provides a universal client for depositing and accessing research data in a variety of online deposition services. Currently supported services are zenodo and figshare. These two systems have fundamentally different interfaces ("API"s, or Application Programming Interfaces), and access to these and indeed all deposition services has traditionally been enabled through individual software clients. The deposits package aims to be a universal client offering access to a variety of deposition services, without users having to know any specific details of the APIs for each service. This vignette demonstrates how the deposits package can be used to manage the processes of uploading and publishing research data, using the methods summarised in Figure 1.
library (deposits)
The deposits package uses an R6
client to
interface with the individual deposition services. A separate
vignette
describes the R6
system for those unfamiliar with it.
An empty client can be constructed by naming the desired service. An additional
sandbox
parameter constructs a client to the zenodo
sandbox environment
intended for testing their API. Actual use of the zenodo
API can then be
enabled with the default sandbox = FALSE
.
cli <- depositsClient$new ("zenodo", sandbox = TRUE) cli #> <deposits client> #> deposits service : zenodo #> sandbox: TRUE #> url_base : https://sandbox.zenodo.org/api/ #> Current deposits : <none> #> #> hostdata : <none> #> metadata : <none>
Client construction requires personal access or authentication tokens for
deposits services to be stored as local environment variables, as described in
the installation and setup
document.
Authentication tokens are checked when new clients are constructed, so the
$new()
function will only succeed with valid tokens.
As also described in the
README
,
all methods of a deposits client can be seen with the deposits_methods()
method:
cli$deposits_methods () #> List of methods for a deposits client: #> #> - deposit_add_resource #> - deposit_delete #> - deposit_delete_file #> - deposit_download_file #> - deposit_embargo #> - deposit_fill_metadata #> - deposit_new #> - deposit_prereserve_doi #> - deposit_publish #> - deposit_retrieve #> - deposit_service #> - deposit_update #> - deposit_upload_file #> - deposit_version #> - deposits_list #> - deposits_methods #> - deposits_search #> #> see `?depositsClient` for full details of all methods.
# x <- capture.output (cli$deposits_methods ()) # num_methods <- length (grep ("\\-\\sdeposits\\_", x)) num_methods <- 3L
All methods are described in detail in the documentation entry for the
deposits
client.
All methods starting with the singular "deposit_" prefix operate on individual
deposits. The final r num_methods
methods starting with "deposits_" are
general methods applied to services in general ("list" and "search"), or to a
deposits client in general ("methods"). The main methods, and relationships
between them, are also illustrated in Figure 1.
The client constructed above is mostly empty, but nevertheless demonstrates the two primary fields or elements of a deposits client, the "hostdata" and "metadata". Both of these elements represent the "metadata" of a deposit, with the data itself referred to as "files", which can be uploaded and downloaded. These files also have accompanying metadata, according to the "frictionless" workflow as described in the separate "frictionless" vignette.
There are thus three types of metadata used in a deposits workflow:
frictionless
package. These kind of metadata are
described in the frictionless
vignette.The term "metadata" refers through this and all deposits documentation to the
first of these three kinds, with the second always explicitly referred to as
"frictionless metadata." The "metadata" and "frictionless metadata" structures
remain consistent between services, and allow data to be transformed from one
format to another, and between local clients and remote services. In contrast,
the hostdata
structures are directly provided by the deposits host services,
generally as lists, and with different structures for different services. These
structures are read-only fields which are automatically filled by the deposits
client, and are intended to provide insight into metadata records stored on
host sites.
A new deposit is initially constructed by filling the metadata
field with a
local representation of metadata. There are several ways of doing this, as
described in the separate metadata
vignette. One of
the easiest approaches is to define metadata as a simple list:
metadata <- list ( title = "New Title", abstract = "This is the abstract", creator = list (list (name = "A. Person"), list (name = "B. Person")) )
Note that the "creator" item has to be a list-of-lists, because aspects other
than name may also be included, and the second list is required to distinguish
different creators, as described in detail in the metadata
vignette. A new
deposits client can be filled with this metadata by passing it as the
metadata
parameter:
cli <- depositsClient$new ( service = "zenodo", sandbox = TRUE, metadata = metadata ) print (cli) #> <deposits client> #> deposits service : zenodo #> sandbox: TRUE #> url_base : https://sandbox.zenodo.org/api/ #> Current deposits : <none> #> #> hostdata : <none> #> metadata : 3 terms (see 'metadata' element for details)
The summary produced by calling print()
(or, equivalently, just typing cli
in the console) says that the object now includes three metadata terms. They
can be seen by viewing cli$metadata
, confirming that the client metadata are
precisely what we specified:
cli$metadata
metadata <- list ( abstract = "This is the abstract", creator = list (list (name = "A. Person"), list (name = "B. Person")), title = "New Title" ) print (metadata)
Alternative ways of specifying and entering metadata are described in the metadata vignette, along with detailed descriptions of the kinds of metadata accepted by a deposits client.
Once filled with metadata, a deposits client can be used to initiate a new
deposit on the associated external service with the $deposit_new()
method.
The $deposit_new()
method
uses an existing client to create a new deposit on the nominated service,
whereas the the $new()
method
method creates a new client. Calling deposit_new()
from the client
constructed above with our sample metadata gives the following result:
cli$deposit_new () #> ID of new deposit: 1064327 print (cli) #> <deposits client> #> deposits service : zenodo #> sandbox: TRUE #> url_base : https://sandbox.zenodo.org/api/ #> Current deposits : 1 (see 'deposits' element for details) #> #> url_deposit : https://sandbox.zenodo.org/deposit/1064327 #> deposit id : 1064327 #> hostdata : list with 14 elements #> metadata : 4 terms (see 'metadata' element for details)
The client now lists one current deposit, additional fields for the URL and
"id" of the deposit, and has a "hostdata" field with 14 elements. The "ID"
value printed by the call to deposit_new()
is listed in the client as its
"deposit id". This is a unique integer value used to identify particular
deposits on external services. The value can be accessed any time as cli$id
.
The "metadata" item also includes an additional "identifier" element containing
a pre-reserved DOI provided by the deposits service.
From that point on, a client will always show (at least) one deposit. For example, if we return at some later time to a new R session and initiate a new, empty client, we would see the following result:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE) print (cli) #> <deposits client> #> deposits service : zenodo #> sandbox: TRUE #> url_base : https://sandbox.zenodo.org/api/ #> Current deposits : 1 (see 'deposits' element for details) #> #> hostdata : <none> #> metadata : <none>
This differs from our initial client in that it now lists one "current deposit".
We can examine a deposits client to get the "id" values of all current deposits. Extending from the previous example, the "id" can be accessed as:
cli$deposits$id #> [1] 1064327
More generally, information of all deposits currently associated with a user's
account (as identified by the token described in the installation
vignette)
can be accessed as cli$deposits
. With the single deposit show in the previous
steps, the first few fields of the result look this this:
cli$deposits [, 1:5]
data.frame ( conceptrecid = 1200932L, created = "2023-00-01T00:00:00", doi = "10.5072/zenodo.1064327", doi_url = "https://doi.org/10.5072/zenodo.1064327", id = 1064327L )
We can retrieve the metadata from this or any previously uploaded deposit with
the deposit_retrieve()
function:
cli$deposit_retrieve (deposit_id = cli$deposits$id [1])
The local client then holds identical information to the previous client
immediately after calling deposit_new()
- that is, retrieve_deposit()
has
filled the local client with all of the metadata from the previously-created
deposit.
The previous sections of this document describe how to initiate a deposits
client, and how to use that to initiate and retrieve metadata from a remote
deposits services. The main point of a deposit is of course to store actual
data in any arbitrary format alongside these structured metadata. This is
achieved with the deposit_upload_file()
method,
demonstrated in the following code which uses our deposit retrieved directly
above. It is recommended to store all data for a single deposit within a single
directory, which the following code also creates.
data_dir <- file.path (tempdir (), "data") dir.create (data_dir) path <- file.path (data_dir, "data.csv") write.csv (datasets::Orange, path, row.names = FALSE) cli$deposit_upload_file (path = path) #> frictionless metadata file has been generated as '/tmp/RtmpxSiYhW/data/datapackage.json'
The client then holds additional information which appears after typing
print(cli)
, or just cli
:
print (cli) #> <deposits client> #> deposits service : zenodo #> sandbox: TRUE #> url_base : https://sandbox.zenodo.org/api/ #> Current deposits : 1 (see 'deposits' element for details) #> #> url_deposit : https://sandbox.zenodo.org/deposit/1064327 #> deposit id : 1064327 #> hostdata : list with 14 elements #> metadata : 4 terms (see 'metadata' element for details) #> local_path : /tmp/RtmpxSiYhW/data #> resources : 1 local, 1 remote
The client now holds a local_path
field identifying the directory of the
active deposit, and lists numbers of both local and remote resources. The
details of the remote resources are contained in the hostdata$files
element
(which was previously empty):
cli$hostdata$files #> checksum filename filesize id #> 1 cc624d72ede85ef061afa494d9951f6f data.csv 625 56c44dd6-5f84-4212-9a65-d37f64ca886f #> 2 eaeb7c4f8a931c99e662172299a0b17f datapackage.json 812 32d556ef-5b65-4b9d-a8a8-2e7bed11da5d #> links.download #> 1 https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/data.csv #> 2 https://sandbox.zenodo.org/api/files/561f4971-9e86-4235-b574-f5662f6088e3/datapackage.json #> links.self #> 1 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/56c44dd6-5f84-4212-9a65-d37f64ca886f #> 2 https://sandbox.zenodo.org/api/deposit/depositions/1161632/files/32d556ef-5b65-4b9d-a8a8-2e7bed11da5d
The list of files includes a "datapackage.json" file generated by the
frictionless
package. This file is
not counted in "resources". As described in the main
README, and at length in the
separate "frictionless"
vignette, the
"datapackage.json" file contains both the metadata entered in to the deposits
client, as well as "frictionless metadata" describing the internal properties
of the dataset itself.
Files can be downloaded with the deposit_download_file
function.
To demonstrate how that works, the following code first removes the local
version, then downloads it from the remote service and confirms that a local
version has been successfully re-created.
file.remove (path) file <- cli$deposit_download_file (filename = "data.csv", path = data_dir) file #> [1] /tmp/RtmpcO59N8/data/data.csv
The workflow described in the preceding section results in a frictionless
metadata file being simultaneously generated, filled with deposits metadata,
and uploaded to the nominated service.
As described in detail in the "frictionless"
vignette. An
alternative workflow allows frictionless metadata files to be generated locally
prior to any uploading. This uses the deposits_add_resource()
method,
where a "resource" is a local data file or object.
After initiating a client with metadata, as demonstrated above:
cli <- depositsClient$new (service = "zenodo", sandbox = TRUE, metadata = metadata)
A frictionless metadata file which is only stored locally can then be generated by the following call, by specifying a path to that local file.
cli$deposit_add_resource (path = path)
The client will then list an additional local_path
, as demonstrated above,
and in this case will list resources: 1 local, 0 remote
, because the resource
has not yet been uploaded to the remote service. The local_path
directory
containing the specified file will also have an additional "datapackage.json"
file including the deposits metadata used in client construction. This file may
be edited as desired prior to uploading. To update a deposits client with
changes to external metadata files, simply pass the path to that file to the
deposits_fill_metadata()
method.
When ready, a single call to the deposit_upload_file()
function
will upload the file specified in that call, along with the frictionless
"datapackage.json" metadata file.
All deposits are initiated on the nominated services as "private" deposits, meaning:
A deposit can only be publicly viewed once it has been published, as described in the final section of this vignette. The process of using deposits to prepare one or more datasets for publication will generally involve multiple stages of editing and updating.
Once a deposits client has been filled with metadata and connected to a
local_path
, as demonstrated above, any of the local files may be edited,
including the frictionless "datapackage.json" file. The client and the deposit
held on the remote server may then be updated by calling the
deposit_update()
method.
Any changes to the "metadata" field of the "datapackage.json" file will be
reflected in the "metadata" field of the deposits client, as well as in the
metadata passed to the remote service. Any modified files, including
"datapackage.json", will also be uploaded to the remote service, over-writing
previous versions.
Note that local files must first be individually uploaded with the the
deposit_upload_file()
method
before the deposit_update()
method
can be used to update them. Moreover, calling
deposit_update()
before all files held in the local_path
directory have been uploaded will
generally produce an error noting that all files must first be uploaded prior
to calling
deposit_update()
.
An example of a full workflow for creating and editing a deposits client and associated metadata would look something like the following five main steps:
Initiate local deposits client with metadata:
```r metadata <- list ( title = "New Title", abstract = "This is the abstract", creator = list (list (name = "A. Person"), list (name = "B. Person")) ) cli <- depositsClient$new ( service = "zenodo", sandbox = TRUE, metadata = metadata ) cli$deposit_new ()
2. Upload local data, which the following code simulates by creating a "dummy"
dataset in the temporary directory of the current R session:
r
data_dir <- file.path (tempdir (), "data")
dir.create (data_dir)
path <- file.path (data_dir, "data.csv")
write.csv (datasets::Orange, path, row.names = FALSE)
The following call then uploads that dataset to the newly-created deposit:
r
cli$deposit_upload_file (path = path)
``
Calling
deposit_upload_file()the first time also creates local and
remote versions of a frictionless "datapackage.json" file, holding all
metadata, and the DOI of the new deposit. Uploading files also
automatically generates the
local_path` field in the deposits client,
enabling numbers of local and remote resources to be counted and shown when
printing the client.
3. Modify metadata. The following code provides a proof-of-principle
modification of metadata, by changing "New Title" to "Updated Title":
```r fr <- file.path (data_dir, "datapackage.json") dp <- frictionless::read_package (fr) dp$metadata$title
dp$metadata$title <- "Updated Title"
frictionless::write_package (dp, data_dir)
This is an indirect way of editing metadata, by using R code. The
recommended way to update deposits metadata is to directly edit and modify
the "datapackage.json" file.
4. Update both local client and remote deposit data, noting that the
`local_path` variable is held in the client itself, so does not need to be
passed to the update method.
r
cli$deposit_update ()
cli$metadata$title
cli$hostdata$title
``` Local modifications are reflected in both updated "metadata" with the deposits client, as well as in "hostdata" stored on the Zenodo service.
Once all metadata and data have been satisfactorily edited, updated, and uploaded, a deposit can be made publicly visible and permanently associated with a Digital Object Identifier (DOI) by publishing it. Prior to publishing, it is often desired to apply an "embargo" to the deposit, in the form of a date after which the deposit will become publicly visible. The two steps to publication are thus generally:
cli$deposit_embargo (embargo_date = "2030-03-30") cli$deposit_publish ()
Calling the deposit_publish()
method
is irreversible, and can never be undone. (Publication is permanent even in the
Zenodo sandbox environment.) The published deposit will be permanently
associated with the account of the user who published it, as identified by the
API token used to initiate the deposits
client.
Publication will also change many items of the client's "hostdata", notably
involving a change of status or visibility from "private" to "public". Once a
deposit has been published, the associated DOI, or equivalent the URL given in
the deposits client, may be shared as a permanent link to the deposit.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.