knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Databrary is a data library, one specialized for storing and sharing video with a restricted audience of institutionally approved researchers.
In this vignette, we'll see how to use databraryr
to access data.
We'll start simply. Let's download a test video from volume 1 on Databrary.
The download_video()
function handles this for us.
Running it with the default parameters downloads a simple test video with numbers
than increment.
The file is stored in a temporary directory created by the file system using the function tempdir()
.
The download_video()
function returns a character string with the full file name.
databraryr::download_video()
The function returns a path to the video on your file system.
The file name chosen, since we didn't specify one, is the session number 9807
, an underscore, the file/asset number 1
, another underscore, and a time stamp.
Note: You can navigate to the location where we're downloading files by opening the following URL in your browser: https://databrary.org/volume/1/slot/9807
Depending on your operating system, the following commands may open the file so that you can play it with your default video player.
nums_vid <- download_video() system(paste0("open ", nums_vid))
Or, you can navigate to the temporary directory to open and play the video manually.
Use tempdir()
to find the directory where test.mp4
is stored.
Now, let's see what other files are shared in volume 1, not just those in session (slot) 9807. This takes a moment to run because there are many files in this volume.
databraryr::list_volume_assets()
NOTE: These commands return public data, that is, data we do not need an account or log-in to see.
We have not provided an httr2
request parameter, so the function generates a default one.
We can see this happening if we set vb = TRUE
.
databraryr::list_volume_assets(vb = TRUE)
If we log-in using the commands described in authorized users, and provide the function with a valid (non-NULL) httr2
request parameter, the following function call would show files and data that are restricted to authorized users:
databraryr::list_volume_assets(vol_id = `<SOME_OTHER_VOLUME_ID>`, rq = rq)
Obviously, you would need to supply a vol_id
for some other non-public dataset for this to return useful information.
The list_volume_assets()
command returns a data frame we can manipulate using standard R commands.
Here are the variables in the data frame.
vol1_assets <- databraryr::list_volume_assets() names(vol1_assets)
Or, if you use the R (> version 4.3) 'pipe' syntax:
databraryr::list_volume_assets() |> names()
The magrittr
package pipe ('%>%') also works (as of databraryr v0.6.2).
The asset_format_id
variable tells us information about the type of the data file.
unique(vol1_assets$asset_format_id)
But this isn't especially informative since the asset_format
is a code, and we don't really know what -800
or 6
or the other numbers refer to.
To decode it, we create a data frame of all the file formats Databrary currently recognizes.
db_constants <- databraryr::assign_constants() formats_df <- purrr::map(db_constants$format, as.data.frame) |> purrr::list_rbind() formats_df
From this, we see that format -800
is an MP4-formatted video.
There are lots of those in Volume 1.
We see that6
is a PDF document.
As of 0.6.0, list_volume_assets()
adds this information to the data frame.
We can summarize the number of files using the stats::xtabs()
function:
stats::xtabs(~ format_name, data = vol1_assets)
So, there are lots of videos and PDFs to examine in volume 1. Here is a table of the ten longest videos.
vol1_assets |> dplyr::filter(format_name == "MPEG-4 video") |> dplyr::select(asset_name, asset_duration) |> dplyr::mutate(hrs = asset_duration/(60*60*1000)) |> dplyr::select(asset_name, hrs) |> dplyr::arrange(desc(hrs)) |> head(n = 10) |> knitr::kable(format = 'html')
Imagine you are interested in knowing more about this volume, the people who created it, or the agencies that funded it.
The list_volume_owners()
function returns a data frame with information about the people who created and "own" this particular dataset.
The function has a parameter this_vol_id
which is an integer, unique across Databrary, that refers to the specific dataset.
The list_volume_owners()
function uses volume 1 as the default.
databraryr::list_volume_owners()
The command (and many like it) can be "vectorized" using the purrr
package.
This let's us generate a tibble with the owners of the first fifteen volumes.
purrr::map(1:15, databraryr::list_volume_owners) |> purrr::list_rbind()
As of 0.6.0, the get_volume_by_id()
returns a list of all data about a volume that is accessible to a particular user.
The default is volume 1.
vol1_list <- databraryr::get_volume_by_id() names(vol1_list)
Let's create our own tibble/data frame with a subset of these variables.
vol1_df <- tibble::tibble(id = vol1_list$id, name = vol1_list$name, doi = vol1_list$creation, permission = vol1_list$permission) vol1_df
The permission
variable indicates whether a volume is visible by you, and if so with what privileges.
So, if you are not logged-in to Databrary, only data that are visible to the public will be returned.
Assuming you are not logged-in, the above commands will show volumes with permission
equal to 1.
The permission
field derives from a set of constants the system uses.
db_constants <- databraryr::assign_constants() db_constants$permission
The permission
array is indexed beginning with 0.
So the 1th (1st) value is "r db_constants$permission[2]
".
So, the 1
means that the volumes shown above are all visible to the public, and to you.
Volumes that you have not shared and are not visible to the public, will have permission
equal to 5, or "r db_constants$permission[6]
".
We can't demonstrate this to you because we don't have privileges on the same unshared volume, but you can try it on a volume you've created but not yet shared.
Other functions with the form list_volume_*()
provide information about Databrary volumes.
For example, the list_volume_funding()
command returns information about any funders listed for the project.
Again, the default volume is 1.
databraryr::list_volume_funding()
The list_volume_links()
command returns information about any external (web) links that have been added to a volume, such as to related publications or a GitHub repo.
databraryr::list_volume_links()
There's much more to learn about accessing Databrary information using databraryr
, but this should get you started.
As of 0.6.3, it's possible to download multiple files.
The following set of commands downloads all of the 'csv' files in volume 1 using the output from list_volume_session_assets()
(when you have a volume ID and a session ID) or list_session_assets()
when you have only the session ID.
The code below creates a new directory based on the session/slot ID (9807).
The function returns the file path to the downloaded files.
vol1_assets |> dplyr::filter(format_extension == "csv") |> databraryr::download_session_assets_fr_df()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.