Accessing Data
In databraryr: Interact with the 'Databrary.org' API

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Databrary is a data library, one specialized for storing and sharing video with a restricted audience of institutionally approved researchers. In this vignette, we'll see how to use databraryr to access data.

We'll start simply. Let's download a test video from volume 1 on Databrary.

The download_video() function handles this for us. Running it with the default parameters downloads a simple test video with numbers that increment. The file is stored in the R session's temporary directory, whose path is returned by tempdir(). The download_video() function returns a character string with the full file name.

databraryr::download_video()

The function returns a path to the video on your file system. The file name chosen, since we didn't specify one, is the session number 9807, an underscore, the file/asset number 1, another underscore, and a time stamp.
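The naming pattern just described can be sketched in base R. This is only an illustration of the pattern, not the package's own code: the helper name make_asset_file_name(), the time stamp format, and the .mp4 extension are all assumptions.

```r
# Hypothetical helper illustrating the naming pattern described above:
# session number, underscore, asset number, underscore, time stamp.
# The time stamp format and the file extension are assumptions.
make_asset_file_name <- function(session_id = 9807, asset_id = 1,
                                 when = Sys.time(), ext = "mp4") {
  stamp <- format(when, "%Y%m%d-%H%M%S")
  paste0(session_id, "_", asset_id, "_", stamp, ".", ext)
}

example_name <- make_asset_file_name()

# Place the file in the session's temporary directory.
example_path <- file.path(tempdir(), example_name)
example_path
```

The exact name that download_video() produces may differ; check the returned string rather than relying on this sketch.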

Note: You can navigate to the location where we're downloading files by opening the following URL in your browser: https://databrary.org/volume/1/slot/9807

Depending on your operating system, the following commands may open the file so that you can play it with your default video player.

nums_vid <- download_video()
system(paste0("open ", nums_vid))

Or, you can navigate to the temporary directory to open and play the video manually. Use tempdir() to find the directory where the downloaded video is stored.
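The open command used above is specific to macOS. A minimal sketch of a more portable approach follows, assuming xdg-open is available on Linux desktops and using base R's shell.exec() on Windows. The helper names open_command() and open_file() are hypothetical and are not part of databraryr.

```r
# Sketch: pick an "open with the default application" command for the
# current operating system. `open` is the macOS command, `xdg-open` is
# common on Linux desktops, and shell.exec() is a base R function that
# exists only on Windows.
open_command <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Darwin = "open",
         Linux = "xdg-open",
         Windows = "shell.exec",
         NA_character_)
}

open_file <- function(path) {
  cmd <- open_command()
  if (is.na(cmd)) stop("Don't know how to open files on this system.")
  if (cmd == "shell.exec") {
    shell.exec(path)               # Windows only
  } else {
    system2(cmd, shQuote(path))    # quote the path in case it has spaces
  }
  invisible(path)
}

# Usage (not run here): open_file(nums_vid)
```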

Now, let's see what other files are shared in volume 1, not just those in session (slot) 9807. This takes a moment to run because there are many files in this volume.

databraryr::list_volume_assets()

Note: These commands return public data, that is, data we do not need an account or login to see. We have not provided an httr2 request parameter (rq), so the function generates a default one. We can see this happening if we set vb = TRUE.

databraryr::list_volume_assets(vb = TRUE)

If we log in using the commands described in authorized users and provide the function with a valid (non-NULL) httr2 request parameter, the following function call would show files and data that are restricted to authorized users:

databraryr::list_volume_assets(vol_id = `<SOME_OTHER_VOLUME_ID>`, rq = rq)

Obviously, you would need to supply a vol_id for some other non-public dataset for this to return useful information.

The list_volume_assets() command returns a data frame we can manipulate using standard R commands. Here are the variables in the data frame.

vol1_assets <- databraryr::list_volume_assets()
names(vol1_assets)

Or, if you use the native R 'pipe' syntax (available since R version 4.1):

databraryr::list_volume_assets() |>
  names()

The magrittr package pipe ('%>%') also works (as of databraryr v0.6.2).

The asset_format_id variable tells us information about the type of the data file.

unique(vol1_assets$asset_format_id)

But this isn't especially informative since asset_format_id is a code, and we don't really know what -800 or 6 or the other numbers refer to. To decode it, we create a data frame of all the file formats Databrary currently recognizes.

db_constants <- databraryr::assign_constants()

formats_df <- purrr::map(db_constants$format, as.data.frame) |>
  purrr::list_rbind()

formats_df

From this, we see that format -800 is an MP4-formatted video. There are lots of those in volume 1. We also see that format 6 is a PDF document.

As of 0.6.0, list_volume_assets() adds this information to the data frame.
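To make the decoding step concrete, here is a minimal base R sketch of the lookup on toy data. The codes -800 and 6 come from the text above, but the data frames, the column names format_id and asset_id, and the label "PDF document" are illustrative assumptions, not the package's actual output.

```r
# Toy stand-ins for the real data frames; column names are assumptions.
toy_assets <- data.frame(
  asset_id = c(1, 2, 3),
  asset_format_id = c(-800, 6, -800),
  stringsAsFactors = FALSE
)
toy_formats <- data.frame(
  format_id = c(-800, 6),
  format_name = c("MPEG-4 video", "PDF document"),
  stringsAsFactors = FALSE
)

# match() finds, for each asset, the row of the lookup table that has
# the same format code.
idx <- match(toy_assets$asset_format_id, toy_formats$format_id)
toy_assets$format_name <- toy_formats$format_name[idx]
toy_assets
```

A code with no entry in the lookup table would yield NA for format_name, which makes unknown formats easy to spot.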

We can summarize the number of files using the stats::xtabs() function:

stats::xtabs(~ format_name, data = vol1_assets)

So, there are lots of videos and PDFs to examine in volume 1. Here is a table of the ten longest videos.

vol1_assets |>
  dplyr::filter(format_name == "MPEG-4 video") |>
  dplyr::select(asset_name, asset_duration) |>
  dplyr::mutate(hrs = asset_duration/(60*60*1000)) |>
  dplyr::select(asset_name, hrs) |>
  dplyr::arrange(desc(hrs)) |>
  head(n = 10) |>
  knitr::kable(format = 'html')
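The division by 60*60*1000 in the pipeline above implies that asset_duration is recorded in milliseconds. The conversion and sort can be sketched in base R on made-up values; the helper ms_to_hrs() and the toy data frame are hypothetical.

```r
# Convert a duration in milliseconds to hours:
# 1000 ms per second, 60 seconds per minute, 60 minutes per hour.
ms_to_hrs <- function(ms) ms / (60 * 60 * 1000)

# Illustrative values only, not real Databrary durations.
toy_videos <- data.frame(
  asset_name = c("a", "b", "c"),
  asset_duration = c(3600000, 900000, 5400000),
  stringsAsFactors = FALSE
)
toy_videos$hrs <- ms_to_hrs(toy_videos$asset_duration)

# Sort longest first, the base R counterpart of arranging by desc(hrs).
sorted_videos <- toy_videos[order(toy_videos$hrs, decreasing = TRUE),
                            c("asset_name", "hrs")]
sorted_videos
```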

Accessing metadata

Imagine you are interested in knowing more about this volume, the people who created it, or the agencies that funded it.

The list_volume_owners() function returns a data frame with information about the people who created and "own" this particular dataset. The function has a parameter this_vol_id which is an integer, unique across Databrary, that refers to the specific dataset. The list_volume_owners() function uses volume 1 as the default.

databraryr::list_volume_owners()

The command (and many like it) can be "vectorized" using the purrr package. This lets us generate a tibble with the owners of the first fifteen volumes.

purrr::map(1:15, databraryr::list_volume_owners) |> 
  purrr::list_rbind()
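When mapping over many volumes, one failing call can stop the whole run. The following base R sketch shows one way to skip failures and combine the rest. It uses a hypothetical stand-in function, fake_list_owners(), so that it runs without a network connection; it makes no claim about how databraryr functions behave for volumes you cannot read.

```r
# Hypothetical stand-in for a function that queries one volume.
# It fails for volume 3 to mimic a volume we cannot read.
fake_list_owners <- function(vol_id) {
  if (vol_id == 3) stop("volume not accessible")
  data.frame(vol_id = vol_id,
             owner = paste("owner of volume", vol_id),
             stringsAsFactors = FALSE)
}

# Wrap each call so that an error yields NULL instead of stopping the loop.
safe_owners <- function(vol_id) {
  tryCatch(fake_list_owners(vol_id), error = function(e) NULL)
}

results <- lapply(1:5, safe_owners)
results <- Filter(Negate(is.null), results)  # drop the failed volumes
owners_df <- do.call(rbind, results)         # stack the rest row-wise
owners_df
```

With purrr, the same idea could be expressed by wrapping the function in purrr::possibly() before mapping.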

As of 0.6.0, get_volume_by_id() returns a list of all data about a volume that is accessible to a particular user. The default is volume 1.

vol1_list <- databraryr::get_volume_by_id()
names(vol1_list)

Let's create our own tibble/data frame with a subset of these variables.

vol1_df <- tibble::tibble(id = vol1_list$id,
                          name = vol1_list$name,
                          creation = vol1_list$creation,
                          permission = vol1_list$permission)
vol1_df

The permission variable indicates whether a volume is visible to you, and if so, with what privileges.

So, if you are not logged in to Databrary, only data that are visible to the public will be returned. Assuming you are not logged in, the above commands will show volumes with permission equal to 1. The permission field derives from a set of constants the system uses.

db_constants <- databraryr::assign_constants()
db_constants$permission

The permission codes begin at 0, whereas R indexes vectors and lists beginning at 1. So a permission code of 1 corresponds to the second element, db_constants$permission[2]. A permission of 1 therefore means that the volumes shown above are all visible to the public, and to you.

Volumes that you have not shared and that are not visible to the public will have permission equal to 5, which corresponds to db_constants$permission[6]. We can't demonstrate this here because we don't have privileges on the same unshared volume that you do, but you can try it on a volume you've created but not yet shared.
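The off-by-one between Databrary's zero-based permission codes and R's one-based indexing can be captured in a small helper. This is a sketch with placeholder labels; the function permission_label() is hypothetical, and the real labels come from db_constants$permission.

```r
# Placeholder labels; the real values come from db_constants$permission.
labels <- paste0("level_", 0:5)

# Databrary codes start at 0 and R indexing starts at 1, so shift by one.
permission_label <- function(code, labels) labels[code + 1]

permission_label(1, labels)  # second element of labels
permission_label(5, labels)  # sixth element of labels
```

With the real constants, the call would look like permission_label(vol1_list$permission, db_constants$permission).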

Other functions with the form list_volume_*() provide information about Databrary volumes. For example, the list_volume_funding() command returns information about any funders listed for the project. Again, the default volume is 1.

databraryr::list_volume_funding()

The list_volume_links() command returns information about any external (web) links that have been added to a volume, such as to related publications or a GitHub repo.

databraryr::list_volume_links()

There's much more to learn about accessing Databrary information using databraryr, but this should get you started.

Downloading multiple files

As of 0.6.3, it's possible to download multiple files at once. The download_session_assets_fr_df() function takes a data frame of assets, such as the output from list_volume_session_assets() (when you have a volume ID and a session ID) or list_session_assets() (when you have only the session ID). The following commands download all of the 'csv' files in volume 1 by filtering the vol1_assets data frame created earlier. The code creates a new directory based on the session/slot ID (9807) and returns the file paths of the downloaded files.

vol1_assets |> 
  dplyr::filter(format_extension == "csv") |>
  databraryr::download_session_assets_fr_df()
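The filter-then-save idea can be sketched offline with base R. The toy data frame, its column names, and the choice to create the session directory under tempdir() are assumptions for illustration; this is not a description of where download_session_assets_fr_df() actually writes files.

```r
# Toy asset listing; column names and values are illustrative assumptions.
toy_assets <- data.frame(
  session_id = c(9807, 9807, 9807),
  asset_id = c(1, 2, 3),
  format_extension = c("mp4", "csv", "csv"),
  stringsAsFactors = FALSE
)

# Keep only the csv files, as the dplyr::filter() call above does.
csv_assets <- toy_assets[toy_assets$format_extension == "csv", ]

# One target directory per session, created under the temporary directory.
session_ids <- as.character(unique(csv_assets$session_id))
target_dirs <- file.path(tempdir(), session_ids)
for (d in target_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
target_dirs
```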

databraryr documentation built on Sept. 11, 2024, 6:48 p.m.