list_fetch_files: List and download files

list_files R Documentation

List and download files

Description

A dataset in openBIS represents a collection of files. The function list_files() lists files associated with one or more datasets by returning a set of FileInfoDssDTO objects. As this object type does not contain information on data set association, the data set code is saved as a data_set attribute with each FileInfoDssDTO object. Data set files can be fetched using fetch_files(), which can either retrieve all associated files or use file path information, for example from FileInfoDssDTO objects, to download only a subset of files.

Usage

list_files(token, x, ...)

## S3 method for class 'character'
list_files(token, x, path = "", recursive = TRUE,
  ...)

## S3 method for class 'DataSet'
list_files(token, x, path = "", recursive = TRUE,
  ...)

## S3 method for class 'DatasetIdentifier'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'DatasetReference'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'FeatureVectorDatasetReference'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'FeatureVectorDatasetWellReference'
list_files(token, x,
  path = "", recursive = TRUE, ...)

## S3 method for class 'ImageDatasetReference'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'MicroscopyImageReference'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'PlateImageReference'
list_files(token, x, path = "",
  recursive = TRUE, ...)

## S3 method for class 'DataSetFileDTO'
list_files(token, x, ...)

fetch_files(token, x, ...)

## S3 method for class 'character'
fetch_files(token, x, files = NULL, n_con = 5L,
  reader = identity, ...)

## S3 method for class 'NULL'
fetch_files(token, x, files, n_con = 5L,
  reader = identity, ...)

## S3 method for class 'DataSet'
fetch_files(token, x, ...)

## S3 method for class 'DatasetIdentifier'
fetch_files(token, x, ...)

## S3 method for class 'DatasetReference'
fetch_files(token, x, ...)

## S3 method for class 'FeatureVectorDatasetReference'
fetch_files(token, x, ...)

## S3 method for class 'FeatureVectorDatasetWellReference'
fetch_files(token, x, ...)

## S3 method for class 'ImageDatasetReference'
fetch_files(token, x, ...)

## S3 method for class 'MicroscopyImageReference'
fetch_files(token, x, ...)

## S3 method for class 'PlateImageReference'
fetch_files(token, x, ...)

## S3 method for class 'DataSetFileDTO'
fetch_files(token, x, ...)

## S3 method for class 'FileInfoDssDTO'
fetch_files(token, x, data_sets = NULL, ...)

read_mat_files(data)

Arguments

token

Login token as created by login_openbis().

x

Object to limit search for datasets/files with.

...

Generic compatibility. Extra arguments will be passed to make_requests() or do_requests_serial()/do_requests_parallel().

path

A (vector of) file path(s) to be searched within a dataset.

recursive

A (vector of) logicals, indicating whether to list files recursively.

files

Optional set of FileInfoDssDTO objects. If NULL, all files corresponding to the specified datasets are assumed. This file list can be filtered by passing a regular expression as the file_regex argument via ....

n_con

The number of simultaneous connections.

reader

A function to read the downloaded data. It is forwarded as the finally argument to do_requests_serial()/do_requests_parallel().

data_sets

Either a single dataset object (anything that has a dataset_code() method) or a set of objects of the same length as x. If NULL (default), each FileInfoDssDTO object passed as x is expected to contain a data_set attribute.

data

The data to be read.

Details

Data sets for list_files() can be specified as a character vector of dataset codes. Therefore, all objects for which the internal method dataset_code() exists can be used to select datasets. This includes data set and data set id objects as well as the various flavors of data set reference objects. In addition to these dataset-representing objects, dispatch on DataSetFileDTO objects is possible as well.

File listing can be limited to a certain path within the dataset and the search can be carried out recursively or non-recursively. In case a set of objects is passed, the search-tuning arguments path and recursive have to be either of length 1 or of the same length as x. If dispatch occurs on DataSetFileDTO objects, the path and recursive arguments are not needed, as this information is already encoded in the objects passed as x. A separate API call is necessary for each of the objects the dispatch occurs on.
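
The length rule for path and recursive can be illustrated with a minimal sketch. The helper recycle_arg() and the data set codes below are hypothetical and not part of infx; the sketch only shows the "length 1 or same length as x" constraint, not how list_files() implements it.

```r
# Hypothetical helper: recycle a search-tuning argument to the length of x,
# accepting only length 1 or length(x).
recycle_arg <- function(arg, x) {
  n <- length(x)
  if (length(arg) == 1L) {
    rep(arg, n)
  } else if (length(arg) == n) {
    arg
  } else {
    stop("argument must be of length 1 or of the same length as x")
  }
}

codes <- c("DS-1", "DS-2", "DS-3")  # made-up data set codes

recycle_arg("", codes)                    # one path reused for all data sets
recycle_arg(c(TRUE, FALSE, TRUE), codes)  # one recursive flag per data set
```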

The function fetch_files() downloads files associated with a dataset. In order to identify a file, both a data set code and a file path, relative to the data set root, are required. fetch_files() can be called in a variety of ways and internally uses a double dispatch mechanism, first resolving the data set codes and then calling the non-exported function fetch_ds_files() which dispatches on file path objects.

Data set code information can either be communicated using any of the objects understood by dataset_code() (including data set, data set id and data set reference objects) or directly as a character vector, passed as x argument. In case data set code information is omitted (passed to x as NULL), the objects encoding file paths have to specify the corresponding data sets. Furthermore, DataSetFileDTO objects may be passed as x argument to fetch_files(), which will internally call fetch_files() again, setting the argument x to NULL and passing the DataSetFileDTO objects as files argument. Finally, if FileInfoDssDTO objects are passed to fetch_files() as x argument, an optional argument data_sets may be specified (it defaults to NULL) and, as above, fetch_files() is called again with these two arguments rearranged.
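
The argument rearrangement can be mimicked with a toy S3 generic. Everything in this sketch (fetch_toy(), the toy_file class, the data set codes) is made up for illustration; it only shows the re-dispatch pattern and performs no download.

```r
# Toy generic, loosely mirroring the re-dispatch pattern of fetch_files().
fetch_toy <- function(x, ...) UseMethod("fetch_toy")

# data set codes given directly: pair them with the file paths
fetch_toy.character <- function(x, files = NULL, ...) {
  paste(x, files, sep = ":")
}

# no data set codes given: the file objects must carry them as attribute
fetch_toy.NULL <- function(x, files, ...) {
  paste(attr(files, "data_set"), files, sep = ":")
}

# file objects passed as x: call the generic again with arguments swapped
fetch_toy.toy_file <- function(x, data_sets = NULL, ...) {
  fetch_toy(data_sets, files = x, ...)
}

f <- structure("original/a.mat", data_set = "DS-1", class = "toy_file")

fetch_toy(f)                      # falls back to the data_set attribute
fetch_toy(f, data_sets = "DS-2")  # uses the explicitly passed code
```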

The internal generic function fetch_ds_files() can be dispatched on several objects again. When no files are specified (NULL is passed as files argument to fetch_files()), all available files for the given data sets are queried. This list can be filtered using the file_regex argument, which can be a single regular expression and is applied to file paths. File paths can be specified as character vector, FileInfoDssDTO or DataSetFileDTO objects. If dispatch occurs on FileInfoDssDTO and no data set code information is available (NULL passed as x or data_sets argument to fetch_files()), each FileInfoDssDTO must contain a data_set attribute. Additionally, downloaded files are checked for completeness, as these objects contain file sizes. If dispatch occurs on DataSetFileDTO objects or a character vector, this sanity check is not possible.
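
As a rough sketch of the two ideas in this paragraph, the following base R snippet filters a made-up file listing with a single regular expression and then compares download lengths against the listed sizes. The data frame stands in for FileInfoDssDTO objects and the raw vectors for downloaded data; none of this is the infx implementation.

```r
# Made-up file listing standing in for FileInfoDssDTO objects
files <- data.frame(
  path = c("original/Image.Count_Cells.mat",
           "original/Image.Count_Nuclei.mat",
           "original/Cells.AreaShape_Area.mat"),
  size = c(4L, 6L, 8L),
  stringsAsFactors = FALSE
)

# keep only the paths matching a single regular expression
keep <- files[grepl("Image\\.Count_", files$path), ]

# pretend downloads: one raw vector per selected file, the second truncated
downloads <- list(as.raw(1:4), as.raw(1:5))

# a download counts as complete if its length equals the listed file size
complete <- lengths(downloads) == keep$size
complete
```

Without size information, as for plain character paths, there is nothing to compare the download length against, which is why the check is skipped in that case.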

Files can only be retrieved after a corresponding download URL has been created using list_download_urls(), as file URLs in openBIS have a limited lifetime and therefore must be used shortly after being created. A list of call objects (see base::call()) is created and passed to either do_requests_serial() or do_requests_parallel(). Whether file fetching is carried out in serial or parallel is controlled by the n_con argument. In case a download fails, it is retried up to the number of times specified as n_try. Finally, a function with a single argument can be passed as the argument done, which takes the downloaded data as input and does some processing.
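
The retry-then-process behavior can be sketched in a few lines of base R. The wrapper with_retry() and the flaky() stand-in are hypothetical; in particular, treating n_try as the total number of attempts is an assumption, and the actual request handling in do_requests_serial()/do_requests_parallel() may differ.

```r
# Hypothetical retry wrapper, not the infx implementation.
# Assumption: n_try is the total number of attempts.
with_retry <- function(fun, n_try = 2L, done = identity) {
  for (i in seq_len(n_try)) {
    res <- tryCatch(fun(), error = function(e) e)
    # on success, hand the data to the processing function
    if (!inherits(res, "error")) return(done(res))
  }
  stop("download failed after ", n_try, " attempts")
}

# stand-in for a download that fails on its first attempt
attempts <- 0L
flaky <- function() {
  attempts <<- attempts + 1L
  if (attempts < 2L) stop("connection reset")
  as.raw(c(1, 2, 3))
}

n_bytes <- with_retry(flaky, n_try = 3L, done = length)
n_bytes
```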

A function for reading the binary data retrieved from openBIS can be supplied to fetch_files() as reader argument. Single cell feature files, as produced by CellProfiler, are stored as Matlab v5.0 .mat files. The function read_mat_files() reads such files using R.matlab::readMat(), checks for certain expected attributes and simplifies the read structure.

The list returned by read_mat_files() is arranged such that each node corresponds to a single image and contains a list which is either holding a single value or a vector of values. For a plate with 16 rows, 24 columns and 3 x 3 imaging sites this will yield a list of length 3456. Index linearization is in row-major fashion for both wells and sites. Furthermore, imaging sites come first such that in this example, the first three list entries correspond to image row 1 (left to right) of well A1, the next three entries correspond to row 2 of well A1, images 10 through 12 correspond to row 1 of well A2, etc. Well A2 is located in row 1, column 2 of a plate.
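
The linearization described above can be written down as a small index calculation. The helper image_index() is hypothetical and not part of infx; it assumes the 24-column plate with 3 x 3 imaging sites from the example and row-major ordering for both wells and sites.

```r
# Hypothetical helper: map well and site coordinates (all 1-based) to a
# position in the list returned by read_mat_files().
image_index <- function(well_row, well_col, site_row, site_col,
                        n_cols = 24L, sites_x = 3L, sites_y = 3L) {
  sites_per_well <- sites_x * sites_y
  # row-major well index, 0-based
  well <- (well_row - 1L) * n_cols + (well_col - 1L)
  # row-major site index within the well, 0-based
  site <- (site_row - 1L) * sites_x + (site_col - 1L)
  well * sites_per_well + site + 1L
}

image_index(1L, 1L, 1L, 1L)    # well A1, image row 1, first image: 1
image_index(1L, 2L, 1L, 1L)    # well A2, image row 1, first image: 10
image_index(16L, 24L, 3L, 3L)  # well P24, last image: 3456
```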

Value

list_files() returns either a json_class or a json_vec object of subtype FileInfoDssDTO, depending on whether a single object or a set of objects is retrieved. For fetch_files(), the return type depends on the callback function passed as reader argument. By default, a list is returned with an entry per file, holding a raw vector of the file data.

openBIS

  • listFilesForDataSet

See Also

Other resource listing/downloading functions: fetch_images, list_download_urls, list_features

Examples


  tok <- login_openbis()

  # search for a cell profiler feature data set from plate KB2-03-1I
  search <- search_criteria(
    attribute_clause("type", "HCS_ANALYSIS_CELL_FEATURES_CC_MAT"),
    sub_criteria = search_sub_criteria(
      search_criteria(attribute_clause("code",
                                       "/INFECTX_PUBLISHED/KB2-03-1I")),
      type = "sample"
    )
  )
  ds <- search_openbis(tok, search)

  # list all files of this data set
  all_files <- list_files(tok, ds)
  length(all_files)

  # select some of the files, e.g. all count features per image
  some_files <- all_files[grepl("Image\\.Count_",
                                get_field(all_files, "pathInDataSet"))]
  length(some_files)

  # download the selected files
  data <- fetch_files(tok, some_files)

  # the same can be achieved by passing a file_regex argument to
  # fetch_files(), which internally calls list_files() and filters files
  identical(data, fetch_files(tok, ds, file_regex = "Image\\.Count_"))

  # all returned data is raw, the reader argument can be used to supply
  # a function that processes the downloaded data
  sapply(data, class)
  data <- fetch_files(tok, some_files, reader = read_mat_files)
  sapply(data, class)

  logout_openbis(tok)



ropensci/infx documentation built on May 14, 2022, 5:51 p.m.