file_utils: File system utilities

data_dir R Documentation

File system utilities

Description

Determine the location where to place data meant to persist between individual sessions.

Usage

data_dir(subdir = NULL, create = TRUE)

src_data_dir(srcs)

auto_attach_srcs()

config_paths()

get_config(name, cfg_dirs = config_paths(), combine_fun = c, ...)

set_config(x, name, dir = file.path("inst", "extdata", "config"), ...)

Arguments

subdir

A string specifying a directory below the data directory whose existence is ensured.

create

Logical flag indicating whether to create the specified directory.

srcs

Character vector of data source names, an object for which an src_name() method is defined or an arbitrary-length list thereof.

name

File name of the configuration file (.json will be appended)

cfg_dirs

Character vector of directories searched for config files

combine_fun

If multiple files are found, a function for combining returned lists

...

Passed to jsonlite::read_json() or jsonlite::write_json()

x

Object to be written

dir

Directory to write the file to (created if non-existent)

Details

For data, the default location depends on the operating system as follows:

Platform   Location
Linux      ~/.local/share/ricu
macOS      ~/Library/Application Support/ricu
Windows    %LOCALAPPDATA%/ricu

If the default storage directory does not exist, it will only be created upon user consent (requiring an interactive session).

The environment variable RICU_DATA_PATH can be used to override the default location. If desired, this variable can be set in an R startup file to make it apply to all R sessions. For example, it could be set within:

  • A project-local .Renviron;

  • The user-level .Renviron;

  • A file at $(R RHOME)/etc/Renviron.site.

Any directory specified via the environment variable will be created recursively.
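The lookup order described above can be illustrated with a minimal base R sketch. The helper sketch_data_dir() is hypothetical and not part of ricu; it only mimics the documented behaviour (environment variable first, platform default otherwise) and, as an assumption, never creates the default directory, since that step requires user consent.

```r
# Hypothetical helper (not ricu's implementation): resolve the data
# directory following the documented order of precedence.
sketch_data_dir <- function(create = FALSE) {
  path <- Sys.getenv("RICU_DATA_PATH", unset = NA_character_)
  from_env <- !is.na(path) && nzchar(path)

  if (!from_env) {
    # Fall back to the platform-specific default location
    path <- switch(
      Sys.info()[["sysname"]],
      Darwin  = file.path("~", "Library", "Application Support", "ricu"),
      Windows = file.path(Sys.getenv("LOCALAPPDATA"), "ricu"),
      file.path("~", ".local", "share", "ricu")
    )
  } else if (create && !dir.exists(path)) {
    # A directory given via the environment variable is created recursively
    dir.create(path, recursive = TRUE)
  }

  path
}

# Point the variable at a nested directory that does not exist yet
target <- file.path(tempdir(), "ricu_sketch", "nested")
Sys.setenv(RICU_DATA_PATH = target)
resolved <- sketch_data_dir(create = TRUE)
dir.exists(resolved)
```

To make such a setting persistent, a line of the form RICU_DATA_PATH=/path/to/data can be placed in one of the startup files listed above.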

Data source directories typically are sub-directories of data_dir(), named the same as the respective dataset. For demo datasets corresponding to mimic and eicu, however, the file location deviates from this scheme. The function src_data_dir() is used to determine the expected data location of a given dataset.

Configuration files, used both for data source configuration and for dictionary definitions, potentially involve multiple files that are read and merged. For that reason, get_config() iterates over the directories passed as cfg_dirs and looks for the specified file in each (the suffix .json is appended to name, and the file may be missing in some of the queried directories). All files found are read by jsonlite::read_json() and the resulting lists are combined by reduction with the binary function passed as combine_fun.

With default arguments, get_config() will simply concatenate lists corresponding to files found in the default config locations as returned by config_paths(): first the directory specified by the environment variable RICU_CONFIG_PATH (if set), followed by the directory at

system.file("extdata", "config", package = "ricu")

Further arguments are passed to jsonlite::read_json(), which is called with slightly modified defaults: simplifyVector = TRUE, simplifyDataFrame = FALSE and simplifyMatrix = FALSE.
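As a rough illustration of the read-and-reduce behaviour described above, the following sketch re-creates it with jsonlite and base R. The function sketch_get_config(), the directory names and the file demo.json are hypothetical and exist only for this example; the actual implementation in ricu may differ.

```r
library(jsonlite)

# Hypothetical stand-in for get_config(): look for <name>.json in each
# directory, read the files that exist and reduce the resulting lists.
sketch_get_config <- function(name, cfg_dirs, combine_fun = c, ...) {
  files <- file.path(cfg_dirs, paste0(name, ".json"))
  files <- files[file.exists(files)]  # file may be absent in some dirs
  lists <- lapply(files, read_json, simplifyVector = TRUE,
                  simplifyDataFrame = FALSE, simplifyMatrix = FALSE, ...)
  Reduce(combine_fun, lists)
}

# Two throw-away config directories, each holding a "demo.json"
dir_a <- file.path(tempdir(), "cfg_a")
dir_b <- file.path(tempdir(), "cfg_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)

write_json(list(hr = list(unit = "bpm")), file.path(dir_a, "demo.json"),
           auto_unbox = TRUE)
write_json(list(temp = list(unit = "C")), file.path(dir_b, "demo.json"),
           auto_unbox = TRUE)

# With combine_fun = c, entries from both files end up in a single list,
# ordered as the directories were passed
cfg <- sketch_get_config("demo", c(dir_a, dir_b))
names(cfg)
```

Passing a different binary function as combine_fun (for example one that gives precedence to entries from the first directory) changes how overlapping entries are resolved.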

The utility function set_config() writes the list passed as x to file dir/name.json, using jsonlite::write_json() also with slightly modified defaults (which can be overridden by passing arguments as ...): null = "null", auto_unbox = TRUE and pretty = TRUE.
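A minimal sketch of the writing side, under the assumption that the defaults listed above are simply merged with any user-supplied arguments before calling jsonlite::write_json(). The helper sketch_set_config() and the example object are hypothetical and not taken from ricu.

```r
library(jsonlite)

# Hypothetical stand-in for set_config(): write x to <dir>/<name>.json
# using the documented defaults, which arguments in ... may override.
sketch_set_config <- function(x, name, dir, ...) {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".json"))

  defaults <- list(null = "null", auto_unbox = TRUE, pretty = TRUE)
  args <- utils::modifyList(defaults, list(...))

  do.call(write_json, c(list(x, path), args))
  invisible(path)
}

# Write a small list to a directory that does not exist yet
out_dir <- file.path(tempdir(), "cfg_out", "config")
path <- sketch_set_config(list(name = "demo", ids = c(1, 2)), "demo", out_dir)

# Read the file back to check the round trip
written <- read_json(path)
written$name
```

Because auto_unbox = TRUE, length-one vectors such as name are written as JSON scalars rather than single-element arrays.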

Whenever the package namespace is attached, a summary of dataset availability is printed using the utility functions auto_attach_srcs() and src_data_avail(). While the former simply returns a character vector of data sources that are configured to be set up automatically on package loading, the latter returns a summary of the number of available tables per dataset. Finally, is_data_avail() returns a named logical vector indicating which data sources have all required data available.

Value

Functions data_dir(), src_data_dir() and config_paths() return file paths as character vectors, auto_attach_srcs() returns a character vector of data source names, src_data_avail() returns a data.frame describing availability of data sources and is_data_avail() returns a named logical vector. Configuration utilities get_config() and set_config() read list objects from and write list objects to JSON files, respectively.

Examples

Sys.setenv(RICU_DATA_PATH = tempdir())
identical(data_dir(), tempdir())

dir.exists(file.path(tempdir(), "some_subdir"))
some_subdir <- data_dir("some_subdir")
dir.exists(some_subdir)

cfg <- get_config("concept-dict")

identical(
  cfg,
  get_config("concept-dict",
             system.file("extdata", "config", package = "ricu"))
)


ricu documentation built on Sept. 8, 2023, 5:45 p.m.