attach_src: Data attach utilities

View source: R/setup-attach.R

attach_src R Documentation

Data attach utilities

Description

Making a dataset available to ricu consists of three steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching is repeated every time the package is loaded. Briefly, downloading fetches the raw dataset from the internet (most likely in .csv format), importing preprocesses the data so that it can be accessed more efficiently, and attaching sets up the data for use by the package.

Usage

attach_src(x, ...)

## S3 method for class 'src_cfg'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

## S3 method for class 'character'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

detach_src(x)

setup_src_env(x, ...)

## S3 method for class 'src_cfg'
setup_src_env(x, data_dir = src_data_dir(x), link_env = NULL, ...)

Arguments

x

Data source to attach

...

Forwarded to further calls to attach_src()

assign_env, link_env

Environment in which the data source will become available

data_dir

Directory used to look for fst::fst() files; NULL calls data_dir() using the source name as subdir argument

Details

Attaching a dataset sets up objects of two S3 classes: a single src_env object, which contains one src_tbl object per table associated with the dataset. A src_env is an environment with an id_cfg attribute, as well as sub-classes as specified by the data source class_prefix configuration setting (see load_src_cfg()). All src_env objects created by calling attach_src() represent environments that are direct descendants of the data environment and are bound to the respective dataset name within that environment. For more information on src_env and src_tbl objects, refer to new_src_tbl().

If set up correctly, the user does not need to call attach_src() directly. When the package is loaded, the default data sources (see auto_attach_srcs()) are attached automatically. This default can be controlled by setting the environment variable RICU_SRC_LOAD to a comma-separated list of data source names before loading the package. Setting this environment variable as

Sys.setenv(RICU_SRC_LOAD = "mimic_demo,eicu_demo")

will change the default from loading both MIMIC-III and eICU (alongside their respective demo datasets) as well as HiRID and AUMC to loading just the two demo datasets. For setting an environment variable upon startup of the R session, refer to base::.First.sys().
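To illustrate how a comma-separated variable such as RICU_SRC_LOAD can be turned into a vector of source names, here is a minimal base R sketch. The helper resolve_srcs() and its default vector are hypothetical and only serve the illustration; they are not part of ricu and may differ from what auto_attach_srcs() actually does.

```r
# Hypothetical helper (not part of ricu): decide which sources to attach.
# The default vector is an illustrative assumption.
resolve_srcs <- function(default = c("mimic_demo", "eicu_demo")) {
  raw <- Sys.getenv("RICU_SRC_LOAD", unset = NA_character_)
  if (is.na(raw)) {
    # Variable not set at all: fall back to the default sources
    return(default)
  }
  # Split on commas, strip surrounding whitespace, drop empty entries
  srcs <- trimws(strsplit(raw, ",", fixed = TRUE)[[1L]])
  srcs[nzchar(srcs)]
}

Sys.setenv(RICU_SRC_LOAD = "mimic_demo, eicu_demo")
resolve_srcs()
```

In this sketch, an unset variable falls back to the default, while a variable set to the empty string yields no sources, which corresponds to switching off automatic attaching.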

Attaching a dataset during package namespace loading will both instantiate a corresponding src_env in the data environment and, for convenience, also assign this object into the package namespace, such that, for example, the MIMIC-III demo dataset is available not only as ricu::data::mimic_demo, but also as ricu::mimic_demo (or, if the package namespace is attached, simply as mimic_demo). Dataset attaching using attach_src() does not need to happen during namespace loading, but can be triggered by the user at any time. If a convenience link as described above is desired, an environment such as .GlobalEnv has to be passed as assign_env to attach_src().

Datasets are set up as src_env objects irrespective of whether all (or any) of the required data is available. If some (or all) data is missing, the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions. Downloading demo datasets requires no further information, but access to full-scale datasets (even though they are publicly available) is guarded by access credentials (see download_src()).

While attach_src() provides the main entry point, src_env objects are instantiated by the S3 generic function setup_src_env(); the wrapping function serves to catch errors that might be caused by config file parsing issues, so as not to break attaching of the package namespace. Apart from this, attach_src() also provides the convenience linking into the package namespace (or a user-specified environment) described above.
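The division of labour between a setup function and an error-catching wrapper can be sketched in base R as follows. All names (data_env, setup_env_sketch(), attach_sketch()) are hypothetical stand-ins, and turning the error into a warning is an assumption made for the illustration; this is not ricu's implementation.

```r
# Hypothetical stand-in for the data environment
data_env <- new.env()

# Stand-in for setup_src_env(): creates a child environment of data_env
# and binds it to the source name; fails on a malformed configuration.
setup_env_sketch <- function(name, cfg) {
  if (!is.list(cfg)) stop("malformed configuration for '", name, "'")
  env <- new.env(parent = data_env)
  assign(name, env, envir = data_env)
  invisible(env)
}

# Stand-in for attach_src(): catches setup errors so that a broken
# configuration does not abort the caller, then optionally links the
# new environment into assign_env.
attach_sketch <- function(name, cfg, assign_env = NULL) {
  env <- tryCatch(
    setup_env_sketch(name, cfg),
    error = function(e) {
      warning("could not attach '", name, "': ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
  if (!is.null(env) && is.environment(assign_env)) {
    assign(name, env, envir = assign_env)
  }
  invisible(NULL)
}

user_env <- new.env()
attach_sketch("demo", list(tables = "patients"), assign_env = user_env)
exists("demo", envir = user_env, inherits = FALSE)
```

The wrapper always returns NULL invisibly, mirroring the return value documented for attach_src(), while the setup function returns the newly created environment.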

A src_env object created by setup_src_env() does not directly contain src_tbl objects bound to names, but rather an active binding (see base::makeActiveBinding()) per table. These active bindings check for availability of required files, evaluate to the corresponding src_tbl objects if these checks pass, and ask for user input otherwise. As src_tbl objects are intended to be read-only, assignment is not possible, except for the value NULL, which resets the internally cached src_tbl that is created on first successful access.
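The caching active-binding pattern described above can be sketched with base::makeActiveBinding() alone. The helper make_tbl_binding() is hypothetical, it reads an .rds file rather than fst files, and it raises an error instead of asking for user input when the file is missing; it is meant to show the mechanism, not ricu's actual code.

```r
# Hypothetical helper: bind `name` in `env` to a lazily loaded, cached table.
make_tbl_binding <- function(env, name, file) {
  cache <- NULL
  makeActiveBinding(name, function(value) {
    if (!missing(value)) {
      # Assignment: only NULL is accepted and it clears the cache
      if (!is.null(value)) stop("table bindings are read-only")
      cache <<- NULL
      return(invisible(NULL))
    }
    if (is.null(cache)) {
      # First access (or access after a reset): check the file, then load
      if (!file.exists(file)) stop("required file is missing: ", file)
      cache <<- readRDS(file)
    }
    cache
  }, env)
  invisible(env)
}

path <- tempfile(fileext = ".rds")
saveRDS(data.frame(id = 1:3), path)

env <- new.env()
make_tbl_binding(env, "patients", path)

env$patients          # first access reads the file and caches the result
env$patients <- NULL  # resets the cache; the next access reads the file again
```

Because the cache lives in the closure of the binding function, each table keeps its own cached object, and resetting one table does not affect the others.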

Value

Both attach_src() and setup_src_env() are called for side effects and therefore return invisibly. While attach_src() returns NULL, setup_src_env() returns the newly created src_env object.

Examples

## Not run: 

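# Disable automatic attaching of data sources on package load
Sys.setenv(RICU_SRC_LOAD = "")
library(ricu)

# Inspect the data environment before attaching
ls(envir = data)
exists("mimic_demo")

# Attach manually and link the src_env into the global environment
attach_src("mimic_demo", assign_env = .GlobalEnv)

# Inspect again after attaching
ls(envir = data)
exists("mimic_demo")

mimic_demo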


## End(Not run)


ricu documentation built on Sept. 8, 2023, 5:45 p.m.