R/db.R
In naryn: Native Access Medical Record Retriever for High Yield Analytics

Documented in emr_db.connect emr_db.init emr_db.init_examples emr_db.ls emr_db.reload emr_db.subset emr_db.subset.ids emr_db.subset.info emr_db.unload

#' Initializes connection with Naryn Database
#'
#' Initializes connection with Naryn Database
#'
#' Call `emr_db.connect` to establish access to the tracks in the db_dirs.
#' At least one db dir must be specified. Optionally, `emr_db.connect` accepts
#' additional db dirs, which may contain additional tracks.
#'
#' When 2 or more db dirs contain the same track name (namespace collision),
#' the track is taken from the db dir that was passed *last* in the order of
#' connections.
#'
#' For example, if the 2 db dirs \code{/db1} and \code{/db2} both contain
#' a track named \code{track1}, the call \code{emr_db.connect(c('/db1', '/db2'))} results in
#' Naryn using \code{track1} from \code{/db2}. The override applies consistently not
#' only to the track's data, but also to any other Naryn entity that uses or points
#' to the track.
#'
#' Although all the db dirs may contain track files, they serve different purposes.
#' All the db dirs except the last one in the order of connections are meant to be
#' mostly read-only. The directory connected last, also known as the *user dir*, is
#' intended for volatile data such as the results of intermediate calculations.
#'
#' New tracks can be created, using \code{emr_track.import} or \code{emr_track.create},
#' only in the db dir that was last in the order of connections. To write tracks
#' to a db dir that is not last in the connection order, the user must explicitly
#' reconnect with that db dir placed last; this should be done only for a
#' well-justified reason.
#'
#' When the package is attached it internally calls 'emr_db.init_examples',
#' which extracts the example database bundled with the package into the
#' session's temporary directory and connects to it as a single db dir
#' ('TEMPDIR/naryndb/test', where 'TEMPDIR' is the value of \code{tempdir()}).
#'
#' Physical files in the database are supposed to be managed exclusively by
#' Naryn itself. Track files may be modified, added or deleted manually, but
#' such changes must then be registered by running 'emr_db.reload'. Some manual
#' changes, however (like moving a track between the user dir and another db
#' dir), might cause 'emr_db.connect' to fail. In that case 'emr_db.reload'
#' cannot be invoked either, since it requires an established connection to
#' the DB. To break this deadlock, pass 'do_reload = TRUE' to 'emr_db.connect'.
#' This connects to the DB and rebuilds the DB index files in one step.
#'
#' If 'load_on_demand' is 'TRUE', a track is loaded into memory only when it is
#' accessed, and it is unloaded from memory when the R session ends or the
#' package is unloaded.
#'
#' If 'load_on_demand' is 'FALSE', all the tracks from the corresponding db dir
#' are pre-loaded into memory, making subsequent track access significantly
#' faster. As loaded tracks reside in shared memory, other R sessions running
#' on the same machine may also enjoy a significant run-time boost. On the flip
#' side, pre-loading all the tracks prolongs the execution of 'emr_db.connect'
#' and requires enough memory to accommodate all the data.
#'
#' Choosing between the two modes depends on the specific needs. While
#' 'load_on_demand = TRUE' is a solid default choice, in an environment with
#' frequent short-lived R sessions, each accessing a track, one might opt for
#' running a "daemon": an additional, permanent R session. The daemon pre-loads
#' all the tracks in advance and stays alive, thus boosting the run-time of the
#' sessions that start later.
#'
#' Upon completion, the connection with the database is established and a few
#' variables are added to the .naryn environment. These variables should not be
#' modified by the user!
#'
#' \tabular{ll}{
#' .naryn$EMR_GROOT \tab First db dir of tracks in the order of connections \cr
#' .naryn$EMR_UROOT \tab Last db dir of tracks in the order of connections (user dir); set only when more than one db dir is connected \cr
#' .naryn$EMR_ROOTS \tab Vector of all the connected db dirs (\code{db_dirs}) \cr
#' }
#'
#' \code{emr_db.init} is the old version of this function which
#' is now deprecated.
#'
#' \code{emr_db.ls} lists all the currently connected databases.
#'
#'
#'
#' @aliases emr_db.connect emr_db.init_examples
#' @param db_dirs character vector of db directories, in the order of connections (the last one is the user dir)
#' @param load_on_demand a single \code{TRUE} or \code{FALSE} applied to all the databases, or a logical vector of the same length as \code{db_dirs}, where \code{load_on_demand[i] = FALSE} means that the tracks of \code{db_dirs[i]} are pre-loaded. If NULL is passed, \code{load_on_demand} is set to TRUE for all the databases.
#' @param do_reload If \code{TRUE}, rebuilds DB index files.
#' @param global.dir,user.dir,global.load.on.demand,user.load.on.demand,do.reload old parameters of the deprecated function \code{emr_db.init}
#' @return None.
#' @seealso \code{\link{emr_db.reload}}, \code{\link{emr_track.import}},
#' \code{\link{emr_track.create}}, \code{\link{emr_track.rm}},
#' \code{\link{emr_track.ls}}, \code{\link{emr_vtrack.ls}},
#' \code{\link{emr_filter.ls}}
#' @keywords ~db ~data ~database
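#' @examples
#' # A sketch, assuming the example database bundled with naryn:
#' # emr_db.init_examples(n_dbs = 2) connects to two copies of it, so the
#' # second copy acts as the user dir.
#' emr_db.init_examples(n_dbs = 2)
#' db_dirs <- emr_db.ls()
#'
#' # Reconnect, pre-loading the tracks of the first db dir only.
#' # 'load_on_demand' holds one value per db dir; a single value would be
#' # recycled to all the dirs.
#' emr_db.connect(db_dirs, load_on_demand = c(FALSE, TRUE))
#'
#' \dontrun{
#' # "Daemon" session as described above: pre-load every track once and
#' # keep the session alive. '/path/to/db' is a placeholder path.
#' emr_db.connect("/path/to/db", load_on_demand = FALSE)
#' repeat Sys.sleep(3600)
#' }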
#' @export emr_db.connect
emr_db.connect <- function(db_dirs = NULL, load_on_demand = NULL, do_reload = FALSE) {
    if (is.null(db_dirs)) {
        stop("Usage: emr_db.connect(db_dirs, load_on_demand = NULL, do_reload = FALSE)", call. = FALSE)
    }

    db_dirs <- normalizePath(db_dirs) # get absolute path

    if (any(duplicated(db_dirs))) {
        stop("DB directories should differ from one another", call. = FALSE)
    }

    if (!is.null(load_on_demand)) {
        if (length(load_on_demand) == 1) {
            load_on_demand <- rep(load_on_demand, length(db_dirs))
        }

        if (length(db_dirs) != length(load_on_demand)) {
            stop("load_on_demand must have the same length as db_dirs", call. = FALSE)
        }

        if (!is.logical(load_on_demand)) {
            stop("load_on_demand should be a logical vector of the same length as db_dirs (note that 'db_dirs' is a vector)", call. = FALSE)
        }
    }

    # We set the groot to be the first
    # directory in the vector
    assign("EMR_GROOT", db_dirs[1], envir = .naryn)

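    # We set the uroot to be the last directory. With a single db dir no
    # uroot is set, so drop any value left over from a previous connection
    if (length(db_dirs) > 1) {
        assign("EMR_UROOT", utils::tail(db_dirs, n = 1), envir = .naryn)
    } else if (exists("EMR_UROOT", envir = .naryn, inherits = FALSE)) {
        remove("EMR_UROOT", envir = .naryn)
    }

    assign("EMR_ROOTS", db_dirs, envir = .naryn)

    if (is.null(load_on_demand)) {
        load_on_demand <- rep(TRUE, length(db_dirs))
    }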

    success <- FALSE

    tryCatch(
        {
            .emr_call("emr_dbinit", db_dirs, load_on_demand, do_reload, .emr_env(), silent = TRUE)
            success <- TRUE
        },
        finally = {
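            if (!success) {
                # EMR_UROOT is set only for multi-dir connections, so remove
                # just the variables that exist to avoid spurious warnings
                to_remove <- intersect(c("EMR_GROOT", "EMR_UROOT", "EMR_ROOTS"), ls(envir = .naryn, all.names = TRUE))
                rm(list = to_remove, envir = .naryn)
            }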
        }
    )

    emr_entries.reload(db_dirs)
}

#' @export emr_db.init
#' @rdname emr_db.connect
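#' @examples
#' # A sketch of migrating from the deprecated emr_db.init(), assuming the
#' # example database: the global dir becomes the first db dir and the
#' # user dir becomes the last one.
#' emr_db.init_examples(n_dbs = 2)
#' db_dirs <- emr_db.ls()
#'
#' # Deprecated form ...
#' emr_db.init(
#'     global.dir = db_dirs[1], user.dir = db_dirs[2],
#'     global.load.on.demand = FALSE
#' )
#'
#' # ... and the equivalent emr_db.connect() call
#' emr_db.connect(db_dirs, load_on_demand = c(FALSE, TRUE))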
emr_db.init <- function(global.dir = NULL, user.dir = NULL, global.load.on.demand = TRUE, user.load.on.demand = TRUE, do.reload = FALSE) {
    lifecycle::deprecate_soft(
        when = "2.6.2",
        what = "emr_db.init()",
        with = "emr_db.connect()"
    )

    db_dirs <- c(global.dir, user.dir)

    if (is.null(user.dir)) {
        load_on_demand <- c(global.load.on.demand)
    } else {
        load_on_demand <- c(global.load.on.demand, user.load.on.demand)
    }

    emr_db.connect(db_dirs = db_dirs, load_on_demand = load_on_demand, do_reload = do.reload)
}

#' @export
#' @rdname emr_db.connect
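#' @examples
#' # A sketch, assuming the example database: emr_db.ls() returns the
#' # connected db dirs in the order of connections, so the user dir is the
#' # last element.
#' emr_db.init_examples(n_dbs = 2)
#' connected <- emr_db.ls()
#' user_dir <- utils::tail(connected, n = 1)
#' user_dir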
emr_db.ls <- function() {
    .naryn$EMR_ROOTS
}

#' Initialize the examples database
#'
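#' @description This function extracts the example database bundled with the package into
#' \code{tempdir()} and connects to it. When \code{n_dbs} is more than 1, additional copies
#' of the example database (without the 'patients.dob' track) are created and all of them
#' are connected, with the original copy first.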
#'
#' @param n_dbs number of databases to create
#'
#' @return None
#'
#' @examples
#' emr_db.init_examples()
#'
#' @export
#' @noRd
emr_db.init_examples <- function(n_dbs = 1) {
    db_dir <- tempdir()
    utils::untar(system.file("testdb.tar.gz", package = "naryn"), exdir = db_dir)
    db_dirs <- file.path(db_dir, "naryndb/test")

    if (n_dbs > 1) {
        for (i in 2:n_dbs) {
            db_dir <- file.path(tempdir(), paste0("naryndb", i))
            dir.create(db_dir, recursive = TRUE, showWarnings = FALSE)
            utils::untar(system.file("testdb.tar.gz", package = "naryn"), exdir = db_dir)
            unlink(file.path(db_dir, "naryndb/test/patients.dob.nrtrack"))
            emr_db.connect(file.path(db_dir, "naryndb/test"))
            emr_db.reload()
            db_dirs <- c(db_dirs, file.path(db_dir, "naryndb/test"))
        }
    }

    emr_db.connect(db_dirs)
}



#' Reloads database
#'
#' Reloads database
#'
#' Rebuilds Naryn database index files. Use this function if you manually
#' add, delete, move or modify track files, or if you suspect that the
#' database is corrupted, e.g. existing tracks cannot be found, deleted ones
#' continue to appear, or Naryn itself issues a warning recommending to run
#' 'emr_db.reload'.
#'
#' @return None.
#'
#' @examples
#' emr_db.reload()
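#'
#' # A sketch of the manual-change workflow described above, assuming the
#' # example database (which contains a 'patients.dob' track): a track file
#' # is deleted outside of Naryn and the index is rebuilt, after which the
#' # track is no longer listed.
#' emr_db.init_examples()
#' unlink(file.path(emr_db.ls(), "patients.dob.nrtrack"))
#' emr_db.reload()
#' dob_listed <- is.element("patients.dob", emr_track.ls())
#' dob_listed
#'
#' # Restore the untouched example database
#' emr_db.init_examples()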
#'
#' @seealso \code{\link{emr_db.connect}}, \code{\link{emr_track.ls}},
#' \code{\link{emr_vtrack.ls}}
#' @keywords ~db
#' @export emr_db.reload
emr_db.reload <- function() {
    success <- FALSE
    tryCatch(
        {
            .emr_call("emr_dbreload", silent = TRUE)
            success <- TRUE
        },
        finally = {
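            if (!success) {
                # Remove only the variables that exist (EMR_UROOT is set only
                # for multi-dir connections) to avoid spurious warnings
                to_remove <- intersect(c("EMR_GROOT", "EMR_UROOT"), ls(envir = .naryn, all.names = TRUE))
                rm(list = to_remove, envir = .naryn)
            }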
        }
    )

    purrr::walk(emr_db.ls(), emr_entries.reload)
}

#' Unload all tracks from naryn database
#'
#' @return None.
#'
#' @examples
#' \donttest{
#' emr_db.unload()
#' }
#'
#' @export
emr_db.unload <- function() {
    .emr_call("emr_dbunload", .emr_env(), silent = TRUE)
}

#' Defines an ids subset
#'
#' Defines an ids subset.
#'
#' 'emr_db.subset' creates an ids subset ("viewport") of size
#' 'fraction * sizeof(src)' by sampling ids from 'src'. Once the subset is
#' defined, only the ids in the subset are used by the various functions
#' and iterators; other ids are ignored.
#'
#' 'src' can be a track name or an ids table. If 'complementary' is 'TRUE' the
#' complementary set of sampled ids is used as a subset.
#'
#' If 'src' is 'NULL', the current subset is cleared.
#'
#' @param src track name or ids table or 'NULL'
#' @param fraction fraction of data to be sampled from 'src' in [0,1] range
#' @param complementary 'TRUE' for a complementary subset, otherwise 'FALSE'
#' @return None.
#' @seealso \code{\link{emr_db.connect}}, \code{\link{emr_db.subset.ids}},
#' \code{\link{emr_db.subset.info}}
#' @keywords ~db ~data ~database ~subset
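#' @examples
#' # A sketch, assuming the example database (which contains a
#' # 'patients.dob' track): sample a tenth of the ids of 'patients.dob',
#' # inspect the subset, then clear it.
#' emr_db.init_examples()
#' emr_db.subset("patients.dob", fraction = 0.1, complementary = FALSE)
#' emr_db.subset.info()
#' head(emr_db.subset.ids())
#'
#' # The complementary subset keeps the ids left out of the sample
#' emr_db.subset("patients.dob", fraction = 0.1, complementary = TRUE)
#'
#' # Passing NULL clears the current subset
#' emr_db.subset(NULL)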
#' @export emr_db.subset
emr_db.subset <- function(src = "", fraction = NULL, complementary = NULL) {
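    # "" is the default that signals a missing 'src'; identical() also copes
    # with ids tables and multi-element vectors, unlike a `==` comparison
    if (identical(src, "")) {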
        stop("Usage: emr_db.subset(src, fraction, complementary)", call. = FALSE)
    }
    .emr_checkroot()

    .emr_call("emr_db_subset", src, fraction, complementary, .emr_env())
}



#' Returns the ids that constitute the current ids subset
#'
#' Returns the ids that constitute the current ids subset.
#'
#' 'emr_db.subset.ids' returns the ids that constitute the current ids subset.
#' The ids are returned in "ids table" format.
#'
#' If no ids subset is defined, 'emr_db.subset.ids' returns 'NULL'.
#'
#' @return Ids table or 'NULL'
#' @seealso \code{\link{emr_db.subset}}
#' @keywords ~db ~data ~database ~subset
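#' @examples
#' # A sketch, assuming the example database: the ids of a subset can be
#' # saved and later passed back to emr_db.subset() as an ids table. Since
#' # the subset size is 'fraction * sizeof(src)', fraction = 1 keeps the
#' # whole table and restores the same subset.
#' emr_db.init_examples()
#' emr_db.subset("patients.dob", fraction = 0.5, complementary = FALSE)
#' saved_ids <- emr_db.subset.ids()
#' emr_db.subset(NULL)
#' emr_db.subset(saved_ids, fraction = 1, complementary = FALSE)
#' restored_ids <- emr_db.subset.ids()
#' emr_db.subset(NULL)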
#' @export emr_db.subset.ids
emr_db.subset.ids <- function() {
    .emr_checkroot()
    .emr_call("emr_db_subset_ids", .emr_env())
}



#' Returns information about the current subset
#'
#' Returns information about the current subset.
#'
#' 'emr_db.subset.info' returns the parameters that were used to define the
#' current subset or 'NULL' if no subset has been defined.
#'
#' @return Information about the current subset or 'NULL'.
#' @seealso \code{\link{emr_db.subset}}, \code{\link{emr_db.subset.ids}}
#' @keywords ~db ~data ~database ~subset
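#' @examples
#' # A sketch, assuming the example database: emr_db.subset.info() returns
#' # NULL when no subset is defined, so it can serve as a guard before
#' # clearing a subset.
#' emr_db.init_examples()
#' emr_db.subset("patients.dob", fraction = 0.2, complementary = TRUE)
#' info <- emr_db.subset.info()
#' str(info)
#' if (!is.null(info)) {
#'     emr_db.subset(NULL)
#' }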
#' @export emr_db.subset.info
emr_db.subset.info <- function() {
    .emr_checkroot()
    .emr_call("emr_db_subset_info", .emr_env())
}


naryn documentation built on Sept. 27, 2024, 5:07 p.m.

Defines functions emr_db.subset.info emr_db.subset.ids emr_db.subset emr_db.unload emr_db.reload emr_db.init_examples emr_db.ls emr_db.init emr_db.connect
