pipeline_vectorized: Call (nearly) one "Accuracy" function with many...


pipeline_vectorized    R Documentation

Call (nearly) one "Accuracy" function with many parameterizations at once automatically

Description

This is a function to automatically call indicator functions of the "Accuracy" dimension in a vectorized manner with a set of parameterizations derived from the metadata.

Usage

pipeline_vectorized(
  fct,
  resp_vars = NULL,
  study_data,
  meta_data,
  label_col,
  ...,
  key_var_names,
  cores = list(mode = "socket", logging = FALSE, load.balancing = TRUE),
  variable_roles = list(resp_vars = list(VARIABLE_ROLES$PRIMARY,
    VARIABLE_ROLES$SECONDARY), group_vars = VARIABLE_ROLES$PROCESS),
  result_groups,
  use_cache = FALSE,
  compute_plan_only = FALSE
)

Arguments

fct

function function to call

resp_vars

variable list the names of the measurement variables; if NULL (default), all variables are used.

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

...

additional arguments for the function

key_var_names

character character vector named by arguments to be filled by metadata GROUP_VAR-entries as follows: c(group_vars = GROUP_VAR_OBSERVER) – may be missing, then all possible combinations will be analyzed. Cannot contain resp_vars.

cores

integer number of CPU cores to use, or a named list with arguments for parallelMap::parallelStart, or NULL if parallel execution has already been started by the caller.

variable_roles

list restrict each function argument (referred to by its name matching a name in names(variable_roles)) to variables of the role given here.

result_groups

character columns by which to group the results into encapsulated lists, or NULL to receive a data frame with all call arguments and their respective results in a column 'result'; see pipeline_recursive_result

use_cache

logical set to FALSE to omit re-using already distributed study- and metadata on a parallel cluster

compute_plan_only

logical set to TRUE to omit computations and return only the compute plan filled with planned evaluations. Used in pipelines.

Details

The function to call is given as the first argument. All arguments of the called function can be given here, but pipeline_vectorized can derive most of the technically possible values from the metadata; this derivation is controlled by the arguments key_var_names and variable_roles. The function returns an encapsulated list by default, but it can also return a data.frame; see pipeline_recursive_result for these two options. The argument use_cache controls whether the input data (study_data and meta_data) are distributed to the compute nodes beforehand or passed around with each call when running in parallel. All calls are run in parallel if possible; this can be configured with the argument cores.

If the function is called within a larger framework (such as dq_report), setting compute_plan_only to TRUE makes it skip the actual function calls and return only a data.frame with the parameterizations of the "Accuracy" functions. In such a scenario, one may also want to use an existing cluster rather than starting and stopping one on entering and leaving pipeline_vectorized. This can be achieved by setting the cores argument to NULL.
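To make the idea of a compute plan more concrete, the following base R sketch builds a small table of parameterizations from metadata. It is only an illustration under stated assumptions, not the package's implementation: the helper sketch_plan, the toy metadata, and the role strings are hypothetical, and the real derivation also restricts group_vars by role and handles further arguments.

```r
# Hypothetical toy metadata: one row per study variable.
# Column names mimic the metadata layout, the values are made up.
meta <- data.frame(
  LABEL = c("SBP_0", "DBP_0", "SEX_0", "USR_BP_0"),
  VARIABLE_ROLE = c("primary", "secondary", "process", "process"),
  GROUP_VAR_OBSERVER = c("USR_BP_0", "USR_BP_0", NA, NA),
  GROUP_VAR_DEVICE = c("DEV_BP_0", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one row per planned call, i.e. per
# combination of a response variable and one of its grouping variables.
sketch_plan <- function(meta,
                        resp_roles = c("primary", "secondary"),
                        group_cols = grep("^GROUP_VAR_", names(meta),
                                          value = TRUE)) {
  # keep only variables whose role qualifies them as response variables
  resp <- meta[meta$VARIABLE_ROLE %in% resp_roles, , drop = FALSE]
  rows <- lapply(group_cols, function(col) {
    keep <- !is.na(resp[[col]]) # skip variables without such a group variable
    data.frame(
      resp_vars = resp$LABEL[keep],
      group_vars = resp[[col]][keep],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# all GROUP_VAR_* columns are combined if no restriction is given
plan_all <- sketch_plan(meta)

# restricting to one column loosely corresponds to
# key_var_names = c(group_vars = GROUP_VAR_OBSERVER)
plan_observer <- sketch_plan(meta, group_cols = "GROUP_VAR_OBSERVER")
```

In this sketch, each row of plan_all stands for one call that would be made; restricting group_cols shrinks the plan in the same spirit as key_var_names does for the real function.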

Value

  • if result_groups is set, a list with:

    • one entry per value of the first argument named in result_groups, each containing a similar list for the values of the second argument, and so on recursively;

  • if result_groups is not set, a data frame with one row per function call, all the arguments of each call in its columns and a column results providing the function calls' results.
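The relation between the two return shapes can be sketched in base R as follows. This is a simplified illustration, not the package's code: the helper nest_by, the toy data, and the placeholder results are hypothetical, and the name of the result column is an assumption here.

```r
# Hypothetical flat result: one row per call, with the call arguments
# in columns and the results kept in a list column.
flat <- data.frame(
  resp_vars = c("SBP_0", "DBP_0", "SBP_0"),
  group_vars = c("USR_BP_0", "USR_BP_0", "DEV_BP_0"),
  stringsAsFactors = FALSE
)
flat$result <- list("r1", "r2", "r3") # placeholders for real results

# Hypothetical helper: nest the results by the given grouping columns,
# in the given order, one list level per column.
nest_by <- function(df, groups) {
  if (length(groups) == 0L) {
    return(df$result) # innermost level: the results themselves
  }
  parts <- split(df, df[[groups[1L]]])
  lapply(parts, nest_by, groups = groups[-1L])
}

# outer level: group_vars, inner level: resp_vars
nested <- nest_by(flat, c("group_vars", "resp_vars"))
```

With this sketch, nested$USR_BP_0$SBP_0 holds the results of the call for SBP_0 grouped by USR_BP_0. Swapping the order of the grouping columns swaps the nesting levels, which is the kind of difference the order of entries in result_groups is described as making.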

Examples

## Not run:  # really long-running example
load(system.file("extdata/study_data.RData", package = "dataquieR"))
load(system.file("extdata/meta_data.RData", package = "dataquieR"))
a <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER)
)
b <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL
)
b_adj <-
  pipeline_vectorized(
    fct = acc_margins, study_data = study_data,
    meta_data = meta_data, label_col = LABEL, co_vars = c("SEX_1", "AGE_1")
  )
c <- pipeline_vectorized(
  fct = acc_loess, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  variable_roles = list(
    resp_vars = list(VARIABLE_ROLES$PRIMARY),
    group_vars = VARIABLE_ROLES$PROCESS
  )
)
d <- pipeline_vectorized(
  fct = acc_loess, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  variable_roles = list(
    resp_vars = list(VARIABLE_ROLES$PRIMARY, VARIABLE_ROLES$SECONDARY),
    group_vars = VARIABLE_ROLES$PROCESS
  )
)
e <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0"
)

f <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0",
  result_groups = NULL
)
pipeline_recursive_result(f)
g <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0",
  result_groups = c("co_vars")
)
g1 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0",
  result_groups = c("group_vars")
)
g2 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0",
  result_groups = c("group_vars", "co_vars")
)
g3 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  key_var_names = c(group_vars = GROUP_VAR_OBSERVER), co_vars = "SEX_0",
  result_groups = c("co_vars", "group_vars")
)
g4 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_data, label_col = LABEL,
  co_vars = "SEX_0", result_groups = c("co_vars")
)
meta_datax <- meta_data
meta_datax[9, "GROUP_VAR_DEVICE"] <- "v00011"
g5 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_datax, label_col = LABEL,
  co_vars = "SEX_0", result_groups = c("co_vars")
)
g6 <- pipeline_vectorized(
  fct = acc_margins, study_data = study_data,
  meta_data = meta_datax, label_col = LABEL,
  co_vars = "SEX_0", result_groups = c("co_vars", "group_vars")
)

## End(Not run)

dataquieR documentation built on July 26, 2023, 6:10 p.m.