dq_report: Generate a full DQ report

dq_report    R Documentation

Description

Generates a full data quality (DQ) report for the given study data and metadata.

Usage

dq_report(
  study_data,
  meta_data,
  label_col = NULL,
  meta_data_segment = "segment_level",
  meta_data_dataframe = "dataframe_level",
  ...,
  dimensions = c("Completeness", "Consistency"),
  dont_modify_study_data_by = c("con_soft_limits", "con_detection_limits"),
  strata_attribute,
  strata_vars,
  cores = list(mode = "socket", logging = FALSE, cpus = util_detect_cores(),
    load.balancing = TRUE),
  specific_args = list(),
  author = prep_get_user_name(),
  debug_parallel = FALSE
)

Arguments

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

meta_data_segment

data.frame – optional: Segment level metadata

meta_data_dataframe

data.frame – optional: Data frame level metadata

...

arguments to be passed to all called indicator functions if applicable.

dimensions

dimensions Vector of dimensions to address in the report. Allowed values in the vector are Completeness, Consistency, and Accuracy. The generated report covers only the listed data quality dimensions. Accuracy is computationally expensive, so this dimension is not enabled by default. Completeness should be included if Consistency is included, and Consistency should be included if Accuracy is included; otherwise the report may contain misleading detections, e.g. missing codes flagged as outliers. Please refer to the data quality concept for more details. Integrity is always included.

dont_modify_study_data_by

character list of functions that are not allowed to modify the study data for downstream steps of the pipeline, e.g., to avoid that even soft limit violations are removed.

strata_attribute

character name of a variable attribute coding study segments. Only leaving this empty or passing STUDY_SEGMENT is supported so far. Stratification is not yet fully supported; please use dq_report_by instead.

strata_vars

character names of variables to stratify the report on, such as "study_center". Not yet supported; please use dq_report_by instead.

cores

integer number of CPU cores to use, or a named list with arguments for parallelMap::parallelStart, or NULL if parallel processing has already been started by the caller.

specific_args

list named list of arguments specifically for one of the called functions. The names of the list elements correspond to the indicator functions whose calls should be modified. The elements are lists of arguments.

author

character author for the report documents.

debug_parallel

logical print blocks currently evaluated in parallel
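
The dependency rule described for dimensions (Completeness alongside Consistency, Consistency alongside Accuracy) can be sketched as a small pre-check before calling dq_report. The helper complete_dimensions() below is hypothetical and written in base R for illustration only; it is not part of dataquieR, and dq_report itself may treat an incomplete vector differently.

```r
# Hypothetical helper, NOT part of dataquieR: add the prerequisite dimensions
# that the documentation recommends for a requested set of dimensions.
complete_dimensions <- function(dimensions) {
  allowed <- c("Completeness", "Consistency", "Accuracy")

  # Reject values the documentation does not list as allowed.
  unknown <- setdiff(dimensions, allowed)
  if (length(unknown) > 0) {
    stop("Unknown dimensions: ", paste(unknown, collapse = ", "))
  }

  # Accuracy should be accompanied by Consistency ...
  if ("Accuracy" %in% dimensions) {
    dimensions <- union(dimensions, "Consistency")
  }
  # ... and Consistency should be accompanied by Completeness.
  if ("Consistency" %in% dimensions) {
    dimensions <- union(dimensions, "Completeness")
  }

  # Return the dimensions in a fixed, canonical order.
  allowed[allowed %in% dimensions]
}

dims <- complete_dimensions("Accuracy")
print(dims)
```

The resulting vector could then be passed on as dimensions = dims.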

Details

See dq_report_by for a way to generate stratified or split reports easily.

Value

A dataquieR_resultset. Printing it creates an R Markdown report.

See Also

  • as.data.frame.dataquieR_resultset, as.list.dataquieR_resultset, print.dataquieR_resultset, summary.dataquieR_resultset

  • dq_report_by

Examples

## Not run:  # really long-running example.
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))
report <- dq_report(study_data, meta_data, label_col = LABEL) # simplest use
report <- dq_report(study_data, meta_data,
  label_col = LABEL, dimensions =
    c("Completeness", "Consistency", "Accuracy"),
  check_table = read.csv(system.file("extdata",
    "contradiction_checks.csv",
    package = "dataquieR"
  ), header = TRUE, sep = "#"),
  show_causes = TRUE,
  cause_label_df = prep_get_data_frame("meta_data_v2|missing_table")
)
save(report, file = "report.RData") # careful, this contains the study_data
report <- dq_report(study_data, meta_data,
  label_col = LABEL,
  check_table = read.csv(system.file("extdata",
    "contradiction_checks.csv",
    package = "dataquieR"
  ), header = TRUE, sep = "#"),
  specific_args = list(acc_univariate_outlier = list(resp_vars = "K")),
  resp_vars = "SBP_0"
)
report <- dq_report(study_data, meta_data,
  label_col = LABEL,
  check_table = read.csv(system.file("extdata",
    "contradiction_checks.csv",
    package = "dataquieR"
  ), header = TRUE, sep = "#"),
  specific_args = list(acc_univariate_outlier = list(resp_vars = "DBP_0")),
  resp_vars = "SBP_0"
)
report <- dq_report(study_data, meta_data,
  label_col = LABEL,
  check_table = read.csv(system.file("extdata",
    "contradiction_checks.csv",
    package = "dataquieR"
  ), header = TRUE, sep = "#"),
  specific_args = list(acc_univariate_outlier = list(resp_vars = "DBP_0")),
  resp_vars = "SBP_0", cores = NULL
)
rp1 <- dq_report("ship", "ship_meta",
  meta_data_segment = "meta_data_segment",
  meta_data_dataframe = "meta_data_dataframe",
  label_col = LABEL)

## End(Not run)
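
The examples above pass resp_vars = "SBP_0" through ... to all indicator functions and a different resp_vars for acc_univariate_outlier through specific_args. The base R sketch below shows only the shape of that named list. The merge with modifyList illustrates an assumed precedence (function-specific values replacing the general ones); it is not taken from the dataquieR source, and the names general_args and effective_args are made up for this illustration.

```r
# Arguments meant for all indicator functions (what would be passed via `...`).
general_args <- list(resp_vars = "SBP_0")

# Named list: each name is an indicator function, each element is the list of
# arguments intended for that function only.
specific_args <- list(
  acc_univariate_outlier = list(resp_vars = "DBP_0")
)

# ASSUMPTION, for illustration only: function-specific arguments override the
# general ones. The actual merging logic inside dq_report may differ.
effective_args <- utils::modifyList(
  general_args,
  specific_args[["acc_univariate_outlier"]]
)

str(effective_args)
```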

dataquieR documentation built on July 26, 2023, 6:10 p.m.