con_limit_deviations: Detects variable values exceeding limits defined in metadata

View source: R/con_limit_deviations.R

con_limit_deviations    R Documentation

Detects variable values exceeding limits defined in metadata

Description

APPROACH

Inadmissible numerical values can be of type integer or float. This implementation requires the definition of intervals in the metadata to examine the admissibility of numerical study data.

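The function identifies inadmissible measurements according to the selected limits (hard limits by default; soft limits or detection limits on request) and can examine multiple variables in one call.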
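As a rough illustration of what checking values against an interval involves, the base R sketch below parses an interval written as a string and tests a numeric vector against it. The notation "[0;100)" and the helper names parse_interval and within_interval are assumptions made for this example only. They are not part of dataquieR, and the exact limit format expected in the metadata should be taken from the package documentation.

```r
# Hypothetical helper (not part of dataquieR): split an interval string
# such as "[0;100)" into numeric bounds and inclusive/exclusive flags.
parse_interval <- function(x) {
  x <- gsub("[[:space:]]", "", x)
  n <- nchar(x)
  open_br <- substr(x, 1, 1)
  close_br <- substr(x, n, n)
  bounds <- strsplit(substr(x, 2, n - 1), ";", fixed = TRUE)[[1]]
  stopifnot(
    open_br %in% c("[", "("),
    close_br %in% c("]", ")"),
    length(bounds) == 2
  )
  list(
    lower = as.numeric(bounds[1]),
    upper = as.numeric(bounds[2]),
    lower_closed = open_br == "[",
    upper_closed = close_br == "]"
  )
}

# Hypothetical helper: TRUE where a value lies inside the interval,
# FALSE where it lies outside, NA where the value itself is missing.
within_interval <- function(values, interval) {
  above_lower <- if (interval$lower_closed) {
    values >= interval$lower
  } else {
    values > interval$lower
  }
  below_upper <- if (interval$upper_closed) {
    values <= interval$upper
  } else {
    values < interval$upper
  }
  above_lower & below_upper
}

limits <- parse_interval("[0;100)")
x <- c(-5, 0, 42, 100, NA)
within_interval(x, limits)
# FALSE  TRUE  TRUE FALSE    NA
```

The upper bound is exclusive in this example, so 100 is reported as outside the interval, while the missing value stays NA instead of being counted as a deviation.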

Usage

con_limit_deviations(
  resp_vars = NULL,
  label_col,
  study_data,
  meta_data,
  limits = c("HARD_LIMITS", "SOFT_LIMITS", "DETECTION_LIMITS")
)

Arguments

resp_vars

variable list: the names of the measurement variables

label_col

variable attribute: the name of the column in the metadata with labels of variables

study_data

data.frame: the data frame that contains the measurements

meta_data

data.frame: the data frame that contains metadata attributes of study data

limits

enum HARD_LIMITS | SOFT_LIMITS | DETECTION_LIMITS: which limits from the metadata to check

Details

ALGORITHM OF THIS IMPLEMENTATION:

  • Remove missing codes from the study data (if defined in the metadata)

  • Interpretation of variable specific intervals as supplied in the metadata.

  • Identification of measurements outside the defined limits. Two output data frames are generated for this purpose:

    • on the level of observation to flag each deviation, and

    • a summary table for each variable.

  • A list of plots is generated for each variable examined for limit deviations. The histogram-like plots indicate respective limits as well as deviations.

  • Values exceeding limits are removed in a data frame of modified study data.
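The steps above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: the toy data, the limits and missing_codes lists, the function name check_limits, and the names of the returned elements are all hypothetical, and the plotting step is omitted.

```r
# Toy study data; 999 acts as a missing code for both variables (assumption).
study <- data.frame(
  age = c(25, 130, 47, 999, -3),
  sbp = c(120, 310, 999, 85, 140)
)

# Hypothetical per-variable limits (lower, upper) and missing codes.
# These plain lists stand in for the metadata; they are not dataquieR objects.
limits <- list(age = c(0, 120), sbp = c(60, 300))
missing_codes <- list(age = 999, sbp = 999)

check_limits <- function(study, limits, missing_codes) {
  flagged <- study
  modified <- study
  summary_rows <- list()
  for (v in names(limits)) {
    x <- study[[v]]
    # Remove missing codes so they are not mistaken for deviations.
    x[x %in% missing_codes[[v]]] <- NA
    # Flag each observation that lies below or above the limits.
    below <- !is.na(x) & x < limits[[v]][1]
    above <- !is.na(x) & x > limits[[v]][2]
    flagged[[paste0(v, "_below")]] <- below
    flagged[[paste0(v, "_above")]] <- above
    # One summary row per variable.
    summary_rows[[v]] <- data.frame(
      variable = v,
      n_below = sum(below),
      n_above = sum(above)
    )
    # Remove deviating values in the modified copy of the data.
    x[below | above] <- NA
    modified[[v]] <- x
  }
  list(
    flagged = flagged,
    summary = do.call(rbind, summary_rows),
    modified = modified
  )
}

res <- check_limits(study, limits, missing_codes)
res$summary
res$modified
```

Replacing the missing code first matters here: without that step, 999 would be counted as a value above the upper limit for both variables.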

For con_detection_limits, the default for the limits argument differs: it is "DETECTION_LIMITS".

Value

a list with:

  • FlaggedStudyData: data.frame related to the study data by a 1:1 relationship, i.e. each observation is checked for whether its value lies below or above the limits.

  • SummaryTable: data.frame that summarizes limit deviations for each variable.

  • SummaryPlotList: list of ggplots. The plot for each variable is either a histogram (continuous) or a barplot (discrete).

  • ModifiedStudyData: data.frame. If the function identifies limit deviations, the respective values are removed in ModifiedStudyData.

  • ReportSummaryTable: heatmap-like data frame about limit violations.

See Also

Examples

load(system.file("extdata", "study_data.RData", package = "dataquieR"))
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))

# make things a bit more complicated for the function, giving datetimes
# as numeric
is_datetime <- vapply(study_data, inherits, "POSIXct",
  FUN.VALUE = logical(1))
study_data[is_datetime] <- lapply(study_data[is_datetime], as.numeric)

MyValueLimits <- con_limit_deviations(
  resp_vars = NULL,
  label_col = "LABEL",
  study_data = study_data,
  meta_data = meta_data,
  limits = "HARD_LIMITS"
)

names(MyValueLimits$SummaryPlotList)

MyValueLimits <- con_limit_deviations(
  resp_vars = c("QUEST_DT_0"),
  label_col = "LABEL",
  study_data = study_data,
  meta_data = meta_data,
  limits = "HARD_LIMITS"
)

MyValueLimits$SummaryPlotList$QUEST_DT_0

dataquieR documentation built on Aug. 31, 2022, 5:08 p.m.