con_contradictions: Checks user-defined contradictions in study data

View source: R/con_contradictions.R



This approach flags a contradiction when an impossible combination of data values is observed in one participant. For example, if the age of a participant is recorded repeatedly, the value of age cannot (unfortunately) decline. Most contradictions rest on the comparison of two variables.

Importantly, each value used in a comparison may be possible on its own; only the combination of the two values is considered impossible. The approach does not consider implausible or inadmissible values.
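
As a minimal sketch of this idea (not the package's implementation), the following base R code flags participants whose recorded age declines between two examinations. The data frame and its column names (`id`, `age_baseline`, `age_followup`) are hypothetical.

```r
# Hypothetical toy data: age recorded at two examinations per participant
toy <- data.frame(
  id           = 1:4,
  age_baseline = c(45, 60, 33, 51),
  age_followup = c(47, 58, 33, NA)
)

# Each value on its own is possible; only the combination
# "follow-up age lower than baseline age" is impossible.
# Observations with a missing value are not flagged.
toy$becomes_younger <- !is.na(toy$age_baseline) &
  !is.na(toy$age_followup) &
  toy$age_followup < toy$age_baseline

print(toy)
```

Only participant 2 is flagged here, although neither 60 nor 58 would be an inadmissible age by itself.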


con_contradictions(
  resp_vars = NULL,
  study_data,
  meta_data,
  label_col,
  threshold_value,
  check_table,
  summarize_categories = FALSE
)



resp_vars: variable list. The names of the measurement variables.


study_data: data.frame. The data frame that contains the measurements.


meta_data: data.frame. The data frame that contains metadata attributes of the study data.


label_col: variable attribute. The name of the column in the metadata with labels of variables.


threshold_value: numeric from=0 to=100. A numerical value ranging from 0 to 100.


check_table: data.frame. Contradiction rules table, i.e. the table defining contradictions. See details for its required structure.


summarize_categories: logical. Needs a column 'tag' in the check_table. If set, a summary output is generated for the defined categories plus one plot per category.


Algorithm of this implementation:

  • Select all variables in the data with defined contradiction rules (static metadata column CONTRADICTIONS)

  • Remove missing codes from the study data (if defined in the metadata)

  • Remove measurements deviating from limits defined in the metadata

  • Assign label to levels of categorical variables (if applicable)

  • Apply contradiction checks on predefined sets of variables

  • Identify measurements fulfilling contradiction rules. Two output data frames are generated:

    • on the level of observation to flag each contradictory value combination, and

    • a summary table for each contradiction check.

  • A summary plot illustrating the number of contradictions is generated.
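
The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of con_contradictions: the toy data, the missing code 999, the limits 18 to 100, the value labels and the two rules are all hypothetical, and the variable selection and plotting steps are left out.

```r
# Hypothetical study data. 999 is assumed to be a missing code for age,
# and ages are assumed to be admissible only within [18, 100].
study <- data.frame(
  id        = 1:5,
  age_0     = c(40, 55, 999, 62, 30),
  age_1     = c(42, 53, 48, 130, 31),
  diet      = c(0, 1, 1, 0, 1),
  eats_meat = c(1, 1, 0, 1, 0)
)

# Remove missing codes and measurements deviating from the limits
for (v in c("age_0", "age_1")) {
  x <- study[[v]]
  x[x %in% 999] <- NA
  x[!is.na(x) & (x < 18 | x > 100)] <- NA
  study[[v]] <- x
}

# Assign labels to the levels of categorical variables
study$diet <- factor(study$diet, levels = c(0, 1),
                     labels = c("omnivore", "vegetarian"))
study$eats_meat <- factor(study$eats_meat, levels = c(0, 1),
                          labels = c("no", "yes"))

# Contradiction rules as functions returning TRUE where the rule is fulfilled
rules <- list(
  becomes_younger      = function(d) d$age_1 < d$age_0,
  vegetarian_eats_meat = function(d) d$diet == "vegetarian" & d$eats_meat == "yes"
)

# Observation level: one flag column per check (NA is treated as "not flagged")
flagged <- study
for (r in names(rules)) {
  hit <- rules[[r]](study)
  flagged[[r]] <- !is.na(hit) & hit
}

# Summary level: one row per contradiction check
summary_table <- data.frame(
  check   = names(rules),
  n       = vapply(names(rules), function(r) sum(flagged[[r]]),
                   integer(1), USE.NAMES = FALSE),
  percent = vapply(names(rules), function(r) 100 * mean(flagged[[r]]),
                   numeric(1), USE.NAMES = FALSE)
)
print(summary_table)
```

Cleaning before checking matters in this sketch: without it, the missing code 999 and the out-of-limit value 130 would take part in the age comparison as if they were real ages.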

List function.


If summarize_categories is FALSE: A list with:

  • FlaggedStudyData: The first output of the contradiction function is a data frame with the same number of observations as the study data. For each applied check, a column is added that flags the observations showing a contradiction under that check.

  • SummaryTable: The second output summarizes this information into one data frame. This output can be used to provide an executive overview of the number of contradictions. It is meant for automated processing within pipelines.

  • SummaryData: The third output is the same as SummaryTable but for human readers.

  • SummaryPlot: The fourth output visualizes summarized information of SummaryData.

If summarize_categories is TRUE, other objects are returned: one per category, named by that category (e.g. "Empirical"), containing a result for contradictions within that category only. Additionally, the slot all_checks contains the result as it would have been returned with summarize_categories set to FALSE. Finally, a slot SummaryData is returned containing sums per category, with a corresponding ggplot in SummaryPlot.
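
The tags used in the examples (for instance "Logical, Age-Related") suggest that one check can belong to several categories. As a rough sketch of how sums per category could be derived, assuming comma-separated tags and made-up counts (the column name `n_contradictions` is hypothetical, and this is not the package's code):

```r
# Hypothetical per-check summary; a tag may list several categories
checks <- data.frame(
  label = c("Becomes younger", "sex transformation",
            "Very mature pregnant woman"),
  tag   = c("Logical, Age-Related", "Empirical", "Empirical, Age-Related"),
  n_contradictions = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Split each tag on commas and trim surrounding whitespace
cats <- lapply(strsplit(checks$tag, ",", fixed = TRUE), trimws)

# One row per (check, category) pair
long <- data.frame(
  category         = unlist(cats),
  n_contradictions = rep(checks$n_contradictions, lengths(cats)),
  stringsAsFactors = FALSE
)

# Sum of contradictions per category
per_category <- aggregate(n_contradictions ~ category, data = long, FUN = sum)
print(per_category)
```

In this sketch a check with two tags contributes to both categories, so the category sums can add up to more than the total number of contradictions.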

See Also

Online Documentation


## Not run: 
load(system.file("extdata", "meta_data.RData", package = "dataquieR"))
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
check_table <- read.csv(
  system.file("extdata",
    # the file name of the contradiction rules table is not given in this text
    package = "dataquieR"
  ),
  header = TRUE, sep = "#"
)
check_table[1, "tag"] <- "Logical"
check_table[1, "Label"] <- "Becomes younger"
check_table[2, "tag"] <- "Empirical"
check_table[2, "Label"] <- "sex transformation"
check_table[3, "tag"] <- "Empirical"
check_table[3, "Label"] <- "loses academic degree"
check_table[4, "tag"] <- "Logical"
check_table[4, "Label"] <- "vegetarian eats meat"
check_table[5, "tag"] <- "Logical"
check_table[5, "Label"] <- "vegan eats meat"
check_table[6, "tag"] <- "Empirical"
check_table[6, "Label"] <- "non-veg* eats meat"
check_table[7, "tag"] <- "Empirical"
check_table[7, "Label"] <- "Non-smoker buys cigarettes"
check_table[8, "tag"] <- "Empirical"
check_table[8, "Label"] <- "Smoker always scrounges"
check_table[9, "tag"] <- "Logical"
check_table[9, "Label"] <- "Cuff didn't fit arm"
check_table[10, "tag"] <- "Empirical"
check_table[10, "Label"] <- "Very mature pregnant woman"
label_col <- "LABEL"
threshold_value <- 1
# All checks, without category summaries
con_contradictions(
  study_data = study_data, meta_data = meta_data, label_col = label_col,
  threshold_value = threshold_value, check_table = check_table
)
# A check may carry several comma-separated tags
check_table[1, "tag"] <- "Logical, Age-Related"
check_table[10, "tag"] <- "Empirical, Age-Related"
con_contradictions(
  study_data = study_data, meta_data = meta_data, label_col = label_col,
  threshold_value = threshold_value, check_table = check_table
)
# Summarize results per category (needs the 'tag' column in check_table)
con_contradictions(
  study_data = study_data, meta_data = meta_data, label_col = label_col,
  threshold_value = threshold_value, check_table = check_table,
  summarize_categories = TRUE
)

## End(Not run)

dataquieR documentation built on July 26, 2023, 6:10 p.m.