| con_inadmissible_vocabulary | R Documentation |
For each categorical variable, value lists should be defined in the metadata. This implementation examines whether all observed levels in the study data are valid.
Indicator
con_inadmissible_vocabulary(
  resp_vars = NULL,
  study_data,
  label_col,
  item_level = "item_level",
  threshold_value = 0,
  meta_data = item_level,
  meta_data_v2
)
| resp_vars | variable list the name of the measurement variables |
| study_data | data.frame the data frame that contains the measurements |
| label_col | variable attribute the name of the column in the metadata with labels of variables |
| item_level | data.frame the data frame that contains metadata attributes of study data |
| threshold_value | numeric from=0 to=100. a numerical value ranging from 0-100. |
| meta_data | data.frame old name for |
| meta_data_v2 | character path to workbook like metadata file, see |
Remove missing codes from the study data (if defined in the metadata)
Interpretation of variable-specific VALUE_LABELS as supplied in the metadata.
Identification of measurements not corresponding to the expected categories. Two output data frames are generated for this purpose:
on the level of observation to flag each undefined category, and
a summary table for each variable.
Values that do not correspond to defined categories are removed; the result is returned as a data frame of modified study data.
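The steps above can be illustrated with a minimal base-R sketch. It is not dataquieR's internal code: the vectors `observed` and `defined` and all other names are hypothetical, and it assumes that missing codes have already been replaced by NA.

```r
# Illustrative sketch only; names and data are hypothetical.
observed <- c("B050", "B051", "B052", "B999", NA)  # study data values
defined  <- c("B050", "B051", "B052")              # categories from metadata

# Observation level: TRUE where a value is not a defined category.
# Missing values are not treated as inadmissible in this sketch.
flagged <- !is.na(observed) & !(observed %in% defined)

# Variable level: which undefined categories occur, and how often.
non_matching <- sort(unique(observed[flagged]))
non_matching_n <- sum(flagged)
non_matching_n_per_category <- table(observed[flagged])

# Modified data: inadmissible values are removed (set to NA).
modified <- observed
modified[flagged] <- NA
```

Here `flagged` plays the role of the observation-level output, the `non_matching*` objects correspond to the per-variable summary, and `modified` to the modified study data.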
a list with:
SummaryData: data frame summarizing inadmissible categories with the columns:
Variables: variable name/label
OBSERVED_CATEGORIES: the categories observed in the study data
DEFINED_CATEGORIES: the categories defined in the metadata
NON_MATCHING: the categories observed but not defined
NON_MATCHING_N: the number of observations with categories not defined
NON_MATCHING_N_PER_CATEGORY: the number of observations for each of the unexpected categories
GRADING: indicator TRUE/FALSE if inadmissible categorical values were observed (more than indicated by the threshold_value)
SummaryTable: data frame for the dataquieR pipeline reporting the number and percentage of inadmissible categorical values
ModifiedStudyData: study data having inadmissible categories removed
FlaggedStudyData: study data having cases with inadmissible categories flagged
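How GRADING might relate to threshold_value can be sketched as follows. This is an assumption for illustration, not the documented rule: it treats threshold_value as the tolerated percentage of inadmissible observations, and all variable names and counts are hypothetical.

```r
# Hypothetical sketch; the exact grading rule in dataquieR may differ.
n_observed      <- 200  # non-missing observations of one variable
non_matching_n  <- 3    # observations with undefined categories
threshold_value <- 1    # tolerated percentage, between 0 and 100

# Percentage of inadmissible observations for this variable.
pct_non_matching <- 100 * non_matching_n / n_observed

# Grade the variable if the percentage exceeds the threshold.
grading <- pct_non_matching > threshold_value
```

With the default threshold_value of 0, any single inadmissible value would lead to a grading under this reading.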
## Not run:
library(dataquieR)

# Toy study data: one diagnosis variable and one medication variable
sdt <- data.frame(DIAG = c("B050", "B051", "B052", "B999"),
                  MED0 = c("S01XA28", "N07XX18", "ABC", NA),
                  stringsAsFactors = FALSE)
# Item-level metadata assigning a standardized vocabulary to each variable
mdt <- tibble::tribble(
  ~ VAR_NAMES, ~ DATA_TYPE, ~ STANDARDIZED_VOCABULARY_TABLE, ~ SCALE_LEVEL, ~ LABEL,
  "DIAG", "string", "<ICD10>", "nominal", "Diagnosis",
  "MED0", "string", "<ATC>", "nominal", "Medication"
)
con_inadmissible_vocabulary(resp_vars = NULL, study_data = sdt,
                            item_level = mdt, label_col = LABEL)

# Assign a vocabulary to item 11 of the example metadata, then run the
# consistency dimension of a report
prep_load_workbook_like_file("meta_data_v2")
il <- prep_get_data_frame("item_level")
il$STANDARDIZED_VOCABULARY_TABLE[[11]] <- "<ICD10GM>"
il$DATA_TYPE[[11]] <- DATA_TYPES$INTEGER
il$SCALE_LEVEL[[11]] <- SCALE_LEVELS$NOMINAL
prep_add_data_frames(item_level = il)
r <- dq_report2("study_data", dimensions = "con")
# Repeat the report with the non-disclosure option enabled
r <- dq_report2("study_data", dimensions = "con",
                advanced_options = list(dataquieR.non_disclosure = TRUE))
r
## End(Not run)