acc_distributions: Plots and checks for distributions

View source: R/acc_distributions.R

acc_distributions    R Documentation

Plots and checks for distributions

Description

Performs the data quality indicator checks "Unexpected location" and "Unexpected proportion" and plots histograms. If a grouping variable is included, empirical cumulative distributions are also plotted for the subgroups.

Usage

acc_distributions(
  resp_vars = NULL,
  group_vars = NULL,
  study_data,
  meta_data,
  label_col,
  check_param = c("any", "location", "proportion"),
  plot_ranges = TRUE,
  flip_mode = "noflip"
)

Arguments

resp_vars

variable list the names of the measurement variables

group_vars

variable list the name of the observer, device or reader variable

study_data

data.frame the data frame that contains the measurements

meta_data

data.frame the data frame that contains metadata attributes of study data

label_col

variable attribute the name of the column in the metadata with labels of variables

check_param

enum any | location | proportion. Which type of check should be conducted (if possible): a check on the location of the mean or median value of the study data, a check on proportions of categories, or either of them if the necessary metadata is available.
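The default for check_param is written as a vector of all allowed values, which is the usual R idiom for an enum argument. The sketch below illustrates that idiom with base R's match.arg(); pick_check is a hypothetical stand-in, and it is only an assumption that acc_distributions resolves the argument in a comparable way.

```r
# Hypothetical stand-in for enum-style argument handling;
# this is NOT dataquieR's own code.
pick_check <- function(check_param = c("any", "location", "proportion")) {
  # match.arg() returns the first choice when the argument is not supplied
  # and raises an error for values outside the allowed set.
  match.arg(check_param)
}

default_choice  <- pick_check()
explicit_choice <- pick_check("proportion")
invalid_choice  <- tryCatch(pick_check("shape"), error = function(e) NA_character_)
```

Under this idiom, omitting the argument selects the first listed value, and an unknown value fails early instead of being silently ignored.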

plot_ranges

logical Should the plot show ranges and results from the data quality checks? (default: TRUE)

flip_mode

enum default | flip | noflip | auto. Should the plot be in default orientation, flipped, not flipped or auto-flipped? Not all options are always supported. In general, this can be controlled by setting the R option with options(dataquieR.flip_mode = ...). If called from dq_report, you can also pass flip_mode to all function calls or set it for specific functions using specific_args.
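A minimal sketch of setting the option globally, using only base R's options() and getOption(). The option name dataquieR.flip_mode and the value "auto" are taken from the description above; whether a given plot honours the value depends on the function, as noted.

```r
# Set the flip mode for the session and keep the previous settings,
# so they can be restored afterwards.
old <- options(dataquieR.flip_mode = "auto")

# Read the value back; a function that consults the option would see this.
current <- getOption("dataquieR.flip_mode")

# Restore whatever was set before (possibly nothing).
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the change local to the code that needs it.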

Value

A list with:

  • SummaryTable: data.frame containing data quality checks for "Unexpected location" (FLG_acc_ud_loc) and "Unexpected proportion" (FLG_acc_ud_prop) for each response variable in resp_vars.

  • SummaryData: data.frame containing the results of the data quality checks for "Unexpected location" and/or "Unexpected proportion", for use in a report.

  • SummaryPlotList: list of ggplots for each response variable in resp_vars.

Algorithm of this implementation:

  • If no response variable is defined, select all variables of type float or integer in the study data.

  • Remove missing codes from the study data (if defined in the metadata).

  • Remove measurements deviating from (hard) limits defined in the metadata (if defined).

  • Exclude variables containing only NA or only one unique value (excluding NAs).

  • Perform check for "Unexpected location" if defined in the metadata (needs a LOCATION_METRIC (mean or median) and LOCATION_RANGE (range of expected values for the mean and median, respectively)).

  • Perform check for "Unexpected proportion" if defined in the metadata (needs PROPORTION_RANGE (range of expected values for the proportions of the categories)).

  • Plot histogram(s).

  • If group_vars is specified by the user, the empirical cumulative distribution function (ecdf) of each group is plotted as well.
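The two checks and the group-wise ecdf step in the list above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: check_location and check_proportions are hypothetical helpers, the toy data are made up, and the expected ranges are passed as plain numeric bounds rather than read from the LOCATION_RANGE and PROPORTION_RANGE metadata columns.

```r
# Illustrative helpers only; these are NOT dataquieR functions.

# Location check: is the mean or median inside the expected range?
check_location <- function(x, metric = c("mean", "median"), lower, upper) {
  metric <- match.arg(metric)
  x <- x[!is.na(x)]                      # ignore missing values
  value <- if (metric == "mean") mean(x) else stats::median(x)
  list(metric = metric, value = value,
       flagged = value < lower || value > upper)
}

# Proportion check: is each category's share inside the expected range?
check_proportions <- function(x, lower, upper) {
  props <- prop.table(table(x))          # table() drops NA by default
  share <- as.numeric(props)
  data.frame(category = names(props),
             proportion = share,
             flagged = share < lower | share > upper)
}

# Toy data, made up for this sketch
sbp    <- c(118, 122, 125, 130, 135, NA, 210)
reader <- c("A", "A", "A", "B", "B", "B", "B")
sex    <- c("f", "f", "f", "m", "f", "f", "f", "f", "f", "f")

loc_mean   <- check_location(sbp, "mean",   lower = 110, upper = 135)
loc_median <- check_location(sbp, "median", lower = 110, upper = 135)
prop_res   <- check_proportions(sex, lower = 0.3, upper = 0.7)

# Group-wise ECDFs: one step function per level of the grouping variable
ecdfs <- lapply(split(sbp, reader), stats::ecdf)
```

With these toy values the mean (140) falls outside the range and is flagged, while the median (127.5) is not, which shows why the choice of LOCATION_METRIC matters when a single outlier is present. Both categories of sex are flagged because their shares (0.9 and 0.1) lie outside the assumed bounds.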

See Also

Online Documentation


dataquieR documentation built on July 26, 2023, 6:10 p.m.