com_segment_missingness: Summarizes missingness for individuals in specific segments
In dataquieR: Data Quality in Epidemiological Research

View source: R/com_segment_missingness.R

Description

This implementation can be applied in two use cases:

(1) Participation in study segments is not recorded by respective variables, e.g. a participant's refusal to attend a specific examination is not recorded.
(2) Participation in study segments is recorded by respective variables.

Use case (1) will be common in smaller studies. To calculate segment missingness, study variables are assumed to be nested in their respective segments. This structure must be specified in the static metadata. The function identifies all variables within each segment and returns TRUE if all variables within a segment are missing, otherwise FALSE.
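The logic of use case (1) can be illustrated in base R. This is a minimal sketch, not the dataquieR implementation: the toy data, the variable names, and the helper segment_missing() are hypothetical, and the segment assignment is given as a plain named list instead of being read from the metadata.

```r
# Toy study data: two segments with two variables each (hypothetical names).
study_data <- data.frame(
  lab_a = c(1.2, NA, 3.1),
  lab_b = c(0.4, NA, NA),
  int_a = c(NA, NA, 5),
  int_b = c(NA, 2, 6)
)

# Hypothetical mapping of segments to their variables; in dataquieR this
# structure is taken from the metadata.
segments <- list(LAB = c("lab_a", "lab_b"), INTERVIEW = c("int_a", "int_b"))

# TRUE for a row if every variable of the segment is missing, otherwise FALSE.
segment_missing <- function(data, vars) {
  unname(rowSums(!is.na(data[, vars, drop = FALSE])) == 0)
}

result <- as.data.frame(
  lapply(segments, function(v) segment_missing(study_data, v))
)
print(result)
```

A row counts as missing for a segment only if none of its variables were observed; a single observed value, as in the third row of the LAB segment, is enough to yield FALSE.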

Use case (2) assumes a more complex structure of study data and metadata. The study data comprise so-called intro-variables (either TRUE/FALSE or codes for non-participation). The column PART_VAR in the metadata contains variable IDs that indicate, for each variable, the respective intro-variable. The benefit of this structure is that the subsequent calculation of item missingness uses correct denominators for the missingness rates.
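The effect on the denominator can be shown with a small base R sketch. It is illustrative only and does not call dataquieR: the variables part_lab and glucose are hypothetical, the metadata excerpt is reduced to two columns, and the column name VAR_NAMES is an assumption made for this example.

```r
# Toy data: part_lab is a hypothetical intro-variable
# (1 = took part in the segment, 0 = did not take part).
study_data <- data.frame(
  part_lab = c(1, 1, 0, 0),
  glucose  = c(5.1, NA, NA, NA)
)

# Hypothetical metadata excerpt: PART_VAR names the intro-variable
# belonging to each variable.
meta_data <- data.frame(
  VAR_NAMES = "glucose",
  PART_VAR  = "part_lab",
  stringsAsFactors = FALSE
)

part_var <- meta_data$PART_VAR[meta_data$VAR_NAMES == "glucose"]
expected <- study_data[[part_var]] == 1

# Naive rate: every row is treated as an expected observation.
naive_rate <- mean(is.na(study_data$glucose)) * 100

# Adjusted rate: the denominator contains only rows where the
# observation was expected according to the intro-variable.
adjusted_rate <- mean(is.na(study_data$glucose[expected])) * 100

print(c(naive = naive_rate, adjusted = adjusted_rate))
```

In this toy example, two of the three missing values stem from non-participation in the segment, so only one missing value remains among the two expected observations.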

Descriptor

Usage

com_segment_missingness(
  study_data,
  meta_data,
  group_vars = NULL,
  meta_data_segment,
  strata_vars = NULL,
  label_col,
  threshold_value,
  direction,
  color_gradient_direction,
  expected_observations = c("HIERARCHY", "ALL", "SEGMENT"),
  exclude_roles = c(VARIABLE_ROLES$PROCESS)
)

Arguments

`study_data`	data.frame the data frame that contains the measurements
`meta_data`	data.frame the data frame that contains metadata attributes of study data
`group_vars`	variable the name of a variable used for grouping, defaults to NULL for not grouping output
`meta_data_segment`	data.frame Segment level metadata. Optional.
`strata_vars`	variable the name of a variable used for stratification, defaults to NULL for not stratifying output
`label_col`	variable attribute the name of the column in the metadata with labels of variables
`threshold_value`	numeric from=0 to=100. A numerical value ranging from 0 to 100.
`direction`	enum low | high. "high" or "low", i.e. are deviations above or below the threshold critical? This argument is deprecated and replaced by `color_gradient_direction`.
`color_gradient_direction`	enum above | below. "above" or "below", i.e. are deviations above or below the threshold critical? (default: above)
`expected_observations`	enum HIERARCHY | ALL | SEGMENT. If ALL, all observations are expected to comprise all study segments. If SEGMENT, the `PART_VAR` is expected to point to a variable with values of 0 and 1, indicating for each data row whether the variable was expected to be observed. If HIERARCHY, this is also checked recursively: if a variable points to such a participation variable, and that participation variable itself has a `PART_VAR` entry pointing to another variable, the observation of the initial variable is only expected if both participation variables are 1.
`exclude_roles`	variable roles a character (vector) of variable roles not included
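
The difference between SEGMENT and HIERARCHY for expected_observations can be sketched in base R. This is an illustration under assumptions, not the package's code: the helper expected_hierarchy(), the toy variables, and the representation of PART_VAR as a named character vector are hypothetical, and missing values in participation variables as well as cyclic references are not handled.

```r
# Toy data: part_exam gates part_lab, which in turn gates glucose.
study_data <- data.frame(
  part_exam = c(1, 1, 0),
  part_lab  = c(1, 0, 1),
  glucose   = c(5.1, NA, NA)
)

# Hypothetical PART_VAR lookup: variable name -> its participation variable.
part_var <- c(glucose = "part_lab", part_lab = "part_exam")

# Walk up the PART_VAR chain; a row is expected only if every
# participation variable along the chain equals 1.
expected_hierarchy <- function(var, data, part_var) {
  expected <- rep(TRUE, nrow(data))
  current <- var
  while (current %in% names(part_var)) {
    current <- part_var[[current]]
    expected <- expected & data[[current]] == 1
  }
  expected
}

# SEGMENT-like check: only the direct participation variable is used.
expected_segment <- study_data[[part_var[["glucose"]]]] == 1

# HIERARCHY-like check: all participation variables up the chain are used.
expected_recursive <- expected_hierarchy("glucose", study_data, part_var)

print(data.frame(expected_segment, expected_recursive))
```

The third row shows the difference: the direct participation variable is 1, but its own participation variable is 0, so the observation is expected under the SEGMENT-like check and not expected under the recursive check.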

Details

Implementation and use of thresholds

This implementation uses one threshold to discriminate critical from non-critical values. If the direction (color_gradient_direction) is above, then all values below the threshold_value are normal (displayed in dark blue in the plot and flagged with GRADING = 0 in the data frame). All values above the threshold_value are considered critical. The more they deviate from the threshold, the more the displayed color shifts toward dark red. All critical values are flagged with GRADING = 1 in the summary data frame. By default, the highest values are always shown in dark red, irrespective of the absolute deviation.

If the direction is below, then all values above the threshold_value are normal (displayed in dark blue, GRADING = 0), and values below it are considered critical.
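
The grading rule described above can be sketched as a small base R function. The helper grade_values() is hypothetical and not part of dataquieR; in particular, treating values exactly equal to the threshold as normal is an assumption of this sketch, since the text above does not specify that case.

```r
# Hypothetical helper: grade percentages against a single threshold.
# Returns 1 for critical values and 0 for normal values.
grade_values <- function(values, threshold_value,
                         color_gradient_direction = c("above", "below")) {
  color_gradient_direction <- match.arg(color_gradient_direction)
  critical <- if (color_gradient_direction == "above") {
    values > threshold_value   # deviations above the threshold are critical
  } else {
    values < threshold_value   # deviations below the threshold are critical
  }
  as.integer(critical)
}

# Toy missingness percentages for three hypothetical segments.
missing_pct <- c(LAB = 2, INTERVIEW = 12, ECG = 30)

grading_above <- grade_values(missing_pct, threshold_value = 10)
grading_below <- grade_values(missing_pct, threshold_value = 10,
                              color_gradient_direction = "below")

print(data.frame(segment = names(missing_pct), missing_pct,
                 grading_above, grading_below, row.names = NULL))
```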

Hint

This function does not support a resp_vars argument. Instead, use exclude_roles to specify variables that are not relevant for detecting a missing segment.

List function.

Value

a list with:

SummaryData: data frame about segment missingness
SummaryPlot: ggplot2 plot: a heatmap-like graphic that highlights critical values depending on the respective threshold_value and direction (color_gradient_direction).
