| check_missing_segments | R Documentation |
Analyzes data completeness at the segment level. A segment is a group of variables defined in the 'Segment_Names' column of the metadata.
check_missing_segments(S_data, M_data, Show_Plot = FALSE)
| S_data | A data frame containing the source data to be checked. |
| M_data | A metadata data frame containing the validation rules, including a 'Segment_Names' column. |
| Show_Plot | A logical value. If 'TRUE', a stacked bar chart visualizing the proportions for each segment is displayed. |
For each segment, this function evaluates every row of the source data ('S_data') and classifies it into one of three categories:
**Complete:** The row has a non-missing value for every variable within the segment.
**Incomplete:** The row has at least one, but not all, of the segment's variables set to 'NA'.
**Fully Missing:** All variables belonging to the segment are 'NA' for that row.
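The three categories can be illustrated with a minimal base R sketch. The helper `classify_segment_rows()` and the `toy` data frame are hypothetical names introduced here for illustration; they are not part of the package, and the sketch ignores the dependency rules described in the metadata, so the function's actual results may differ.

```r
# Hypothetical helper (not the package's implementation): label each row
# of `data` for one segment, given the segment's variable names in `vars`.
classify_segment_rows <- function(data, vars) {
  # Count the missing values per row across the segment's variables
  n_missing <- rowSums(is.na(data[, vars, drop = FALSE]))
  unname(ifelse(
    n_missing == 0, "Complete",
    ifelse(n_missing == length(vars), "Fully Missing", "Incomplete")
  ))
}

# Made-up toy data: one row per category
toy <- data.frame(
  Gender = c("Male", NA, "Female"),
  Age = c(25, NA, NA),
  stringsAsFactors = FALSE
)

status <- classify_segment_rows(toy, c("Gender", "Age"))
print(status)
```

Counting missing values per row keeps the three labels mutually exclusive: zero missing is complete, all missing is fully missing, and anything in between is incomplete.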
The metadata ('M_data') must contain the following columns to define the rules:
**VARIABLE:** The name of the variable in the source data ('S_data') to be checked for missingness.
**VARIABLE_Code:** A unique numeric or character code assigned to each variable for identification and dependency mapping.
**Dependency:** Specifies the dependency of the variable on another variable. A value of '0' indicates no dependency, while other values indicate the 'VARIABLE_Code' of the parent variable.
**Dep_Value:** The specific value or condition of the parent variable (as referenced in 'Dependency') that must be met for the current variable to be applicable. Use '"ANY"' if the value of the parent variable can be any non-missing value.
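One plausible reading of the 'Dependency' and 'Dep_Value' rules is sketched below. The helper `is_applicable()` is a hypothetical name, and the assumption that a dependent variable only counts as missing when its parent meets the condition is an interpretation of the text above, not a confirmed description of how the function behaves.

```r
# Hypothetical helper: decide, row by row, whether a dependent variable
# is applicable given its parent's values and the metadata 'Dep_Value'.
is_applicable <- function(parent_values, dep_value) {
  if (identical(dep_value, "ANY")) {
    # "ANY": applicable whenever the parent has a non-missing value
    !is.na(parent_values)
  } else {
    # Otherwise the parent must be non-missing and equal to Dep_Value
    !is.na(parent_values) & as.character(parent_values) == dep_value
  }
}

# Made-up values: Job_Title depends on Has_Job being "Yes"
has_job <- c("Yes", NA, "No", "Yes")
job_title <- c(NA, NA, NA, "Analyst")

applicable <- is_applicable(has_job, "Yes")

# Assumption: only applicable rows count as missing for the child variable
expected_but_missing <- applicable & is.na(job_title)
print(expected_but_missing)
```

Under this reading, only the first row is flagged: the parent is "Yes" and the title is absent, whereas the other 'NA' titles belong to rows where the variable does not apply.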
The function returns a summary table with counts and percentages for each category per segment.
A 'data.frame' summarizing the analysis for each segment, with columns: 'SEGMENT', 'Total_Rows', 'Complete_Count', 'Incomplete_Count', 'Missing_Count', 'Percent_Complete', 'Percent_Incomplete', and 'Percent_Missing'.
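To make the shape of the returned table concrete, the sketch below builds one summary row from a vector of row labels. `summarise_segment()` is a hypothetical helper, the labels are made up, and expressing percentages on a 0 to 100 scale without rounding is an assumption; the function's own scaling and rounding are not documented here.

```r
# Hypothetical helper: turn per-row labels for one segment into a
# one-row summary using the documented column names.
summarise_segment <- function(segment, status) {
  total <- length(status)
  n_complete <- sum(status == "Complete")
  n_incomplete <- sum(status == "Incomplete")
  n_missing <- sum(status == "Fully Missing")
  data.frame(
    SEGMENT = segment,
    Total_Rows = total,
    Complete_Count = n_complete,
    Incomplete_Count = n_incomplete,
    Missing_Count = n_missing,
    # Assumption: percentages on a 0-100 scale, unrounded
    Percent_Complete = 100 * n_complete / total,
    Percent_Incomplete = 100 * n_incomplete / total,
    Percent_Missing = 100 * n_missing / total,
    stringsAsFactors = FALSE
  )
}

# Made-up labels for four rows of a single segment
status <- c("Complete", "Complete", "Incomplete", "Fully Missing")
report_row <- summarise_segment("Demographic", status)
print(report_row)
```

Because every row falls into exactly one category, the three counts sum to 'Total_Rows' and the three percentages sum to 100.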
Other missing data checks:
check_missing_itemwise(),
check_missing_record()
# 1. Define comprehensive sample data and metadata
Meta_data <- data.frame(
  stringsAsFactors = FALSE,
  VARIABLE = c(
    "ID", "Gender", "Age", "Has_Job", "Job_Title",
    "Job_Satisfaction", "Last_Promotion_Year", "Has_Insurance",
    "Insurance_Provider", "Annual_Checkup"
  ),
  VARIABLE_Code = 1:10,
  Var_order = 1:10,
  Segment_Names = c(
    "Demographic", "Demographic", "Demographic", "Employment", "Employment",
    "Employment", "Employment", "Health", "Health", "Health"
  ),
  Dependency = c(0, 0, 0, 0, 4, 5, 5, 0, 8, 8),
  Dep_Value = c(
    "0", "0", "0", "0", "Yes", "ANY", "ANY", "0", "Yes", "Yes"
  )
)
Source_data <- data.frame(
  ID = 1:10,
  Gender = c(
    "Male", NA, "Male", "Female", "Male",
    "Female", "Male", "Female", "Male", "Female"
  ),
  Age = c(25, NA, 31, 55, 29, 38, 45, 22, 60, 33),
  Has_Job = c("Yes", NA, "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"),
  Job_Title = c(
    NA, NA, NA, "Analyst", NA,
    "Student", "Director", "Engineer", NA, "Designer"
  ),
  Job_Satisfaction = c(5, NA, NA, 8, 7, NA, 10, 9, NA, 6),
  Last_Promotion_Year = c(2020, NA, 2021, NA, NA, NA, 2024, 2022, NA, 2023),
  Has_Insurance = c("Yes", NA, "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"),
  Insurance_Provider = c(
    "Provider A", NA, "Provider B", "Provider C", "Provider D",
    NA, "Provider E", NA, NA, "Provider F"
  ),
  Annual_Checkup = c("Yes", NA, "No", "Yes", NA, "Yes", "Yes", "No", NA, "Yes")
)
# 2. Run the segment check with plot
segment_report <- check_missing_segments(
  S_data = Source_data, M_data = Meta_data, Show_Plot = TRUE
)
print(segment_report)