View source: R/utils-validation.R
| validate_pnadc | R Documentation |
Checks that input data has required columns for the specified processing.
validate_pnadc(data, check_weights = FALSE, stop_on_error = TRUE)
| data | A data.frame or data.table with PNADC microdata. |
| check_weights | Logical. If TRUE, also check for weight-related variables. |
| stop_on_error | Logical. If TRUE, stops with an error when validation fails. If FALSE, returns a validation report list instead. |
The function performs the following validations:
Checks for required columns for reference period identification:
Ano, Trimestre, UPA, V1008, V1014,
V2008, V20081, V20082, V2009
Validates year range (2012-2100 for PNADC coverage)
Validates quarter values (must be 1-4)
Validates birth day values (must be 1-31 or 99 for unknown)
Validates birth month values (must be 1-12 or 99 for unknown)
Warns about unusual ages (outside 0-130 range)
If check_weights = TRUE, also validates weight-related columns:
V1028, UF, posest, posest_sxi
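The checks listed above can be approximated in base R. The following is a minimal sketch, not the package's implementation: the function name `sketch_validate`, the helper `has_bad`, and the names of the `issues` elements are hypothetical. It also assumes that NA values are skipped and that the weight-related columns are simply treated as additional required columns.

```r
# Hypothetical sketch of the documented checks; not the package's own code.
sketch_validate <- function(data, check_weights = FALSE) {
  required <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                "V2008", "V20081", "V20082", "V2009")
  if (check_weights) {
    # Assumption: weight columns are handled as extra required columns.
    required <- c(required, "V1028", "UF", "posest", "posest_sxi")
  }

  issues <- list()
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    issues$missing_columns <- missing_cols
  }

  # TRUE when the column exists and holds a non-NA value outside `allowed`.
  has_bad <- function(col, allowed) {
    col %in% names(data) &&
      any(!is.na(data[[col]]) & !(data[[col]] %in% allowed))
  }

  if (has_bad("Ano", 2012:2100)) issues$invalid_years <- TRUE
  if (has_bad("Trimestre", 1:4)) issues$invalid_quarters <- TRUE
  if (has_bad("V2008", c(1:31, 99))) issues$invalid_birth_days <- TRUE
  if (has_bad("V20081", c(1:12, 99))) issues$invalid_birth_months <- TRUE
  # Unusual ages only trigger a warning; they do not invalidate the data.
  if (has_bad("V2009", 0:130)) warning("Ages outside the 0-130 range found")

  list(valid = length(issues) == 0, issues = issues)
}

ok <- data.frame(
  Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
  V1008 = 1L, V1014 = 1L,
  V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
bad <- transform(ok, Trimestre = 5L)  # quarter outside 1-4

report_ok <- sketch_validate(ok)
report_bad <- sketch_validate(bad)
report_weights <- sketch_validate(ok, check_weights = TRUE)
```

Collecting every problem in one list before deciding whether to stop mirrors the two modes described for stop_on_error: the same list can be turned into an error message or returned as a report.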
If stop_on_error = TRUE, returns invisibly when the data are valid; otherwise stops with an error.
If stop_on_error = FALSE, returns a list with:
valid: Logical indicating if data passed all validations
issues: Named list of validation issues found (empty if none)
n_rows: Number of rows in input data
n_cols: Number of columns in input data
join_keys_available: Character vector of available join key columns
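A caller working in non-stop mode might turn such a report into a readable message. The sketch below uses a hand-built list with the documented field names as a stand-in for a real return value. The function `summarise_report` and the issue name `missing_columns` are hypothetical, since the names used inside issues are not specified here.

```r
# Hand-built stand-in for a validation report; values are illustrative only.
report <- list(
  valid = FALSE,
  issues = list(missing_columns = c("UPA", "V1008")),  # hypothetical issue name
  n_rows = 1L,
  n_cols = 2L,
  join_keys_available = c("Ano", "Trimestre")
)

# Hypothetical helper: collapse a report into a single message string.
summarise_report <- function(report) {
  if (isTRUE(report$valid)) {
    return("Validation passed.")
  }
  # One line per issue, formatted as "<name>: <values>".
  lines <- vapply(
    names(report$issues),
    function(nm) paste0(nm, ": ", paste(report$issues[[nm]], collapse = ", ")),
    character(1)
  )
  paste(c("Validation failed:", lines), collapse = "\n")
}

msg <- summarise_report(report)
cat(msg, "\n")
```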
See also pnadc_identify_periods, which calls this function
internally to validate input data.
# Minimal valid data (all 9 required columns)
sample_data <- data.frame(
  Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
  V1008 = 1L, V1014 = 1L,
  V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
validate_pnadc(sample_data)
# Data with missing columns returns issues (non-stop mode)
incomplete_data <- data.frame(Ano = 2023L, Trimestre = 1L)
result <- validate_pnadc(incomplete_data, stop_on_error = FALSE)
result$valid # FALSE
result$issues # lists missing columns