validate_pnadc: Validate PNADC Input Data

View source: R/utils-validation.R

validate_pnadc    R Documentation

Validate PNADC Input Data

Description

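Checks that the input data contains the columns required for the requested processing and that key values (year, quarter, birth date, age) fall within their expected ranges.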

Usage

validate_pnadc(data, check_weights = FALSE, stop_on_error = TRUE)

Arguments

data

A data.frame or data.table with PNADC microdata

check_weights

Logical. If TRUE, also check for weight-related variables.

stop_on_error

Logical. If TRUE, stops with an error when validation fails. If FALSE, returns a validation report list instead of stopping.

Details

The function performs the following validations:

  • Checks for required columns for reference period identification: Ano, Trimestre, UPA, V1008, V1014, V2008, V20081, V20082, V2009

  • Validates year range (2012-2100 for PNADC coverage)

  • Validates quarter values (must be 1-4)

  • Validates birth day values (must be 1-31 or 99 for unknown)

  • Validates birth month values (must be 1-12 or 99 for unknown)

  • Warns about unusual ages (outside 0-130 range)

  • If check_weights = TRUE, also validates weight-related columns: V1028, UF, posest, posest_sxi
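The checks listed above can be approximated in base R. The following is only an illustrative sketch, not the package's implementation: the function name check_pnadc_sketch, the issue names, and the mapping of V2008, V20081 and V2009 to birth day, birth month and age are assumptions made for this example.

```r
# Hypothetical re-implementation of the documented checks (not package code).
check_pnadc_sketch <- function(data, check_weights = FALSE) {
  required <- c("Ano", "Trimestre", "UPA", "V1008", "V1014",
                "V2008", "V20081", "V20082", "V2009")
  if (check_weights) {
    required <- c(required, "V1028", "UF", "posest", "posest_sxi")
  }
  issues <- list()

  # Required columns that are absent from the input
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) issues$missing_columns <- missing_cols

  # TRUE if the column exists and holds a non-NA value outside `allowed`
  has_invalid <- function(col, allowed) {
    col %in% names(data) &&
      any(!is.na(data[[col]]) & !(data[[col]] %in% allowed))
  }

  if (has_invalid("Ano", 2012:2100)) issues$invalid_years <- "Ano outside 2012-2100"
  if (has_invalid("Trimestre", 1:4)) issues$invalid_quarters <- "Trimestre outside 1-4"
  # Assumed mapping: V2008 = birth day, V20081 = birth month, V2009 = age
  if (has_invalid("V2008", c(1:31, 99))) issues$invalid_birth_days <- "V2008 not in 1-31 or 99"
  if (has_invalid("V20081", c(1:12, 99))) issues$invalid_birth_months <- "V20081 not in 1-12 or 99"
  # Unusual ages only trigger a warning, mirroring the "Warns about" wording
  if (has_invalid("V2009", 0:130)) warning("Unusual ages (outside 0-130) in V2009")

  list(valid = length(issues) == 0, issues = issues,
       n_rows = nrow(data), n_cols = ncol(data))
}

ok <- data.frame(
  Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
  V1008 = 1L, V1014 = 1L,
  V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
res_ok  <- check_pnadc_sketch(ok)
res_bad <- check_pnadc_sketch(transform(ok, Trimestre = 5L))
res_w   <- check_pnadc_sketch(ok, check_weights = TRUE)
```

In this sketch, range checks skip NA values and columns that are absent, so a missing column is reported once under missing_columns rather than also failing every range check. Whether validate_pnadc treats NA the same way is not stated in this page.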

Value

If stop_on_error = TRUE, returns invisibly when the data is valid and stops with an error otherwise. If stop_on_error = FALSE, returns a list with:

  • valid: Logical indicating if data passed all validations

  • issues: Named list of validation issues found (empty if none)

  • n_rows: Number of rows in input data

  • n_cols: Number of columns in input data

  • join_keys_available: Character vector of available join key columns
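A report with the fields above could be turned into a short message along the following lines. This is a minimal sketch: describe_report is a hypothetical helper, and the report is built by hand to mimic the documented structure. The issue name missing_columns and the contents of join_keys_available are assumptions, since this page does not specify them.

```r
# Hand-built stand-in for a validation report; field contents are assumed.
report <- list(
  valid = FALSE,
  issues = list(missing_columns = c("UPA", "V1008")),
  n_rows = 1L,
  n_cols = 2L,
  join_keys_available = c("Ano", "Trimestre")
)

# Hypothetical helper: one line per issue, or a short OK message.
describe_report <- function(report) {
  if (isTRUE(report$valid)) {
    return(sprintf("OK: %d rows, %d columns", report$n_rows, report$n_cols))
  }
  parts <- vapply(names(report$issues), function(nm) {
    paste0(nm, ": ", paste(report$issues[[nm]], collapse = ", "))
  }, character(1))
  paste(c(sprintf("%d issue(s) found", length(report$issues)), parts),
        collapse = "\n")
}

msg <- describe_report(report)
cat(msg, "\n")
```

Branching on valid first keeps the calling code independent of how individual issues are named.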

See Also

pnadc_identify_periods, which calls this function internally to validate its input data.

Examples

# Minimal valid data (all 9 required columns)
sample_data <- data.frame(
  Ano = 2023L, Trimestre = 1L, UPA = 110000001L,
  V1008 = 1L, V1014 = 1L,
  V2008 = 15L, V20081 = 3L, V20082 = 1990L, V2009 = 33L
)
validate_pnadc(sample_data)

# Data with missing columns returns issues (non-stop mode)
incomplete_data <- data.frame(Ano = 2023L, Trimestre = 1L)
result <- validate_pnadc(incomplete_data, stop_on_error = FALSE)
result$valid    # FALSE
result$issues   # lists missing columns


PNADCperiods documentation built on April 28, 2026, 9:07 a.m.