View source: R/pnadc-experimental-periods.R
| pnadc_experimental_periods | R Documentation |
Three experimental strategies are available, all properly nested by period:
probabilistic: For narrow ranges (exactly 2 possible sequential periods), classifies based on where most of the date interval falls. Assigns only when confidence is at least confidence_threshold.
upa_aggregation: Extends strictly identified periods to other observations in the same UPA-V1014 within the quarter, if a sufficient proportion already have strict identification.
both: Sequentially applies probabilistic strategy first, then UPA aggregation on top. Guarantees identification rate >= max of individual strategies.
pnadc_experimental_periods(
crosswalk,
strategy = c("probabilistic", "upa_aggregation", "both"),
confidence_threshold = 0.9,
upa_proportion_threshold = 0.5,
verbose = TRUE
)
| crosswalk | A crosswalk data.table from pnadc_identify_periods() |
| strategy | Character specifying which strategy to apply. Options: "probabilistic", "upa_aggregation", "both" |
| confidence_threshold | Numeric (0-1). Minimum confidence required to assign a probabilistic period. Used by the "probabilistic" and "both" strategies. Default 0.9. |
| upa_proportion_threshold | Numeric (0-1). Minimum proportion of UPA observations (within quarter) that must have strict identification with consensus before the period is extended to unidentified observations. Default 0.5. |
| verbose | Logical. If TRUE, print progress information. |
Provides experimental strategies for improving period identification rates beyond the standard deterministic algorithm. All strategies respect the nested identification hierarchy: weeks require fortnights, fortnights require months.
All strategies enforce proper nesting:
Fortnights can only be assigned if month is identified (strictly OR experimentally)
Weeks can only be assigned if fortnight is identified (strictly OR experimentally)
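The nesting rule above can be illustrated with a minimal base R sketch. The helper enforce_nesting() and the toy vectors are hypothetical and are not part of the package; the sketch only shows the idea of dropping a child period whenever its parent is missing.

```r
# Hypothetical helper (not from the package): blank out any child period
# whose parent period is unidentified (NA).
enforce_nesting <- function(month, fortnight, week) {
  fortnight[is.na(month)] <- NA_integer_   # fortnights require a month
  week[is.na(fortnight)] <- NA_integer_    # weeks require a fortnight
  list(month = month, fortnight = fortnight, week = week)
}

# Toy data: one element per observation
month     <- c(1L, NA, 2L, 3L)
fortnight <- c(1L, 2L, NA, 2L)
week      <- c(1L, 3L, 2L, 4L)

nested <- enforce_nesting(month, fortnight, week)
```

In this toy input, the second observation loses its fortnight and week because its month is missing, and the third loses its week because its fortnight is missing.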
For each period type (processed in order: months, then fortnights, then weeks):
Check that the required parent period is identified
If bounds are narrowed to exactly 2 sequential periods, calculate which period contains most of the date interval
Calculate confidence based on the proportion of interval in the likely period (0-1)
Only assign if confidence >= confidence_threshold
For months: aggregates at UPA-V1014 level across all quarters (like the strict algorithm)
For fortnights and weeks: works at household level within quarter
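As a rough illustration of the probabilistic steps above, the following base R sketch computes confidence as the share of days in the date interval that fall in the more likely of two adjacent periods. The helper probable_period(), its arguments, and the day-counting rule are assumptions made for illustration; the package's actual computation may differ.

```r
# Hypothetical helper (not from the package). `boundary` is assumed to be
# the first day of the later of the two candidate periods.
probable_period <- function(lower, upper, boundary, threshold = 0.9) {
  days <- seq(lower, upper, by = "day")           # every day in the interval
  share_later <- mean(days >= boundary)           # share in the later period
  confidence <- max(share_later, 1 - share_later) # share in the likelier period
  period <- if (share_later >= 0.5) "later" else "earlier"
  list(
    # Assign only when confidence reaches the threshold
    period = if (confidence >= threshold) period else NA_character_,
    confidence = confidence
  )
}

# 1 day in January and 19 days in February
res <- probable_period(as.Date("2023-01-31"), as.Date("2023-02-19"),
                       as.Date("2023-02-01"))

# 7 days in January and 5 days in February: too evenly split to assign
ambiguous <- probable_period(as.Date("2023-01-25"), as.Date("2023-02-05"),
                             as.Date("2023-02-01"))
```

With the default threshold of 0.9, the first interval is assigned to the later period and the second is left unassigned.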
Extends strictly identified periods based on consensus within geographic groups:
Months: Uses UPA level within quarter
Fortnights/Weeks: Uses UPA level within quarter (all households in same UPA are interviewed in same fortnight/week within a quarter)
Calculate proportion of observations with strictly identified period
If proportion >= upa_proportion_threshold AND consensus exists, extend
Apply in nested order: months first, then fortnights, then weeks
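The proportion-and-consensus rule above can be sketched in base R as follows. The helper extend_by_consensus(), the toy data frame, and its column names are hypothetical; in particular, a single upa column stands in for whatever grouping key (UPA or UPA-V1014 within quarter) the function actually uses.

```r
# Hypothetical helper (not from the package): within one group, extend the
# strictly identified period to unidentified rows when enough rows are
# identified and they all agree.
extend_by_consensus <- function(period, threshold = 0.5) {
  identified <- period[!is.na(period)]
  if (length(identified) == 0) return(period)      # nothing to extend
  share <- length(identified) / length(period)     # proportion identified
  consensus <- length(unique(identified)) == 1     # all identified rows agree
  if (share >= threshold && consensus) {
    period[is.na(period)] <- identified[1]
  }
  period
}

# Toy data: `upa` is an assumed stand-in for the real grouping key
toy <- data.frame(
  upa = c("A", "A", "A", "B", "B", "B", "C", "C"),
  strict_month = c(2L, 2L, NA, 1L, 3L, NA, NA, 1L)
)

# Apply the rule group by group
toy$extended_month <- ave(toy$strict_month, toy$upa, FUN = extend_by_consensus)
```

Group A is extended (two of three rows agree), group B is left alone because its identified rows disagree, and group C is extended because half of its rows are identified, which meets the 0.5 threshold.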
Sequentially applies both strategies to maximize identification:
First, apply the probabilistic strategy (captures observations with narrow date ranges and high confidence)
Then, apply UPA aggregation (extends based on strict consensus within UPA/UPA-V1014 groups)
This guarantees that "both" identifies at least as many observations as either individual strategy alone. The strategies operate independently (UPA aggregation considers only strict identifications), so the result is the union of both strategies.
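The union argument can be illustrated with toy vectors in base R. Everything here is hypothetical: the vectors, the helper first_non_na(), and the precedence order between probabilistic and UPA assignments when both apply (the text above only states that strict assignments take priority).

```r
# Toy month assignments, one element per observation (NA = unassigned)
strict        <- c(1L, NA, NA, NA, 2L)
probabilistic <- c(NA, 3L, NA, NA, NA)  # from the probabilistic pass
upa_extended  <- c(NA, NA, 2L, NA, NA)  # from the UPA pass (built on strict only)

# Hypothetical helper: element-wise first non-missing value, in argument order
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

combined  <- first_non_na(strict, probabilistic, upa_extended)
prob_only <- first_non_na(strict, probabilistic)
upa_only  <- first_non_na(strict, upa_extended)

# Number of identified observations under each strategy
n_both <- sum(!is.na(combined))
n_prob <- sum(!is.na(prob_only))
n_upa  <- sum(!is.na(upa_only))
```

Because the combined vector keeps every assignment made by either pass, n_both can never be smaller than n_prob or n_upa.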
The output can be passed directly to pnadc_apply_periods() for weight calibration.
The derived columns combine strict and experimental assignments, with strict taking priority. Use the probabilistic_assignment flag to filter if you only want strict determinations.
A modified crosswalk with additional columns. Output is directly compatible
with pnadc_apply_periods():
ref_month_in_quarter, ref_month_in_year, ref_month_yyyymm:
Month position (combined strict + experimental, strict takes priority)
ref_fortnight_in_month, ref_fortnight_in_quarter, ref_fortnight_yyyyff:
Fortnight position (combined strict + experimental)
ref_week_in_month, ref_week_in_quarter, ref_week_yyyyww:
Week position (combined strict + experimental)
determined_month, determined_fortnight, determined_week:
TRUE if period is assigned (strictly or experimentally)
determined_probable_month, determined_probable_fortnight, determined_probable_week:
TRUE if period was assigned by probabilistic strategy
probabilistic_assignment:
TRUE if any period was assigned experimentally (vs strictly deterministic)
week_1_start, week_1_end, ..., week_4_start, week_4_end:
IBGE week boundaries for the assigned month
These strategies produce "experimental" assignments, not strict determinations.
The standard pnadc_identify_periods() function should be used for
rigorous analysis. Experimental outputs are useful for:
Sensitivity analysis
Robustness checks
Research into identification algorithm improvements
pnadc_identify_periods to build the crosswalk that this function modifies.
pnadc_apply_periods to apply period crosswalk and calibrate weights.
## Not run:
# Build the strict crosswalk, then add experimental assignments
crosswalk <- pnadc_identify_periods(pnadc_data)
crosswalk_exp <- pnadc_experimental_periods(
  crosswalk,
  strategy = "probabilistic",
  confidence_threshold = 0.9
)

# Count strict vs. experimental month assignments
crosswalk_exp[, .(
  strict = sum(!is.na(ref_month_in_quarter) & !probabilistic_assignment),
  experimental = sum(probabilistic_assignment, na.rm = TRUE),
  total = sum(determined_month)
)]

# Calibrate weights using the combined (strict + experimental) assignments
result <- pnadc_apply_periods(pnadc_data, crosswalk_exp,
                              weight_var = "V1028", anchor = "quarter")

# Keep only rows without any experimental assignment
strict_only <- crosswalk_exp[
  probabilistic_assignment == FALSE | is.na(probabilistic_assignment)
]
## End(Not run)