Brazil's Continuous National Household Sample Survey (PNADC) publishes labor market indicators as rolling (moving) quarters — 3-month moving averages where each published "quarter" shares 2 months with its neighbors. This smoothing hides short-term dynamics: turning points are delayed, seasonal patterns are distorted, and international comparison becomes difficult.
The PNADCperiods package includes a SIDRA mensalization module that recovers exact monthly estimates from rolling quarter data. This vignette explains how to use it.
Each published "quarter" is actually a 3-month moving average: the value reported for a given month averages that month and the two months before it.
When unemployment jumps sharply in a single month, the rolling quarter spreads that spike across multiple overlapping periods. The mensalization algorithm inverts this averaging process to recover the true monthly values.
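A small base R sketch can make the smoothing concrete. The numbers below are made up for illustration (they are not PNADC data), and the trailing 3-month mean is only a stand-in for how a rolling quarter is built:

```r
# Illustrative monthly unemployment rate (made-up numbers): flat at 8%,
# with a one-month spike to 11% in month 6.
y <- c(8, 8, 8, 8, 8, 11, 8, 8, 8, 8)

# Trailing 3-month average: x[t] = (y[t-2] + y[t-1] + y[t]) / 3.
# The first two entries are NA because a full window is not yet available.
x <- as.numeric(stats::filter(y, rep(1 / 3, 3), sides = 1))

# Side-by-side view: the single spike shows up as three smaller bumps
print(data.frame(month = seq_along(y), monthly = y, rolling_quarter = x))
```

In this toy series the 3-point jump in month 6 appears as a 1-point rise in the rolling quarters ending in months 6, 7, and 8.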
library(PNADCperiods)

# Step 1: Fetch rolling quarter data from SIDRA API
rolling_quarters <- fetch_sidra_rolling_quarters()

# Step 2: Convert to monthly estimates
monthly <- mensalize_sidra_series(rolling_quarters)

# Step 3: Use your monthly data!
head(monthly[, .(anomesexato, m_popocup, m_taxadesocup)])
That's it! You now have monthly estimates starting from January 2012.
- fetch_sidra_rolling_quarters() downloaded 86+ economic indicators from IBGE's SIDRA API
- mensalize_sidra_series() applied the mensalization formula using pre-computed starting points (bundled with the package)
- The result is a data.table with one row per month and m_* columns for each mensalized series
The mensalized output contains:
- anomesexato: Month identifier (YYYYMM format, e.g., 201903 = March 2019)
- m_* columns: Mensalized (monthly) estimates for each series
- ipca100dez1993, inpc100dez1993: price index columns (passed through for deflation)

Key series include:
| Column | Description | Unit |
|--------|-------------|------|
| m_populacao | Total population | Thousands |
| m_pop14mais | Population 14+ years | Thousands |
| m_popocup | Employed population | Thousands |
| m_popdesocup | Unemployed population | Thousands |
| m_taxadesocup | Unemployment rate | Percent |
| m_taxapartic | Labor force participation rate | Percent |
| m_massahabnominaltodos | Total nominal wage bill | Millions R$ |
Rate series (like m_taxadesocup) are derived from mensalized level
series when compute_derived = TRUE (the default). They are computed as
ratios of the mensalized levels, not directly mensalized from the rolling
quarter rates.
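As a rough illustration of deriving a rate from levels, the sketch below computes an unemployment rate from employed and unemployed counts. The numbers are made up, and the formula (unemployed divided by employed plus unemployed) is the standard definition, assumed here rather than taken from the package source:

```r
# Illustrative mensalized levels in thousands (made-up numbers)
lv <- data.frame(
  anomesexato  = c(201901L, 201902L, 201903L),
  m_popocup    = c(92000, 92300, 92100),
  m_popdesocup = c(12500, 12900, 13300)
)

# Assumed definition: labor force = employed + unemployed
labor_force <- lv$m_popocup + lv$m_popdesocup

# Rate computed as a ratio of the monthly levels, in percent
lv$m_taxadesocup <- 100 * lv$m_popdesocup / labor_force

print(lv)
```

Computing the ratio after mensalizing the levels keeps the rate consistent with the monthly numerator and denominator it is built from.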
Use get_sidra_series_metadata() to explore all 86+ available series:
meta <- get_sidra_series_metadata()

# View series organized by theme
meta[, .N, by = .(theme, theme_category)]

# Filter to specific theme categories
meta[theme_category == "employment_type", .(series_name, description)]
The metadata uses a hierarchical taxonomy: theme (top level, e.g.,
"labor_market"), theme_category (e.g., "employment_type"), and optionally
subcategory (e.g., "levels", "rates").
The mensalization process follows a three-step pipeline: fetch the rolling quarter data, convert it to monthly estimates, and analyze the results.
fetch_sidra_rolling_quarters() downloads data from five SIDRA tables:
| Table | Content |
|-------|---------|
| 4093 | Population and labor force |
| 6390 | Income (nominal and real) |
| 6392 | Real income by occupation |
| 6399 | Employment by sector |
| 6906 | Underutilization indicators |
rq <- fetch_sidra_rolling_quarters(verbose = TRUE)

# Inspect structure
dim(rq)
names(rq)[1:20]
Key columns: anomesfinaltrimmovel (end month of rolling quarter, YYYYMM),
mesnotrim (month position 1/2/3), plus one column per series.
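To make the meaning of anomesfinaltrimmovel concrete, the sketch below lists the three calendar months that a rolling quarter covers, given its end month. The helper name rolling_quarter_months() is hypothetical and not part of the package:

```r
# Hypothetical helper: the three months covered by a rolling quarter
# identified by its end month in YYYYMM format.
rolling_quarter_months <- function(end_yyyymm) {
  year  <- end_yyyymm %/% 100L
  month <- end_yyyymm %% 100L
  # Count months on a single axis so that year boundaries are handled
  idx <- year * 12L + (month - 1L) - 2:0
  # Convert the month index back to YYYYMM
  (idx %/% 12L) * 100L + (idx %% 12L) + 1L
}

# Rolling quarter ending in March 2019 covers January to March 2019
print(rolling_quarter_months(201903L))

# Rolling quarter ending in January 2019 reaches back into 2018
print(rolling_quarter_months(201901L))
```

The second call shows why the conversion goes through a month index: a rolling quarter ending in January spans two calendar years.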
monthly <- mensalize_sidra_series(rq, verbose = TRUE)

# Compare dimensions
cat("Rolling quarters:", nrow(rq), "rows\n")
cat("Monthly data:", nrow(monthly), "rows\n")
The row count is approximately the same (one per month), but the meaning changes from "rolling quarter ending in month X" to "exact estimate for month X".
# --- VIGNETTE CODE: plot-unemployment ---
library(ggplot2)

# Build a Date column from the YYYYMM month identifier
monthly[, date := as.Date(paste0(substr(anomesexato, 1, 4), "-",
                                 substr(anomesexato, 5, 6), "-01"))]

ggplot(monthly, aes(x = date, y = m_taxadesocup)) +
  geom_line(color = "#1976D2", linewidth = 0.8) +
  labs(title = "Monthly Unemployment Rate",
       x = NULL,
       y = "Unemployment Rate (%)")
For analyses requiring monthly population estimates separately:
pop <- fetch_monthly_population()
head(pop)
Returns a data.table with ref_month_yyyymm and m_populacao columns.
Instead of fetching all 86+ series, filter by theme or theme category:
# Only employment type series
employment <- fetch_sidra_rolling_quarters(theme_category = "employment_type")

# Only wage mass series
wages <- fetch_sidra_rolling_quarters(theme_category = "wage_mass")

# Only labor market theme (includes participation, unemployment, employment types, etc.)
labor <- fetch_sidra_rolling_quarters(theme = "labor_market")
For maximum efficiency, request only the series you need:
# Only unemployment-related series
unemp <- fetch_sidra_rolling_quarters(
  series = c("popdesocup", "taxadesocup", "popnaforca")
)
Some series are rates computed from other series. To fetch only "base" series:
# Exclude computed rates (only population and income levels)
base_only <- fetch_sidra_rolling_quarters(exclude_derived = TRUE)
After mensalization, select columns as needed:
monthly <- mensalize_sidra_series(rq)

# Select specific series
labor_market <- monthly[, .(
  anomesexato,
  employed = m_popocup,
  unemployed = m_popdesocup,
  unemp_rate = m_taxadesocup,
  participation = m_taxapartic
)]
This section can be skipped by users who just need results.
Rolling quarters are 3-month moving averages. If we denote the true monthly value for month $t$ as $y_t$, then the rolling quarter value $x_t$ is:
$$x_t = \frac{y_{t-2} + y_{t-1} + y_t}{3}$$
The mensalization algorithm inverts this relationship to recover $y_t$ from the sequence of $x_t$ values.
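Step 1: Compute scaled first differences
Consecutive rolling quarters differ only in the month that enters and the month that leaves the window. Because each rolling quarter is an average of three months, three times their difference equals the 3-month change in the monthly series:
$$d3_t = 3\,(x_t - x_{t-1}) = y_t - y_{t-3}$$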
Step 2: Identify month position (mesnotrim)
Each month has a position within its quarter:
- Position 1: Jan, Apr, Jul, Oct
- Position 2: Feb, May, Aug, Nov
- Position 3: Mar, Jun, Sep, Dec
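The position can be derived from the month number with modular arithmetic. The sketch below assumes month identifiers in YYYYMM format, as used elsewhere in this vignette; the variable names are illustrative:

```r
# Month identifiers in YYYYMM format
anomes <- c(201901L, 201902L, 201903L, 201904L, 201912L)

# Extract the calendar month (1-12)
month <- anomes %% 100L

# Position within the calendar quarter: 1, 2, or 3
mesnotrim <- (month - 1L) %% 3L + 1L

print(data.frame(anomes, month, mesnotrim))
```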
Step 3: Cumulative sum by position
For each position separately, compute the cumulative sum of first differences, starting from a calibrated "starting point" $y_0$:
$$y_t = y_0 + \sum_{s \in \text{same position}, s \leq t} d3_s$$
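The three steps can be checked on simulated data. The sketch below is a toy round trip in base R, not the package implementation. It builds rolling quarters from a known monthly series, scales the differences by 3 so that they equal $y_t - y_{t-3}$, and takes the first three true monthly values as the starting points (an assumption made only so that the recovery can be verified):

```r
set.seed(1)
n <- 24
y_true <- 100 + cumsum(rnorm(n))   # simulated "true" monthly values

# Rolling quarters: trailing 3-month means (NA for the first two months)
x <- as.numeric(stats::filter(y_true, rep(1 / 3, 3), sides = 1))

# Step 1: scaled first differences, 3 * (x_t - x_{t-1}) = y_t - y_{t-3}
d3 <- 3 * c(NA, diff(x))
d3[1:3] <- 0                       # no usable difference for the first 3 months

# Step 2: position within the quarter (1, 2, 3, 1, 2, 3, ...)
pos <- (seq_len(n) - 1) %% 3 + 1

# Step 3: cumulative sum by position, starting from y0.
# Assumption: y0 is known exactly (here, the first three true values).
y0 <- y_true[1:3]
y_hat <- y0[pos] + ave(d3, pos, FUN = cumsum)

# Maximum recovery error (should be at floating-point noise level)
print(max(abs(y_hat - y_true)))
```

With exact starting points the monthly series is recovered up to rounding error, which is why the calibration of $y_0$ matters so much in practice.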
The starting point $y_0$ is crucial. It determines the level of all subsequent monthly estimates. The package includes pre-computed starting points for 53 series, calibrated during the stable 2013-2019 period.
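The effect of a miscalibrated starting point can be seen in a toy example. The helper mensalize_toy() below is hypothetical (it is not the package function) and the numbers are made up. It applies the cumulative-sum inversion once with exact starting points and once with the position-2 starting point off by 5:

```r
# Toy inversion (hypothetical helper, not the package function):
# x is a trailing 3-month mean, y0 holds one starting value per position.
mensalize_toy <- function(x, y0) {
  n   <- length(x)
  pos <- (seq_len(n) - 1) %% 3 + 1
  d3  <- 3 * c(NA, diff(x))   # equals y_t - y_{t-3} where defined
  d3[1:3] <- 0                # first three months carry only y0
  y0[pos] + ave(d3, pos, FUN = cumsum)
}

# Made-up "true" monthly values and their rolling quarters
y_true <- c(100, 102, 101, 103, 104, 102, 105, 107, 106, 108, 109, 107)
x <- as.numeric(stats::filter(y_true, rep(1 / 3, 3), sides = 1))

good <- mensalize_toy(x, y0 = y_true[1:3])
bad  <- mensalize_toy(x, y0 = y_true[1:3] + c(0, 5, 0))  # position 2 off by 5

# The error never decays: every position-2 month is off by the same amount
print(round(bad - y_true, 6))
```

In this sketch an error in one starting point shifts every month in that position by the same constant, while the other two positions are unaffected.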
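Starting points are computed from monthly aggregates of calibrated microdata, which are compared with the SIDRA rolling quarters over the calibration period.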
The package caches SIDRA API responses in memory during your R session:
# First call: fetches from API (~10 seconds)
rq1 <- fetch_sidra_rolling_quarters()

# Second call with use_cache = TRUE: uses cached data (instant)
rq2 <- fetch_sidra_rolling_quarters(use_cache = TRUE)

# Clear all cached data (force fresh fetch on next call)
clear_sidra_cache()
The cache persists until you call clear_sidra_cache() or restart R.
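For readers curious how a session-level cache of this kind can work, here is a generic sketch using an R environment. It is not the package's internal code: cached_fetch(), clear_cache(), and the key "table_4093" are hypothetical names chosen for illustration, and slow_fetch() stands in for an API call:

```r
# Hypothetical in-memory cache: an environment that lives for the R session
.cache <- new.env(parent = emptyenv())

cached_fetch <- function(key, fetch_fun, use_cache = FALSE) {
  # Return the stored value only when caching is requested and the key exists
  if (use_cache && exists(key, envir = .cache, inherits = FALSE)) {
    return(get(key, envir = .cache))
  }
  value <- fetch_fun()
  assign(key, value, envir = .cache)   # store for later calls
  value
}

clear_cache <- function() {
  rm(list = ls(envir = .cache, all.names = TRUE), envir = .cache)
}

# Stand-in for a slow API call; counts how often it actually runs
calls <- 0
slow_fetch <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}

a <- cached_fetch("table_4093", slow_fetch)                    # fetches
b <- cached_fetch("table_4093", slow_fetch, use_cache = TRUE)  # served from cache
calls_before_clear <- calls

clear_cache()
d <- cached_fetch("table_4093", slow_fetch, use_cache = TRUE)  # fetches again
print(c(before_clear = calls_before_clear, after_clear = calls))
```

Because the environment exists only in memory, restarting R discards it, which matches the behavior described above.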
| Error | Cause | Solution |
|-------|-------|----------|
| "Series not found" | Misspelled series name | Check get_sidra_series_metadata() |
| "API timeout" | SIDRA server slow | Retry; use use_cache = TRUE |
| "No starting points" | Custom series | See Custom Starting Points below |
# Check if series exists
meta <- get_sidra_series_metadata()
"taxadesocup" %in% meta$series_name  # TRUE
COVID-19 disruptions (2020): IBGE suspended in-person interviews during the pandemic. Some indicators show unusual patterns in 2020-Q2.
CNPJ series availability: Series based on CNPJ registration (empregadorcomcnpj, contapropriacomcnpj, etc.) are only available from October 2015, when V4019 was introduced.
For users with calibrated PNADC microdata.
Use the bundled starting points (the default) unless you have your own calibrated PNADC microdata and need starting points derived from it:
# Load your stacked PNADC microdata (with pnadc_apply_periods weights)
stacked <- readRDS("my_calibrated_pnadc.rds")

# Compute starting points
custom_y0 <- compute_starting_points_from_microdata(
  data = stacked,
  calibration_start = 201301L,
  calibration_end = 201912L,
  verbose = TRUE
)

# Use custom starting points
monthly <- mensalize_sidra_series(rq, starting_points = custom_y0)
# Step 1: Build crosswalk and calibrate
crosswalk <- pnadc_identify_periods(stacked)
calibrated <- pnadc_apply_periods(
  stacked, crosswalk,
  weight_var = "V1028",
  anchor = "quarter",
  calibration_unit = "month"
)

# Step 2: Compute z_ aggregates (monthly totals from microdata)
z_agg <- compute_z_aggregates(calibrated)

# Step 3: Fetch rolling quarters for comparison
rq <- fetch_sidra_rolling_quarters()

# Step 4: Compute starting points
y0 <- compute_series_starting_points(
  monthly_estimates = z_agg,
  rolling_quarters = rq,
  calibration_start = 201301L,
  calibration_end = 201912L
)

# Step 5: Use custom starting points
result <- mensalize_sidra_series(rq, starting_points = y0)
CNPJ-based series automatically use a later calibration period (2016-2019)
when use_series_specific_periods = TRUE (the default in
compute_series_starting_points()).
bundled <- pnadc_series_starting_points

# Merge and compare
comp <- merge(custom_y0, bundled,
              by = c("series_name", "mesnotrim"),
              suffixes = c("_custom", "_bundled"))
comp[, rel_diff := abs(y0_custom - y0_bundled) / abs(y0_bundled) * 100]
comp[rel_diff > 1]  # Flag series with >1% difference
How quickly did unemployment rise when COVID-19 hit Brazil? Rolling quarter data obscures these dynamics. Monthly estimates reveal the exact timing.
# --- VIGNETTE CODE: covid-analysis ---
# Fetch all series and mensalize
rq <- fetch_sidra_rolling_quarters()
monthly <- mensalize_sidra_series(rq)

# Filter to COVID period
covid_period <- monthly[anomesexato >= 201901 & anomesexato <= 202212]

# Create date column
covid_period[, date := as.Date(paste0(
  substr(anomesexato, 1, 4), "-",
  substr(anomesexato, 5, 6), "-01"
))]

# Find peak
peak_month <- covid_period[which.max(m_taxadesocup)]
cat("Peak unemployment:", peak_month$m_taxadesocup, "% in",
    format(peak_month$date, "%B %Y"), "\n")
Key findings from monthly estimates:
- Exact peak timing: Monthly data pinpoints the peak month, while rolling quarters show only a gradual rise
- Speed of impact: The monthly series reveals a sharp spike that rolling quarters smooth over 3+ months
- Recovery dynamics: Monthly estimates show pauses and reversals in recovery that are hidden in quarterly averages
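The delay in peak timing follows directly from the averaging and can be reproduced with made-up numbers. The series below is illustrative only (it is not PNADC data): it peaks in month 6 and then declines slowly.

```r
# Illustrative monthly rate (made-up numbers): sharp rise, peak in month 6,
# then a slow decline.
y <- c(10, 10, 10, 11, 13, 15, 14, 13.5, 13, 12.5, 12)

# Trailing 3-month average, as in a rolling quarter
x <- as.numeric(stats::filter(y, rep(1 / 3, 3), sides = 1))

peak_monthly <- which.max(y)   # month in which the monthly series peaks
peak_rolling <- which.max(x)   # month in which the rolling quarter peaks

cat("Monthly peak in month", peak_monthly,
    "| rolling-quarter peak in month", peak_rolling, "\n")
```

In this toy series the rolling quarter peaks two months after the monthly series and at a lower level, since the peak month is averaged with lower neighbors.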
| Pattern | Meaning | Example |
|---------|---------|---------|
| m_ | Mensalized monthly estimate | m_popocup |
| pop* | Population count | populacao, pop14mais |
| *comcart | With formal contract | empregprivcomcart |
| *semcart | Without formal contract | empregprivsemcart |
| *comcnpj | With CNPJ registration | empregadorcomcnpj |
| taxa* | Rate (percent) | taxadesocup |
| nivel* | Level/ratio (percent) | nivelocup |
| rend* | Income (rendimento) | rendhabnominaltodos |
| massa* | Wage bill (massa salarial) | massahabnominaltodos |
| *hab* | Usually received (habitual) | rendhabnominaltodos |
| *efet* | Actually received (efetivo) | rendefetnominaltodos |
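These naming patterns lend themselves to regular-expression filtering. The sketch below uses only the example names from the table above, held in a plain character vector rather than the package metadata:

```r
# Example series names taken from the table above
series <- c("populacao", "pop14mais", "empregprivcomcart", "empregprivsemcart",
            "empregadorcomcnpj", "taxadesocup", "nivelocup",
            "rendhabnominaltodos", "massahabnominaltodos", "rendefetnominaltodos")

# Rates and level ratios: names starting with "taxa" or "nivel"
rates <- series[grepl("^(taxa|nivel)", series)]

# Formal-contract series: names ending in "comcart"
formal <- series[grepl("comcart$", series)]

# Usually received income or wage bill: names containing "hab"
habitual <- series[grepl("hab", series)]

# Mensalized column names carry the "m_" prefix
m_cols <- paste0("m_", rates)

print(list(rates = rates, formal = formal, habitual = habitual, m_cols = m_cols))
```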
For the complete catalog, use get_sidra_series_metadata():
meta <- get_sidra_series_metadata()

# Filter by theme category
meta[theme_category == "employment_type", .(series_name, description)]

# Filter by theme and pattern
meta[theme == "labor_market" & grepl("taxa|nivel", series_name),
     .(series_name, description)]
| Function | Purpose |
|----------|---------|
| fetch_sidra_rolling_quarters() | Download rolling quarter data from SIDRA API |
| fetch_monthly_population() | Get monthly population estimates |
| mensalize_sidra_series() | Convert rolling quarters to monthly estimates |
| get_sidra_series_metadata() | Explore available series and metadata |
| clear_sidra_cache() | Clear cached API data |
| compute_z_aggregates() | Compute monthly aggregates from calibrated microdata |
| compute_series_starting_points() | Compute $y_0$ values from aggregates |
| compute_starting_points_from_microdata() | All-in-one $y_0$ computation |
Bundled data: pnadc_series_starting_points — pre-computed $y_0$ for
53 series x 3 month positions (calibration period: 2013-2019).
- vignette("getting-started"): Setting up PNADC microdata analysis
- vignette("how-it-works"): The period identification algorithm
- vignette("applied-examples"): Applied research examples