data_preparation: Fixed-effects demeaning and data standardization
In rmsBMA: Reduced Model Space Bayesian Model Averaging

Description

Prepares a dataset for econometric analysis by applying fixed-effects demeaning (within transformation) and/or standardization to numeric variables. The behavior of the function depends on whether panel identifiers are supplied and whether fixed effects are explicitly requested.

Usage

data_preparation(
  data,
  id = NULL,
  time = NULL,
  fixed_effects = FALSE,
  effect = c("twoway", "section", "time"),
  standardize = FALSE
)

Arguments

`data`	A data.frame containing the data.
`id`	An optional character string naming the column that holds the cross-sectional (section) identifier. Must be supplied together with `time` to enable fixed-effects demeaning.
`time`	An optional character string naming the column that holds the time identifier. Must be supplied together with `id` to enable fixed-effects demeaning.
`fixed_effects`	Logical. If `TRUE`, fixed-effects demeaning is applied when both `id` and `time` are provided. If `FALSE`, fixed-effects demeaning is skipped even when identifiers are present.
`effect`	A character string indicating the fixed-effects structure when `fixed_effects = TRUE`. One of `"twoway"`, `"section"`, or `"time"`.
`standardize`	Logical. If `TRUE`, numeric variables are standardized by subtracting their mean and dividing by their standard deviation. When fixed effects are applied, standardization occurs after demeaning.

Details

If both id and time are provided and fixed_effects = TRUE, the function applies section, time, or two-way fixed-effects demeaning and may optionally standardize the transformed variables. If fixed_effects = FALSE, fixed-effects demeaning is skipped even when identifiers are present, and only standardization (if requested) is applied.

If either id or time is missing, fixed-effects demeaning is not available. Standardization is then the only transformation the function can apply, so standardize = TRUE is required.

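For two-way fixed effects, the transformation of a variable x observed for section i in period t is:

x_{it}^{*} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}

where \bar{x}_{i\cdot} is the mean of x over time for section i, \bar{x}_{\cdot t} is the mean of x across sections in period t, and \bar{x}_{\cdot\cdot} is the overall mean of x.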
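
The two-way formula can be checked by hand on a small balanced panel. The following is a base R sketch, not the package's implementation; the toy data and all object names (panel, m_i, m_t, m_all, x_star) are made up for illustration.

```r
# Toy balanced panel: 3 sections observed in 2 periods (hypothetical data).
panel <- data.frame(
  id   = rep(c("a", "b", "c"), each = 2),
  time = rep(c(1, 2), times = 3),
  x    = c(1, 3, 10, 14, 5, 9)
)

# Component means of the formula; ave() returns the group mean for each row.
m_i   <- ave(panel$x, panel$id)    # \bar{x}_{i\cdot}: section means
m_t   <- ave(panel$x, panel$time)  # \bar{x}_{\cdot t}: period means
m_all <- mean(panel$x)             # \bar{x}_{\cdot\cdot}: overall mean

# Two-way within transformation.
x_star <- panel$x - m_i - m_t + m_all

# In a balanced panel the result averages to zero within every section
# and within every period (up to floating-point error).
tapply(x_star, panel$id, mean)
tapply(x_star, panel$time, mean)
```

Dropping the m_t and m_all terms gives section demeaning, and dropping m_i and m_all gives time demeaning, under the usual definition of the within transformation.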

Standardization consists of subtracting the mean and dividing by the standard deviation of each variable and is applied after fixed-effects demeaning (if any).
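
As an illustration of the ordering, the sketch below demeans a toy variable by section and then standardizes the result. It is a base R approximation, not the package's code: standardize_vec is a hypothetical helper, and the use of the sample standard deviation (as computed by sd()) is an assumption, since the denominator is not stated here.

```r
# Hypothetical helper: center and scale one numeric vector,
# skipping missing values when computing the mean and SD.
standardize_vec <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy data: two sections, one missing observation (hypothetical).
id <- rep(c("a", "b"), each = 3)
x  <- c(1, 2, 3, 10, NA, 14)

# Step 1: section demeaning (subtract each section's own mean).
x_dm <- x - ave(x, id, FUN = function(v) mean(v, na.rm = TRUE))

# Step 2: standardize the demeaned variable.
z <- standardize_vec(x_dm)

# The result has mean 0 and SD 1; the missing value stays missing.
mean(z, na.rm = TRUE)
sd(z, na.rm = TRUE)
```

Because scaling divides every observation by the same constant, the section means of z remain zero after standardization.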

The function operates in three modes:

Fixed effects only: fixed_effects = TRUE, standardize = FALSE.
Fixed effects + standardization: fixed_effects = TRUE, standardize = TRUE.
Standardization only: fixed_effects = FALSE, standardize = TRUE.

Unless both id and time are provided, only the standardization-only mode is available.
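
The way the arguments select a mode can be summarized in a few lines. The helper below, resolve_mode, is hypothetical and exists only to illustrate the rules described above; it is not part of rmsBMA. Two behaviors are assumptions: that a call requesting no transformation at all raises an error, and that fixed_effects = TRUE without identifiers falls back to standardization.

```r
# Hypothetical helper (not part of rmsBMA): map the arguments to a mode name.
resolve_mode <- function(id = NULL, time = NULL,
                         fixed_effects = FALSE, standardize = FALSE) {
  # Demeaning needs both identifiers and an explicit request.
  has_panel <- !is.null(id) && !is.null(time)
  use_fe    <- has_panel && isTRUE(fixed_effects)

  if (use_fe && isTRUE(standardize)) return("fixed effects + standardization")
  if (use_fe)                        return("fixed effects only")
  if (isTRUE(standardize))           return("standardization only")

  # Assumption: with neither transformation requested, there is nothing to do.
  stop("no transformation requested: set fixed_effects and/or standardize")
}

resolve_mode("Pair_ID", "Year_0", fixed_effects = TRUE)
resolve_mode("Pair_ID", "Year_0", fixed_effects = TRUE, standardize = TRUE)
resolve_mode(standardize = TRUE)
```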

Missing values are ignored when computing means and standard deviations. After fixed-effects demeaning, each transformed variable has mean zero, so an intercept term is redundant in subsequent linear regressions.
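
The redundancy of the intercept can be seen in a small simulation. This is a base R sketch on simulated data with made-up names (dm, y_dm, x_dm); it does not call data_preparation.

```r
set.seed(1)

# Simulated panel: 4 sections, 5 observations each (hypothetical data).
id <- rep(1:4, each = 5)
x  <- rnorm(20) + id
y  <- 2 * x + id + rnorm(20)

# Section demeaning of both the outcome and the regressor.
dm   <- function(v, g) v - ave(v, g)
y_dm <- dm(y, id)
x_dm <- dm(x, id)

# Fit with and without an intercept.
fit_with    <- lm(y_dm ~ x_dm)
fit_without <- lm(y_dm ~ 0 + x_dm)

# The estimated intercept is zero up to floating-point error,
# and the slope is the same in both fits.
coef(fit_with)
coef(fit_without)
```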

Value

A data.frame containing only numeric variables used in estimation. Panel identifiers (id, time) are removed from the output. Transformed variables preserve their original column names.

Examples


library(rmsBMA)

# Example panel data; Pair_ID and Year_0 are its section and time identifiers
df <- migration_panel
# Standardization only (panel identifiers present but FE skipped)
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = FALSE,
  standardize = TRUE
)

# Two-way fixed effects with standardization
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "twoway",
  standardize = TRUE
)

# Section fixed effects only (standardize defaults to FALSE)
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "section"
)

# Standardization only (no panel identifiers)
X <- data_preparation(df, standardize = TRUE)

rmsBMA documentation built on March 14, 2026, 5:06 p.m.