data_preparation: Fixed-effects demeaning and data standardization

View source: R/data_preparation.R

data_preparation    R Documentation

Fixed-effects demeaning and data standardization

Description

Prepares a dataset for econometric analysis by applying fixed-effects demeaning (within transformation) and/or standardization to numeric variables. The behavior of the function depends on whether panel identifiers are supplied and whether fixed effects are explicitly requested.

Usage

data_preparation(
  data,
  id = NULL,
  time = NULL,
  fixed_effects = FALSE,
  effect = c("twoway", "section", "time"),
  standardize = FALSE
)

Arguments

data

A data.frame containing the data.

id

An optional character string naming the column in data that holds the cross-sectional (section) identifier. Must be supplied together with time to enable fixed-effects demeaning.

time

An optional character string naming the column in data that holds the time identifier. Must be supplied together with id to enable fixed-effects demeaning.

fixed_effects

Logical. If TRUE, fixed-effects demeaning is applied when both id and time are provided. If FALSE, fixed-effects demeaning is skipped even when identifiers are present.

effect

A character string indicating the fixed-effects structure when fixed_effects = TRUE. One of "twoway", "section", or "time".

standardize

Logical. If TRUE, numeric variables are standardized by subtracting their mean and dividing by their standard deviation. When fixed effects are applied, standardization occurs after demeaning.

Details

If both id and time are provided and fixed_effects = TRUE, the function applies section, time, or two-way fixed-effects demeaning and may optionally standardize the transformed variables. If fixed_effects = FALSE, fixed-effects demeaning is skipped even when identifiers are present, and only standardization (if requested) is applied.

If either id or time is not supplied, fixed-effects demeaning is unavailable, so standardize = TRUE is required for the function to apply any transformation.

For two-way fixed effects, the transformation is:

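x_{it}^{*} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}

where \bar{x}_{i\cdot} is the mean of x over time for section i, \bar{x}_{\cdot t} is the mean of x over sections in period t, and \bar{x}_{\cdot\cdot} is the overall mean.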
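The two-way formula can be sketched in base R as follows. This is only an illustration: twoway_demean and the toy data frame are hypothetical and are not part of rmsBMA, and the package's internal implementation may differ. How the package handles unbalanced panels is not stated here, so the sketch uses a balanced panel, where the demeaned variable has zero mean within every section and every period.

```r
# Hypothetical helper (not part of rmsBMA): two-way within transformation
# of a numeric vector x, given section and time identifiers.
twoway_demean <- function(x, id, time) {
  # Mean of x within each section, repeated for every row of that section
  section_mean <- ave(x, id, FUN = function(v) mean(v, na.rm = TRUE))
  # Mean of x within each period, repeated for every row of that period
  time_mean <- ave(x, time, FUN = function(v) mean(v, na.rm = TRUE))
  # Overall mean
  grand_mean <- mean(x, na.rm = TRUE)
  x - section_mean - time_mean + grand_mean
}

# Toy balanced panel: 2 sections observed in 3 periods
toy <- data.frame(
  id   = rep(c("a", "b"), each = 3),
  time = rep(1:3, times = 2),
  x    = c(1, 2, 6, 3, 5, 10)
)
toy$x_star <- twoway_demean(toy$x, toy$id, toy$time)

# Section and period means of the transformed variable are zero
tapply(toy$x_star, toy$id, mean)
tapply(toy$x_star, toy$time, mean)
```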

Standardization consists of subtracting the mean and dividing by the standard deviation of each variable and is applied after fixed-effects demeaning (if any).
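A minimal sketch of the standardization step for a single column is given below. The helper standardize_column is hypothetical and not part of rmsBMA. It assumes the sample standard deviation as computed by sd() (denominator n - 1); the divisor used by the package is not stated in this page.

```r
# Hypothetical helper (not part of rmsBMA): center and scale one column,
# ignoring missing values when computing the mean and standard deviation.
standardize_column <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, NA, 6)
z <- standardize_column(x)

# Non-missing values now have mean 0 and standard deviation 1;
# the missing value stays missing.
mean(z, na.rm = TRUE)
sd(z, na.rm = TRUE)
```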

The function operates in three modes:

  • Fixed effects only: fixed_effects = TRUE, standardize = FALSE.

  • Fixed effects + standardization: fixed_effects = TRUE, standardize = TRUE.

  • Standardization only: fixed_effects = FALSE, standardize = TRUE.

When id and time are not provided, only the standardization-only mode is available.

Missing values are ignored when computing means and standard deviations. After fixed-effects demeaning, an intercept term is redundant in subsequent linear regressions.
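The redundancy of the intercept can be checked with a small simulation. The sketch below uses simulated data and a hypothetical demean helper rather than data_preparation itself, and applies section demeaning only. Because every demeaned variable has mean zero, the fitted intercept is numerically zero and the slope is the same with or without it.

```r
set.seed(1)

# Simulated panel: 4 sections, 5 observations each, section-specific levels
id <- rep(1:4, each = 5)
x  <- rnorm(20)
y  <- 2 * x + id + rnorm(20, sd = 0.1)

# Hypothetical helper: subtract the group mean (ave() defaults to mean)
demean <- function(v, g) v - ave(v, g)

y_w <- demean(y, id)
x_w <- demean(x, id)

fit_with    <- lm(y_w ~ x_w)      # intercept included
fit_without <- lm(y_w ~ x_w - 1)  # intercept dropped

coef(fit_with)[["(Intercept)"]]   # numerically zero
coef(fit_with)[["x_w"]]           # same slope as below
coef(fit_without)[["x_w"]]
```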

Value

A data.frame containing only numeric variables used in estimation. Panel identifiers (id, time) are removed from the output. Transformed variables preserve their original column names.

Examples


df <- migration_panel
# Standardization only (panel identifiers present but FE skipped)
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = FALSE,
  standardize = TRUE
)

# Two-way fixed effects with standardization
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "twoway",
  standardize = TRUE
)

# Section fixed effects only
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "section"
)

# Standardization only (no panel identifiers)
X <- data_preparation(df, standardize = TRUE)



rmsBMA documentation built on March 14, 2026, 5:06 p.m.