View source: R/data_preparation.R
| data_preparation | R Documentation |
Prepares a dataset for econometric analysis by applying fixed-effects demeaning (within transformation) and/or standardization to numeric variables. The behavior of the function depends on whether panel identifiers are supplied and whether fixed effects are explicitly requested.
data_preparation(
  data,
  id = NULL,
  time = NULL,
  fixed_effects = FALSE,
  effect = c("twoway", "section", "time"),
  standardize = FALSE
)
data |
A data.frame containing the data. |
id |
An optional character string specifying the cross-sectional
(section) identifier. Must be supplied together with time. |
time |
An optional character string specifying the time identifier.
Must be supplied together with id. |
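fixed_effects |
Logical. If TRUE, fixed-effects demeaning is applied; this requires both id and time. If FALSE, demeaning is skipped even when identifiers are present. |
effect |
A character string indicating the fixed-effects structure
when fixed_effects = TRUE: "twoway", "section", or "time". |
standardize |
Logical. If TRUE, each variable is standardized (mean subtracted, divided by the standard deviation) after any fixed-effects demeaning. |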
If both id and time are provided and fixed_effects = TRUE,
the function applies section, time, or two-way fixed-effects demeaning and may
optionally standardize the transformed variables. If fixed_effects = FALSE,
fixed-effects demeaning is skipped even when identifiers are present, and only
standardization (if requested) is applied.
If either id or time is missing, fixed-effects demeaning is not
available and the function requires standardize = TRUE.
For two-way fixed effects, the transformation is:
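x_{it}^{*} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}
where \bar{x}_{i\cdot} is the mean of x over time for section i, \bar{x}_{\cdot t} is the mean of x over sections in period t, and \bar{x}_{\cdot\cdot} is the overall mean.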
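The two-way transformation can be reproduced by hand in base R. The sketch below is illustrative only and is not the package's implementation: the toy data, the helper mean_na, and the column name x_star are all hypothetical. It assumes a balanced panel, where subtracting the section and time means and adding back the overall mean leaves zero mean within every section and every period.

```r
# Toy balanced panel: 3 sections observed over 4 periods (illustrative data only)
panel <- expand.grid(id = c("a", "b", "c"), time = 1:4)
panel$x <- c(2, 5, 1, 4, 6, 3, 7, 8, 2, 5, 9, 4)

# Hypothetical helper: mean that ignores missing values
mean_na <- function(v) mean(v, na.rm = TRUE)

# Section means, time means, and the overall mean, aligned with the rows of panel
x_section <- ave(panel$x, panel$id,   FUN = mean_na)
x_time    <- ave(panel$x, panel$time, FUN = mean_na)
x_overall <- mean_na(panel$x)

# Two-way within transformation
panel$x_star <- panel$x - x_section - x_time + x_overall

# On a balanced panel these group means are zero up to floating-point error
tapply(panel$x_star, panel$id, mean)
tapply(panel$x_star, panel$time, mean)
```

On unbalanced panels this simple formula no longer removes both sets of effects exactly, so the zero-mean check above should not be expected to hold there.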
Standardization consists of subtracting the mean and dividing by the standard deviation of each variable and is applied after fixed-effects demeaning (if any).
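As a minimal sketch of the standardization step, the hypothetical helper standardize_column below subtracts the mean and divides by the standard deviation while ignoring missing values. It assumes the sample standard deviation as computed by sd() (denominator n - 1); the documentation does not state which denominator the function itself uses.

```r
# Hypothetical helper, not part of the package: z-score a numeric vector,
# ignoring NAs when computing the mean and standard deviation
standardize_column <- function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
}

x <- c(10, 12, NA, 15, 9)
z <- standardize_column(x)

# The non-missing values now have mean 0 and standard deviation 1;
# the missing value stays missing
mean(z, na.rm = TRUE)
sd(z, na.rm = TRUE)
```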
The function operates in three modes:
Fixed effects only: fixed_effects = TRUE, standardize = FALSE.
Fixed effects + standardization: fixed_effects = TRUE, standardize = TRUE.
Standardization only: fixed_effects = FALSE, standardize = TRUE.
When id and time are not provided, only the standardization-only mode is available.
Missing values are ignored when computing means and standard deviations. After fixed-effects demeaning, an intercept term is redundant in subsequent linear regressions.
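The redundancy of the intercept can be checked on simulated data. The sketch below uses base R only and a hypothetical helper demean for section demeaning; it does not call data_preparation(). Because the demeaned variables have zero mean, the fitted intercept is zero up to floating-point error and the slope is the same with or without it.

```r
set.seed(1)

# Simulated balanced panel with section-specific intercepts (illustrative only)
panel <- expand.grid(id = 1:5, time = 1:6)
panel$x <- rnorm(nrow(panel))
panel$y <- 2 * panel$x + panel$id + rnorm(nrow(panel), sd = 0.1)

# Hypothetical helper: subtract group means, ignoring missing values
demean <- function(v, g) v - ave(v, g, FUN = function(u) mean(u, na.rm = TRUE))

y_star <- demean(panel$y, panel$id)
x_star <- demean(panel$x, panel$id)

# Fit with and without an intercept on the demeaned data
fit_with    <- lm(y_star ~ x_star)
fit_without <- lm(y_star ~ 0 + x_star)

coef(fit_with)     # intercept is numerically zero
coef(fit_without)  # same slope as above
```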
A data.frame containing only numeric variables used in estimation.
Panel identifiers (id, time) are removed from the output.
Transformed variables preserve their original column names.
df <- migration_panel
# Standardization only (panel identifiers present but FE skipped)
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = FALSE,
  standardize = TRUE
)
# Two-way fixed effects with standardization
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "twoway",
  standardize = TRUE
)
# Section fixed effects only
X <- data_preparation(
  df,
  id = "Pair_ID",
  time = "Year_0",
  fixed_effects = TRUE,
  effect = "section"
)
# Standardization only (no panel identifiers)
X <- data_preparation(df, standardize = TRUE)