xtdml_data_from_data_frame: Wrapper for Double machine learning data-backend...

View source: R/xtdml_data.R

xtdml_data_from_data_frame    R Documentation

Wrapper for Double machine learning data-backend initialization from data.frame.

Description

Initialization of DoubleMLData from data.frame.

Usage

xtdml_data_from_data_frame(
  df,
  x_cols = NULL,
  y_col = NULL,
  d_cols = NULL,
  cluster_cols = NULL,
  approach = NULL,
  transformX = NULL
)

Arguments

df

(data.frame())
Data object.

x_cols

(character())
The covariates.

y_col

(character(1))
The outcome variable.

d_cols

(character())
The treatment variable(s).

cluster_cols

(NULL, character())
The cluster variables. Default is NULL.

approach

(character(1))
One of "fd-exact", "wg-approx" or "cre", specifying the panel data technique used to estimate the causal model. Default is "fd-exact".

transformX

(character(1))
One of "no", "minmax" or "poly", specifying the transformation applied to the covariates X. "no" leaves the covariates untransformed and is recommended for tree-based learners. "minmax" applies the min-max normalization x' = (x - x_{min})/(x_{max} - x_{min}) to each covariate and is recommended for neural networks. "poly" adds polynomials up to order three and interactions between all possible combinations of two and three variables; it is recommended for Lasso. Default is "no".
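
The "minmax" and "poly" options described above can be illustrated with a small base R sketch. This is not the package's internal code: the helper `minmax_scale` and the toy data frame `X` are hypothetical names introduced here, and the exact columns that xtdml generates for "poly" may differ from this reading of the description.

```r
# Hypothetical helper (not part of xtdml): min-max scaling of one column,
# x' = (x - x_min) / (x_max - x_min).
minmax_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy covariates, made up for illustration only.
X <- data.frame(X1 = c(2, 4, 6, 10), X2 = c(-1, 0, 1, 3), X3 = c(1, 2, 4, 8))

# "minmax"-style transformation: every column is rescaled to [0, 1].
X_minmax <- as.data.frame(lapply(X, minmax_scale))

# "poly"-style expansion (one possible reading of the description):
# main effects plus all two- and three-way interactions ...
inter <- model.matrix(~ .^3 - 1, data = X)
# ... plus squared and cubed terms of each covariate.
powers <- cbind(X^2, X^3)
names(powers) <- c(paste0(names(X), "_sq"), paste0(names(X), "_cu"))
X_poly <- cbind(inter, as.matrix(powers))

print(X_minmax)
print(colnames(X_poly))
```

Note that a constant column makes the min-max denominator zero, so a real implementation would need to handle that case separately.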

Value

Creates a new instance of class xtdml_data.

Examples


library(xtdml)

# Generate a simulated panel dataset with `make_plpr_data()` from `xtdml`
data = make_plpr_data(n_obs = 500, t_per = 10, dim_x = 30, theta = 0.5, rho = 0.8)

# Set up DML data environment
x_cols  = paste0("X", 1:30)

obj_xtdml_data = xtdml_data_from_data_frame(data,
                                            x_cols = x_cols, y_col = "y", d_cols = "d",
                                            cluster_cols = "id", approach = "fd-exact",
                                            transformX = "no")

obj_xtdml_data$print()
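
To give some intuition for the `approach` argument, the base R sketch below shows the kind of panel transformations these names usually refer to. It assumes that "fd" stands for first differences, "wg" for the within-group (demeaning) transformation, and "cre" for a correlated random effects approach that uses unit-level means; this page does not spell the abbreviations out, so treat the mapping as an assumption. The toy data frame `panel` and its columns are hypothetical, and this is not the package's implementation.

```r
# Toy panel: 2 units observed over 3 periods (values made up for illustration).
panel <- data.frame(
  id   = rep(1:2, each = 3),
  time = rep(1:3, times = 2),
  y    = c(1, 3, 6, 10, 10, 13)
)
panel <- panel[order(panel$id, panel$time), ]

# First difference within each unit: y_it - y_i,t-1
# (the first period of each unit has no predecessor, hence NA).
panel$y_fd <- ave(panel$y, panel$id, FUN = function(v) c(NA, diff(v)))

# Within-group transformation: deviation from the unit mean, y_it - mean_i(y).
panel$y_wg <- panel$y - ave(panel$y, panel$id)

# Unit-level mean, the kind of term a CRE-style specification adds as a regressor.
panel$y_bar <- ave(panel$y, panel$id)

print(panel)
```

Both the first-difference and the within-group transformation remove any component that is constant within a unit, which is why they are common ways to handle unobserved unit-level heterogeneity in panel data.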



xtdml documentation built on Sept. 9, 2025, 5:54 p.m.