make_ssm_data: Generates data from a sample selection model (SSM).

View source: R/datasets.R

make_ssm_data    R Documentation

Generates data from a sample selection model (SSM).

Description

The data generating process is given in the Details section below.

Usage

make_ssm_data(
  n_obs = 8000,
  dim_x = 100,
  theta = 1,
  mar = TRUE,
  return_type = "DoubleMLData"
)

Arguments

n_obs

(integer(1))
The number of observations to simulate.

dim_x

(integer(1))
The number of covariates.

theta

(numeric(1))
The value of the causal parameter.

mar

(logical(1))
Indicates whether the outcome is missing at random (MAR), i.e., whether selection is ignorable given the treatment and covariates.

return_type

(character(1))
If "DoubleMLData", returns a DoubleMLData object. If "data.frame", returns a data.frame. If "data.table", returns a data.table. Default is "DoubleMLData".
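A short usage sketch, assuming the DoubleML package is installed. It uses only the arguments listed above, and the sample sizes are arbitrary choices made for illustration.

```r
library(DoubleML)

set.seed(3141)
# Default return type: a DoubleMLData object
dml_data <- make_ssm_data(n_obs = 2000, dim_x = 20, theta = 1, mar = TRUE)

# Another draw from the same design, returned as a plain data.frame
df <- make_ssm_data(n_obs = 2000, dim_x = 20, theta = 1, mar = TRUE,
                    return_type = "data.frame")
```

Setting mar = FALSE draws the data under the missing-not-at-random design instead.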

Details

y_i = \theta d_i + x_i' \beta + u_i,

s_i = 1\lbrace d_i + \gamma z_i + x_i' \beta + v_i > 0 \rbrace,

d_i = 1\lbrace x_i' \beta + w_i > 0 \rbrace,

with y_i observed only if s_i = 1, covariates x_i \sim \mathcal{N}(0, \Sigma^2_x), where \Sigma^2_x is the dim_x \times dim_x matrix with entries (\Sigma^2_x)_{kj} = 0.5^{|j-k|}, and \beta a dim_x-vector with entries \beta_j = \frac{0.4}{j^2}. The remaining variables are drawn as z_i \sim \mathcal{N}(0, 1), (u_i, v_i) \sim \mathcal{N}(0, \Sigma^2_{u,v}) and w_i \sim \mathcal{N}(0, 1). The variable z_i enters only the selection equation, with coefficient \gamma.
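As a worked illustration of the covariate design, here is the case dim_x = 3. This value is chosen only for readability; the default is 100. The covariance matrix and coefficient vector defined above are:

```latex
\Sigma^2_x =
\begin{pmatrix}
1    & 0.5 & 0.25 \\
0.5  & 1   & 0.5  \\
0.25 & 0.5 & 1
\end{pmatrix},
\qquad
\beta = \left(\tfrac{0.4}{1^2},\ \tfrac{0.4}{2^2},\ \tfrac{0.4}{3^2}\right)' \approx (0.4,\ 0.1,\ 0.0444)',
\qquad
x_i'\beta \approx 0.4\,x_{i1} + 0.1\,x_{i2} + 0.0444\,x_{i3}.
```

Because \beta_j decays as 1/j^2 (for example, \beta_{10} = 0.004), only the first few covariates have sizeable coefficients in the default design.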

The data generating process is inspired by a process used in the simulation study (see Appendix E) of Bia, Huber and Lafférs (2023).
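The following base-R function is a minimal sketch of the data generating process in Details. It is not the package's implementation.

This page does not give values for \gamma or \Sigma^2_{u,v}. The sketch therefore exposes them as hypothetical arguments: gamma, and rho, the correlation of unit-variance u_i and v_i. With rho = 0, u_i is independent of (z_i, v_i, w_i), so selection is ignorable given d_i and x_i (a MAR setting). The sketch does not attempt to reproduce how make_ssm_data maps its mar argument to these parameters. It also marks unobserved outcomes as NA, which is an assumption about encoding.

```r
# Hypothetical sketch of the SSM data generating process (not DoubleML's code).
# ASSUMPTION: gamma and rho are illustrative parameters; u_i and v_i have unit
# variances and correlation rho.
simulate_ssm_sketch <- function(n_obs = 8000, dim_x = 100, theta = 1,
                                gamma = 0, rho = 0) {
  # Sigma_x has entries 0.5^|j - k| (a Toeplitz matrix)
  sigma_x <- toeplitz(0.5^(0:(dim_x - 1)))
  # Rows of Z %*% chol(Sigma) are draws from N(0, Sigma)
  x <- matrix(rnorm(n_obs * dim_x), n_obs, dim_x) %*% chol(sigma_x)
  colnames(x) <- paste0("X", seq_len(dim_x))
  beta <- 0.4 / seq_len(dim_x)^2
  xb <- as.vector(x %*% beta)

  # (u_i, v_i) ~ N(0, [[1, rho], [rho, 1]])
  sigma_uv <- matrix(c(1, rho, rho, 1), 2, 2)
  uv <- matrix(rnorm(n_obs * 2), n_obs, 2) %*% chol(sigma_uv)
  u <- uv[, 1]
  v <- uv[, 2]
  w <- rnorm(n_obs)
  z <- rnorm(n_obs)

  d <- as.integer(xb + w > 0)                  # treatment equation
  s <- as.integer(d + gamma * z + xb + v > 0)  # selection equation
  y <- theta * d + xb + u                      # outcome equation
  y[s == 0] <- NA                              # y_i observed only if s_i = 1

  data.frame(y = y, d = d, s = s, z = z, x)
}
```

For example, set.seed(1); df <- simulate_ssm_sketch(n_obs = 1000, dim_x = 10, gamma = 1, rho = 0.8) draws a sample in which selection depends on u_i through v_i (a missing-not-at-random setting). In that sample, y is NA for every unit with s = 0.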

Value

Depending on return_type, a DoubleMLData object (the default), a data.frame, or a data.table containing the simulated data.

References

Michela Bia, Martin Huber & Lukáš Lafférs (2023) Double Machine Learning for Sample Selection Models, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2023.2271071


DoubleML documentation built on April 12, 2025, 1:15 a.m.