data.gen.cs: Generate Cross-Sectional Data for Stochastic Frontier...
In sfa: Stochastic Frontier Analysis

data_gen_cs

R Documentation

Generate Cross-Sectional Data for Stochastic Frontier Analysis

Description

data_gen_cs generates simulated cross-sectional data from a stochastic frontier model under several distributional assumptions for the one-sided technical inefficiency term (u) and the two-sided idiosyncratic error term (v). The model has the general form Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v - u, where u \geq 0 represents inefficiency. A single call produces every variant, so the user can pick the columns for the models of interest.

Usage

data_gen_cs(N, rand, sig_u, sig_v, cons, beta1, beta2, a, mu)

Arguments

`N`	A single integer specifying the number of observations (cross-sectional units).
`rand`	A single integer to set the seed for the random number generator, ensuring reproducibility.
`sig_u`	The standard deviation parameter (`\sigma_u`) for the base distribution of the one-sided error term `u`.
`sig_v`	The standard deviation parameter (`\sigma_v`) for the base distribution of the two-sided error term `v`.
`cons`	The value of the constant term (intercept) in the model.
`beta1`	The coefficient for the `x_1` variable.
`beta2`	The coefficient for the `x_2` variable.
`a`	The degrees of freedom parameter for the t and half-t distributions (`v_t` and `u_t`, respectively). Draws use the `rt` function from the stats package.
`mu`	The mean parameter (`\mu`) of the truncated normal distribution used for `u_tn` in the normal-truncated normal model. Requires the `rtruncnorm` function.

Details

The function simulates two explanatory variables, x_1 and x_2, as transformations of uniform random variables.
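As a rough illustration of the general model form, the baseline Normal-Half Normal case could be simulated in base R along the following lines. This is a minimal sketch, not the package's source: `sketch_nhn` is a hypothetical helper, and the regressors are left as plain U(0, 1) draws because the exact transformations used by data_gen_cs are not stated here.

```r
# Hypothetical stand-in for the baseline (N-HN) part of data_gen_cs();
# this is NOT the package's source code.
sketch_nhn <- function(N, rand, sig_u, sig_v, cons, beta1, beta2) {
  set.seed(rand)                              # reproducible draws
  x1 <- runif(N)                              # assumption: untransformed U(0, 1)
  x2 <- runif(N)                              # assumption: untransformed U(0, 1)
  v  <- rnorm(N, mean = 0, sd = sig_v)        # two-sided idiosyncratic error
  u  <- abs(rnorm(N, mean = 0, sd = sig_u))   # one-sided inefficiency, u >= 0
  y  <- cons + beta1 * x1 + beta2 * x2 + v - u
  data.frame(name = seq_len(N), x1 = x1, x2 = x2, u = u, v = v, y_pcs = y)
}

d <- sketch_nhn(N = 100, rand = 123, sig_u = 0.5, sig_v = 0.2,
                cons = 5, beta1 = 1.5, beta2 = 2.0)
head(d)
```

Because u enters with a negative sign and is never negative, each simulated Y lies at or below the frontier value plus noise, \beta_0 + \beta_1 x_1 + \beta_2 x_2 + v.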

The function generates several different frontier models by combining various distributions for u and v:

**u Distributions (Inefficiency):** Half-Normal (HN), Truncated Normal (TN), Half-T (HT), Half-Cauchy (HC), Exponential (E), Half-Uniform (HU).
**v Distributions (Idiosyncratic):** Normal (N), t, Cauchy (C).

**Specific Model Outputs (y_pcs variants):**

y_pcs: Normal-Half Normal (N-HN): v \sim N(0, \sigma_v^2), u \sim |N(0, \sigma_u^2)|.
y_pcs_z: N-HN with Heteroskedastic \sigma_u: \sigma_{u,i} = \exp(0.9 + 0.6 Z_i), where Z is a uniform variable.
y_pcs_t: T-Half T (T-HT): v \sim T(\text{df}=a) \cdot \sigma_v, u \sim |T(\text{df}=a)| \cdot \sigma_u.
y_pcs_tn: Normal-Truncated Normal (N-TN): v \sim N(0, \sigma_v^2), u \sim TN(\mu, \sigma_u^2) on [0, \infty).
y_pcs_e: Normal-Exponential (N-E): v \sim N(0, \sigma_v^2), u \sim Exp(\phi), where \phi = 1/\sigma_u.
y_pcs_c: Cauchy-Half Cauchy (C-HC): v \sim Cauchy(0, \sigma_v), u \sim |Cauchy(0, \sigma_u)|.
y_pcs_u: Normal-Half Uniform (N-HU): v \sim N(0, \sigma_v^2), u \sim U(0, \sigma_u).
y_pcs_w: Normal + Cauchy - Half Normal: v \sim N(0, \sigma_v^2) + Cauchy(0, \sigma_v), u \sim |N(0, \sigma_u^2)|. This introduces a composite v term.
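The error distributions listed above can all be drawn with base R random number functions. The sketch below is an illustration under stated assumptions rather than the function's actual implementation: it assumes Z is U(0, 1), that \phi is the rate of the exponential (so the mean of u is \sigma_u), and that the composite v term sums independent normal and Cauchy draws. The variable names are illustrative only. The truncated normal case is left out because it needs rtruncnorm.

```r
set.seed(123)
n <- 1000; sig_u <- 0.5; sig_v <- 0.2; a <- 5

# One-sided inefficiency draws (all non-negative by construction)
u_hn <- abs(rnorm(n, mean = 0, sd = sig_u))            # half-normal
u_ht <- abs(rt(n, df = a)) * sig_u                     # half-t, scaled by sig_u
u_hc <- abs(rcauchy(n, location = 0, scale = sig_u))   # half-Cauchy
u_e  <- rexp(n, rate = 1 / sig_u)                      # assumption: phi is a rate
u_hu <- runif(n, min = 0, max = sig_u)                 # half-uniform on [0, sig_u]

# Heteroskedastic half-normal: sigma_u varies with an auxiliary variable z
z      <- runif(n)                                     # assumption: z ~ U(0, 1)
sig_uz <- exp(0.9 + 0.6 * z)
u_z    <- abs(rnorm(n, mean = 0, sd = sig_uz))         # rnorm is vectorised over sd

# Two-sided idiosyncratic draws
v_n <- rnorm(n, mean = 0, sd = sig_v)                  # normal
v_t <- rt(n, df = a) * sig_v                           # t, scaled by sig_v
v_c <- rcauchy(n, location = 0, scale = sig_v)         # Cauchy
v_w <- rnorm(n, 0, sig_v) + rcauchy(n, 0, sig_v)       # assumption: independent sum
```

Pairing any u draw with any v draw as v - u gives the composed error of the corresponding model.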

**Note:** y_pcs_tn relies on the rtruncnorm function, which is loaded together with the package. To use rtruncnorm on its own, load it with library(truncnorm).
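For readers who want to reproduce the truncated normal draw outside the package, one possible approach is sketched below. `draw_tn` is a hypothetical helper, not part of sfa. It calls truncnorm::rtruncnorm when that package is installed and otherwise falls back to inverse-CDF sampling in base R, which targets the same distribution.

```r
# Draw n values from N(mu, sig_u^2) truncated to [0, Inf).
# Hypothetical helper for illustration only.
draw_tn <- function(n, mu, sig_u) {
  if (requireNamespace("truncnorm", quietly = TRUE)) {
    truncnorm::rtruncnorm(n, a = 0, b = Inf, mean = mu, sd = sig_u)
  } else {
    # Base-R fallback: inverse-CDF sampling restricted to the mass above zero
    p0 <- pnorm(0, mean = mu, sd = sig_u)   # probability mass below zero
    qnorm(p0 + runif(n) * (1 - p0), mean = mu, sd = sig_u)
  }
}

set.seed(123)
u_tn <- draw_tn(1000, mu = 0.1, sig_u = 0.5)
range(u_tn)   # every draw should be non-negative
```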

Value

A data frame containing N observations with the following columns:

`name`	Individual identifier (simply `1` to `N`).
`cons`	The constant term value.
`x1`	Simulated explanatory variable `x_1`.
`x2`	Simulated explanatory variable `x_2`.
`u`, `uz`, `u_t`, `u_c`, `u_e`, `u_u`, `u_tn`	The simulated one-sided error terms under different distributions.
`v`, `v_t`, `v_c`	The simulated two-sided error terms under different distributions.
`y_pcs`, `y_pcs_t`, `y_pcs_e`, `y_pcs_c`, `y_pcs_u`, `y_pcs_z`, `y_pcs_w`, `y_pcs_tn`	The dependent variable `Y` under the corresponding SFA model distributions.
`z`	The auxiliary variable used for heteroskedasticity in `y_pcs_z`.
`con`	A constant column set to 1, potentially for use in estimation.

Author(s)

David Bernstein

Examples


# Generate 100 observations of SFA data
data_sfa <- data_gen_cs(
  N     = 100,
  rand  = 123,
  sig_u = 0.5,
  sig_v = 0.2,
  cons  = 5,
  beta1 = 1.5,
  beta2 = 2.0,
  a     = 5,   # degrees of freedom for T/Half-T
  mu    = 0.1  # mean for Truncated Normal
)

# Display the first few rows of the generated data
head(data_sfa)

# Inspect the dependent variable from the Normal-Half Normal model
summary(data_sfa$y_pcs)
plot(density(data_sfa$y_pcs))

sfa documentation built on Jan. 22, 2026, 1:08 a.m.