simulate_hurdle_data: Simulate Data from Two-Part Mixed Effects Hurdle Demand Model

View source: R/hurdle-simulate.R

simulate_hurdle_dataR Documentation

Simulate Data from Two-Part Mixed Effects Hurdle Demand Model

Description

Generates simulated demand data from the two-part hurdle model. Useful for Monte Carlo simulation studies, power analyses, and model validation.

Usage

simulate_hurdle_data(
  n_subjects = 100,
  prices = seq(0, 11, by = 0.5),
  beta0 = -2,
  beta1 = 1,
  log_q0 = log(10),
  logQ0 = deprecated(),
  k = 2,
  alpha = 0.5,
  sigma_a = 1,
  sigma_b = 0.5,
  sigma_c = 0.1,
  rho_ab = 0.3,
  rho_ac = 0,
  rho_bc = 0,
  sigma_e = 0.3,
  epsilon = 0.001,
  n_random_effects = 2,
  stop_at_zero = TRUE,
  seed = NULL
)

Arguments

n_subjects

Number of subjects to simulate. Default is 100.

prices

Numeric vector of prices at which to simulate consumption. Default is seq(0, 11, by = 0.5).

beta0

Intercept for Part I (logistic). Default is -2.

beta1

Slope for Part I (on log(price + epsilon)). Default is 1.

log_q0

Log of intensity parameter (Q0). Default is log(10), meaning Q0 = 10. To specify Q0 directly, use log_q0 = log(your_Q0).

logQ0

[Deprecated] Use log_q0 instead.

k

Scaling parameter for demand decay. Default is 2.

alpha

Elasticity parameter controlling rate of demand decay. Default is 0.5.

sigma_a

Standard deviation of random intercept for Part I. Default is 1.

sigma_b

Standard deviation of random intercept for Part II. Default is 0.5.

sigma_c

Standard deviation of random slope for alpha (only used if n_random_effects = 3). Default is 0.1.

rho_ab

Correlation between a_i and b_i. Default is 0.3.

rho_ac

Correlation between a_i and c_i. Default is 0.

rho_bc

Correlation between b_i and c_i. Default is 0.

sigma_e

Residual standard deviation. Default is 0.3.

epsilon

Small constant for log(price + epsilon). Default is 0.001.

n_random_effects

Number of random effects (2 or 3). Default is 2.

stop_at_zero

Logical; if TRUE, stop generating observations for a subject once zero consumption is observed. This means subjects will have varying numbers of observations. Set to FALSE to generate all prices for all subjects. Default is TRUE.

seed

Optional random seed for reproducibility.

Details

The simulation follows Zhao et al. (2016):

Part I (Zero vs Positive):

logit(P(Y=0)) = \beta_0 + \beta_1 \cdot \log(price + \epsilon) + a_i

Part II (Positive Consumption):

\log(Y | Y > 0) = (\log Q_0 + b_i) + k \cdot (\exp(-(\alpha + c_i) \cdot price) - 1) + \epsilon

Random effects (a_i, b_i) or (a_i, b_i, c_i) are drawn from a multivariate normal distribution with the specified variances and correlations.

Value

A data frame with columns:

id

Subject identifier

x

Price value

y

Simulated consumption (may include zeros)

delta

Indicator for zero consumption (1 = zero, 0 = positive)

a_i

Subject-specific random effect for Part I

b_i

Subject-specific random effect for Part II

c_i

Subject-specific random effect for alpha (if n_random_effects = 3)

See Also

fit_demand_hurdle, run_hurdle_monte_carlo

Examples

# Simulate with default parameters (2 RE model)
sim_data <- simulate_hurdle_data(n_subjects = 100, seed = 123)
head(sim_data)

# Simulate with custom prices
apt_prices <- c(0, 0.25, 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10)
sim_apt <- simulate_hurdle_data(n_subjects = 100, prices = apt_prices, seed = 123)

# Simulate with custom parameters (Q0 = 15, alpha = 0.1)
sim_custom <- simulate_hurdle_data(
  n_subjects = 100,
  log_q0 = log(15),
  alpha = 0.1,
  seed = 123
)

# Simulate 3 RE model
sim_3re <- simulate_hurdle_data(
  n_subjects = 100,
  n_random_effects = 3,
  sigma_c = 0.1,
  seed = 456
)


beezdemand documentation built on March 3, 2026, 9:07 a.m.