sim_uplift: Uplift simulations.

sim_uplift    R Documentation

Description

Numerical simulations for uplift modeling, as described in Tian et al. (2014).

Usage

sim_uplift(n = 1000, p = 20, rho = 0.2, beta.par = sqrt(6),
  sigma0 = sqrt(2), response = "gaussian")

Arguments

n

The number of observations.

p

The number of predictors.

rho

The correlation between predictors.

beta.par

Size of main effects. See details.

sigma0

Multiplier of error term. See details.

response

The type of response distribution. Possible values are "gaussian" and "binary".

Details

For the gaussian case, sim_uplift simulates data according to the following specification:

y = (β_0 + ∑_{j=1}^p β_{j}X_{j})^2 + (γ_0 + ∑_{j=1}^p γ_{j}X_{j} + 0.8 X_1 X_2) T + σ_{0}ε

where the covariates (X_{1}, …, X_{p}) follow a mean-zero multivariate normal distribution with a compound symmetric variance-covariance matrix, (1-ρ) I_{p} + ρ 1 1^{'} (1 being a column vector of p ones), so that each predictor has unit variance and every pair of predictors has correlation ρ. The coefficients are β_0 = beta.par^-1, β_j = (2 * beta.par)^-1 for j = 1, …, p, γ_0 = 0.4, and (γ_1, …, γ_p) = (0.8, -0.8, 0.8, -0.8, 0, …, 0). T ∈ {-1, 1} is the treatment indicator, generated at random with equal probability; ε is N(0,1); and σ_{0} = sigma0.

For the binary case, the response is the indicator that the same quantity is non-negative:

y = I((β_0 + ∑_{j=1}^p β_{j}X_{j})^2 + (γ_0 + ∑_{j=1}^p γ_{j}X_{j} + 0.8 X_1 X_2) T + σ_{0}ε ≥ 0)

For further details, see Tian et al. (2014).
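
The two specifications above can be sketched in base R as follows. This is only an illustration of the formulas in this section, not the package's implementation: the function name sketch_sim is hypothetical, the covariates are drawn through a Cholesky factor of the covariance matrix (the package may draw them differently), and the sketch does not compute the trueUplift column because its exact definition is not given here.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the
# data-generating process described in Details.
sketch_sim <- function(n = 1000, p = 20, rho = 0.2, beta.par = sqrt(6),
                       sigma0 = sqrt(2),
                       response = c("gaussian", "binary")) {
  response <- match.arg(response)
  stopifnot(p >= 4)  # gamma has four non-zero entries

  # Compound symmetric covariance: 1 on the diagonal, rho elsewhere.
  Sigma <- (1 - rho) * diag(p) + rho * matrix(1, p, p)

  # X ~ N(0, Sigma): chol() returns upper-triangular R with t(R) %*% R = Sigma,
  # so rows of Z %*% R have covariance Sigma when Z is standard normal.
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  colnames(X) <- paste0("X", seq_len(p))

  beta0  <- 1 / beta.par
  beta   <- rep(1 / (2 * beta.par), p)
  gamma0 <- 0.4
  gamma  <- c(0.8, -0.8, 0.8, -0.8, rep(0, p - 4))

  # Treatment indicator in {-1, 1}, each value with probability 1/2.
  trt <- sample(c(-1, 1), n, replace = TRUE)

  main   <- (beta0 + drop(X %*% beta))^2
  inter  <- gamma0 + drop(X %*% gamma) + 0.8 * X[, 1] * X[, 2]
  latent <- main + inter * trt + sigma0 * rnorm(n)

  # Gaussian case: y is the latent quantity; binary case: y = I(latent >= 0).
  y <- if (response == "gaussian") latent else as.integer(latent >= 0)

  data.frame(y = y, T = trt, X)
}

set.seed(1)
d <- sketch_sim(n = 500, p = 6)
str(d[, 1:4])
```

The Cholesky draw keeps the sketch free of package dependencies; MASS::mvrnorm would be an equally valid way to generate the covariates.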

Value

A data frame including the response variable (y), the treatment indicator (T), the "true" uplift effect (trueUplift), and the predictors (X).

Author(s)

Leo Guelman <leo.guelman@gmail.com>

References

Tian, L., Alizadeh, A., Gentles, A. and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.

Examples


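# Fix the random seed so the simulated data are reproducible
set.seed(324)

# Binary response with 30 predictors (n defaults to 1000)
df1 <- sim_uplift(p = 30, response = "binary")
str(df1)

# Gaussian response (the default) with 10000 observations and 20 predictors
df2 <- sim_uplift(n = 10000, p = 20)
str(df2)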

leoguelman/uplift2 documentation built on April 15, 2022, 4:34 a.m.