sim_uplift | R Documentation |
Numerical simulations for uplift modeling, as described in Tian et al. (2014).
sim_uplift(n = 1000, p = 20, rho = 0.2, beta.par = sqrt(6), sigma0 = sqrt(2), response = "gaussian")
n |
The number of observations. |
p |
The number of predictors. |
rho |
The correlation between predictors. |
beta.par |
Size of main effects. See details. |
sigma0 |
Multiplier of error term. See details. |
response |
The type of response distribution. Possible values are "gaussian" (the default) and "binary". |
For the gaussian case, sim_uplift simulates data according to the following specification:
y = (β_0 + ∑_{j=1}^p β_{j}X_{j})^2 + (γ_0 + ∑_{j=1}^p γ_{j}X_{j} + 0.8 X_1 X_2) T + σ_{0}ε
where the covariates (X_{1}, …, X_{p}) follow a mean-zero multivariate normal distribution with a compound symmetric variance-covariance matrix, (1-ρ)\mathbf{I}_{p} + ρ \mathbf{1}^{'}\mathbf{1}, β_0 = beta.par^-1, β_j = (2 * beta.par)^-1, γ_0 = 0.4, (γ_1, …, γ_p) = (0.8, -0.8, 0.8, -0.8, 0, …, 0), T ∈ {-1, 1} is the treatment indicator, generated at random with equal probability, ε is N(0,1), and σ_{0} = sigma0.
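The gaussian specification above can be sketched in base R as follows. This is an illustrative reading of the formula, not the package's source code: the function name sim_gaussian_sketch and the column name inter are hypothetical, and the covariates are drawn through a Cholesky factor, which is one of several valid ways to sample from a multivariate normal.

```r
# Hypothetical helper (not part of the package): one reading of the
# gaussian specification, written with base R only.
sim_gaussian_sketch <- function(n = 1000, p = 20, rho = 0.2,
                                beta.par = sqrt(6), sigma0 = sqrt(2)) {
  stopifnot(p >= 4)  # gamma has four non-zero entries

  # Compound symmetric covariance: 1 on the diagonal, rho elsewhere
  Sigma <- (1 - rho) * diag(p) + rho * matrix(1, p, p)

  # Rows of X are N(0, Sigma): chol() returns R with t(R) %*% R = Sigma
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  colnames(X) <- paste0("X", seq_len(p))

  beta0  <- 1 / beta.par
  beta   <- rep(1 / (2 * beta.par), p)
  gamma0 <- 0.4
  gamma  <- c(0.8, -0.8, 0.8, -0.8, rep(0, p - 4))

  # Treatment coded as -1 / 1 with equal probability
  trt <- sample(c(-1, 1), n, replace = TRUE)

  main  <- as.vector(beta0 + X %*% beta)^2                       # squared main effect
  inter <- as.vector(gamma0 + X %*% gamma + 0.8 * X[, 1] * X[, 2])  # multiplies T
  y <- main + inter * trt + sigma0 * rnorm(n)

  data.frame(y = y, T = trt, inter = inter, X)
}

set.seed(1)
d <- sim_gaussian_sketch(n = 50, p = 6)
str(d)
```

Under this reading, switching T from -1 to 1 changes the expected response by twice the inter column; whether the package's trueUplift uses that scaling is not stated here, so treat the sketch as a guide to the formula only.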
For the binary case,
y = I((β_0 + ∑_{j=1}^p β_{j}X_{j})^2 + (γ_0 + ∑_{j=1}^p γ_{j}X_{j} + 0.8 X_1 X_2) T + σ_{0}ε ≥ 0)
For further details, see Tian et al. (2014).
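For the binary case, the indicator form implies a closed-form response probability: since ε is N(0,1), P(y = 1 | X, T) = Φ((m + d T) / σ_{0}), where m is the squared main-effect term and d is the term multiplying T. The sketch below computes the resulting difference between T = 1 and T = -1. The function name binary_uplift_sketch and its arguments are hypothetical, and this derivation is an assumption about how a "true" uplift could be defined, not a statement of how the package computes trueUplift.

```r
# Hypothetical helper: P(y = 1 | T = 1) - P(y = 1 | T = -1) in the binary case,
# derived from y = I(m + d * T + sigma0 * eps >= 0) with eps ~ N(0, 1).
binary_uplift_sketch <- function(main_sq, inter, sigma0 = sqrt(2)) {
  pnorm((main_sq + inter) / sigma0) - pnorm((main_sq - inter) / sigma0)
}

# With no treatment-dependent term the two probabilities coincide
binary_uplift_sketch(main_sq = 0.5, inter = 0)

# A positive treatment-dependent term gives a positive difference
binary_uplift_sketch(main_sq = 0.5, inter = 0.4)
```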
A data frame including the response variable (y), the treatment indicator (T), the "true" uplift effect (trueUplift), and the predictors (X).
Leo Guelman leo.guelman@gmail.com
Tian, L., Alizadeh, A., Gentles, A. and Tibshirani, R. (2014). "A simple method for detecting interactions between a treatment and a large number of covariates." Journal of the American Statistical Association, 109:508, pp. 1517–1532.
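set.seed(324)
# Binary response with 30 predictors (n defaults to 1000)
df1 <- sim_uplift(p = 30, response = "binary")
str(df1)
# Gaussian response (the default) with 10000 observations
df2 <- sim_uplift(n = 10000, p = 20)
str(df2)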