Identifies the factors with the greatest potential to increase a pre-specified outcome, using various methods.
Y |
outcome vector (must be numeric without NA's). |
X |
numeric data frame or matrix of factors to be considered. |
control |
numeric data frame or matrix of factors to control for. these are factors that we can't consider while looking for the optimal intervention (e.g. race). |
wgt |
an optional vector of weights. |
method |
the method to be used. either "non-parametric" (default), "correlation" or "nearest-neighbors". |
lambda |
the Lagrange multiplier, also known as the shadow price of an intervention. |
sigma |
distance penalty for the nearest-neighbors method. |
grp.size |
for the nearest-neighbors method; if the number of examples in each
control group is smaller than grp.size, performs weight adjustment
using |
n.boot |
number of bootstrap replications to use for the standard errors / confidence intervals calculation. |
sign.factor |
the proportion of quantiles that must increase (decrease) for a positive (negative) sign to be returned. not relevant for the correlation method (there the sign of the correlation is returned). |
alpha |
significance level for the confidence intervals. |
n.quant |
number of quantiles to use when calculating CDF distance. |
perm.test |
logical. if TRUE (default) performs permutation test and calculates p-values. |
n.perm |
number of permutations for the permutation test. |
quick |
logical. if TRUE, returns only E(X | I=1) - E(X | I=0) as an estimate.
this estimate is used by |
plot |
logical. if TRUE (default), the results are plotted by |
seed |
the seed of the random number generator. |
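The resampling arguments above (n.boot, alpha, perm.test, n.perm) can be illustrated with a minimal base-R sketch. It does not reproduce the package's internals: the statistic (a plain correlation), the simulated data and all variable names below are assumptions chosen for illustration only.

```r
# Hypothetical illustration, not the package's implementation:
# bootstrap standard error, percentile confidence interval and
# permutation p-value for a simple statistic (a correlation).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

stat <- function(x, y) cor(x, y)

# Bootstrap: resample observations with replacement n_boot times.
n_boot <- 500
boot_stats <- replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)
  stat(x[idx], y[idx])
})
boot_sd <- sd(boot_stats)

# Percentile confidence interval at significance level alpha.
alpha <- 0.05
boot_ci <- quantile(boot_stats, c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Permutation test: shuffling y breaks any association with x, so the
# permuted statistics approximate the null distribution.
n_perm <- 500
observed <- stat(x, y)
perm_stats <- replicate(n_perm, stat(x, sample(y)))
p_value <- (1 + sum(abs(perm_stats) >= abs(observed))) / (n_perm + 1)
```

Larger values of n_boot and n_perm give more stable standard errors and p-values at the cost of run time, which is presumably why the example in this page uses only 100 of each.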
an object of class "optint". This object is a list containing the following components:
estimates |
standardized point estimates (correlations for the correlation method and CDF distances otherwise). |
estimates_sd |
standard deviations of the estimates. |
details |
a list containing further details, such as: |
Y_diff - E(Y | I=1) - E(Y | I=0).
Y_diff_sd - standard deviation for Y_diff.
method - the method used.
lambda - the Lagrange multiplier used.
signs - signs (i.e. directions) for the estimates.
p_value - p-values for the estimates.
ci - a matrix of confidence intervals for the estimates.
stand_factor - the standardization factor used to standardize the results.
kl_distance - the Kullback–Leibler divergence of P(X | I=0) from P(X | I=1).
new_sample - a data frame containing X, control (if provided), wgt (the original weights) and wgt1 (the new weights under I = 1.)
In addition, the function summary can be used to print a summary of the results.
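As a rough sketch of how the two weight columns in new_sample relate to quantities such as E(X | I=1) - E(X | I=0) and kl_distance, the snippet below builds a mock data frame with the same column names and compares the two weightings. The data, the exponential tilt used to create wgt1 and the direction of the divergence are assumptions made for illustration; the package may compute these quantities differently.

```r
# Mock stand-in for details$new_sample (hypothetical values): one factor,
# the original weights (wgt) and the weights under I = 1 (wgt1).
set.seed(1)
n <- 100
new_sample <- data.frame(
  X1  = rnorm(n),
  wgt = rep(1, n)
)
# Assumed tilt toward larger X1, mimicking an intervention on X1.
new_sample$wgt1 <- exp(0.5 * new_sample$X1)

# E(X | I = 0) and E(X | I = 1) as weighted means of the same sample.
mean0 <- weighted.mean(new_sample$X1, new_sample$wgt)
mean1 <- weighted.mean(new_sample$X1, new_sample$wgt1)
x_diff <- mean1 - mean0

# Discrete Kullback-Leibler divergence between the normalized weights,
# here taken as sum(p1 * log(p1 / p0)); the direction is an assumption.
p0 <- new_sample$wgt / sum(new_sample$wgt)
p1 <- new_sample$wgt1 / sum(new_sample$wgt1)
kl <- sum(p1 * log(p1 / p0))
```

With a fitted object, the same calculation would start from the new_sample element of the details list instead of the mock data frame.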
# load the package that provides optint()
library(optinterv)
# generate data
n <- 50
p <- 3
features <- matrix(rnorm(n*p), ncol = p)
men <- matrix(rbinom(n, 1, 0.5), nrow = n)
outcome <- 2*(features[,1] > 1) + men*pmax(features[,2], 0) + rnorm(n)
outcome <- as.vector(outcome)
# find the optimal intervention using the non-parametric method:
imp_feat <- optint(Y = outcome, X = features, control = men,
                   method = "non-parametric", lambda = 10, plot = TRUE,
                   n.boot = 100, n.perm = 100)
# by default, only the significant features are displayed
# (see ?plot.optint for further details).
# for a customized variable importance plot, use plot():
plot(imp_feat, plot.vars = 3)
# show a summary of the results using summary():
summary(imp_feat)