View source: R/median.inclusion.vs.R
medianInclusion.vs | R Documentation |
This function implements the variable selection approach proposed in Linero (2018). Linero (2018) proposes DART, a variant of BART that replaces the discrete uniform distribution for selecting a split variable with a categorical distribution whose event probabilities follow a Dirichlet distribution. DART estimates the marginal posterior variable inclusion probability (MPVIP) of a predictor by the proportion of posterior samples of the tree structures in which the predictor is used as a split variable at least once, and selects predictors with an MPVIP of at least 0.5, yielding a median probability model.
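The MPVIP estimate and the 0.5 cutoff described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the matrix `split_counts` is hypothetical toy data (one row per posterior draw, one column per predictor, each entry counting how often the predictor is used as a split variable in that draw), not output produced by this function.

```r
# Hypothetical split counts: 5 posterior draws (rows) x 3 predictors (columns).
# These numbers are made up for illustration only.
split_counts <- matrix(
  c(2, 0, 0,
    3, 1, 0,
    1, 0, 0,
    2, 2, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("x1", "x2", "x3"))
)

# MPVIP estimate: proportion of draws in which a predictor is used
# as a split variable at least once.
mpvip <- colMeans(split_counts > 0)

# Median probability model: keep predictors with MPVIP of at least 0.5.
selected <- names(mpvip)[mpvip >= 0.5]

print(mpvip)
print(selected)
```

In this toy example, x1 and x2 appear in at least half of the draws and are kept, while x3 is dropped.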
medianInclusion.vs(
  x.train,
  y.train,
  probit = FALSE,
  vip.selection = TRUE,
  true.idx = NULL,
  plot = FALSE,
  num.var.plot = Inf,
  theta = 0,
  omega = 1,
  a = 0.5,
  b = 1,
  augment = FALSE,
  rho = NULL,
  xinfo = matrix(0, 0, 0),
  numcut = 100L,
  usequants = FALSE,
  cont = FALSE,
  rm.const = TRUE,
  power = 2,
  base = 0.95,
  split.prob = "polynomial",
  k = 2,
  ntree = 20L,
  ndpost = 1000L,
  nskip = 1000L,
  keepevery = 1L,
  printevery = 100L,
  verbose = FALSE
)
x.train |
A matrix or a data frame of predictor values with each row corresponding to an observation and each column corresponding to a predictor. If a predictor is a factor with q levels in a data frame, it is replaced with q dummy variables. |
y.train |
A vector of response (continuous or binary) values. |
probit |
A Boolean argument indicating whether the response variable is binary or continuous; probit = FALSE (the default) means that the response variable is continuous. |
vip.selection |
A Boolean argument indicating whether to select predictors using BART VIPs. |
true.idx |
(Optional) A vector of indices of the true relevant predictors; if provided, metrics including precision, recall and F1 score are returned. |
plot |
(Optional) A Boolean argument indicating whether plots are returned or not. |
num.var.plot |
The number of variables to be plotted. |
theta |
Set |
omega |
Set |
a |
A sparsity parameter of the Beta(a, b) hyper-prior, where 0.5 <= a <= 1; a lower value induces more sparsity. |
b |
A sparsity parameter of the Beta(a, b) hyper-prior; typically, b = 1. |
augment |
A Boolean argument indicating whether data augmentation is performed in the variable selection procedure of Linero (2018). |
rho |
A sparse parameter; typically ρ = p where p is the number of predictors. |
xinfo |
A matrix of cut-points with each row corresponding to a predictor and each column corresponding to a cut-point.
|
numcut |
The number of possible cut-points. If a single number is given, it is used for all predictors;
otherwise, a vector with length equal to |
usequants |
A Boolean argument indicating how the cut-points in |
cont |
A Boolean argument indicating whether to assume all predictors are continuous. |
rm.const |
A Boolean argument indicating whether to remove constant predictors. |
power |
The power parameter of the polynomial splitting probability for the tree prior. Only used if
|
base |
The base parameter of the polynomial splitting probability for the tree prior if |
split.prob |
A string indicating what kind of splitting probability is used for the tree prior. If
|
k |
The number of prior standard deviations that E(Y|x) = f(x) is away from +/-.5. The response
( |
ntree |
The number of trees in the ensemble. |
ndpost |
The number of posterior samples returned. |
nskip |
The number of posterior samples burned in. |
keepevery |
Every |
printevery |
As the MCMC runs, a message is printed every |
verbose |
A Boolean argument indicating whether any messages are printed out. |
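The x.train entry above notes that a factor with q levels is replaced with q dummy variables. A minimal base R sketch of that kind of expansion follows; it uses `model.matrix()` purely for illustration, and the data frame `df` and its column names are hypothetical. The function's own internal conversion may name or order the columns differently.

```r
# Hypothetical data frame with one numeric predictor and one 3-level factor.
df <- data.frame(
  x = c(0.1, 0.5, 0.9),
  g = factor(c("a", "b", "c"))
)

# Dropping the intercept (the "- 1") yields one indicator column per level,
# i.e. q dummy variables for a factor with q levels.
dummies <- model.matrix(~ g - 1, data = df)

# Combine the numeric predictor with the dummy columns.
x_mat <- cbind(x = df$x, dummies)

print(x_mat)
```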
See Linero (2018) or Section 2.2.3 in Luo and Daniels (2021) for details.
If vip.selection=TRUE, this function also performs variable selection by selecting variables whose BART VIP exceeds 1/ncol(x.train).
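The VIP threshold rule can be sketched as follows. This is an illustration under an assumption not spelled out on this page: the BART VIP of a predictor is taken to be its share of the splitting rules in each posterior draw, averaged over draws. The matrix `split_counts` is hypothetical toy data.

```r
# Hypothetical split counts: 4 posterior draws (rows) x 4 predictors (columns).
split_counts <- matrix(
  c(4, 1, 0, 0,
    3, 0, 1, 0,
    5, 1, 0, 0,
    4, 0, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, paste0("x", 1:4))
)

# Per-draw share of splitting rules that use each predictor
# (each row is divided by its own total).
split_props <- split_counts / rowSums(split_counts)

# Assumed VIP definition: average share over posterior draws.
vip <- colMeans(split_props)

# Select predictors whose VIP exceeds 1 / (number of predictors).
threshold <- 1 / ncol(split_counts)
vip_cols <- which(vip > threshold)

print(round(vip, 3))
print(vip_cols)
```

Because the VIPs sum to one, the threshold 1/ncol(x.train) is the value every predictor would receive if all predictors were used equally often.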
If true.idx is provided, the precision, recall and F1 scores are returned.
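For reference, the three metrics can be computed from a vector of true indices and a vector of selected indices as sketched below. The standard definitions are assumed; how the function itself handles edge cases such as an empty selection is not stated here. The vector `selected_idx` is hypothetical.

```r
# True relevant predictors (as in the example call on this page)
# and a hypothetical set of selected predictors.
true_idx <- c(1, 2, 6, 7, 8)
selected_idx <- c(1, 2, 3, 6)

# True positives: selected predictors that are truly relevant.
tp <- length(intersect(selected_idx, true_idx))

precision <- tp / length(selected_idx)  # share of selections that are correct
recall <- tp / length(true_idx)         # share of true predictors recovered
f1 <- 2 * precision * recall / (precision + recall)  # harmonic mean

print(c(precision = precision, recall = recall, f1 = f1))
```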
If plot=TRUE, plots showing which predictors are selected are generated.
The function medianInclusion.vs() returns two plots (or one if vip.selection=FALSE) if plot=TRUE, and a list with the following components.
dart.pvip |
The vector of DART MPVIPs. |
dart.pvip.imp.names |
The vector of column names of the predictors with DART MPVIP at least 0.5. |
dart.pvip.imp.cols |
The vector of column indices of the predictors with DART MPVIP at least 0.5. |
dart.precision |
The precision score for the DART approach; only returned if |
dart.recall |
The recall score for the DART approach; only returned if |
dart.f1 |
The F1 score for the DART approach; only returned if |
bart.vip |
The vector of BART VIPs; only returned if |
bart.vip.imp.names |
The vector of column names of the predictors with BART VIP exceeding |
bart.vip.imp.cols |
The vector of column indices of the predictors with BART VIP exceeding |
bart.precision |
The precision score for the BART approach; only returned if |
bart.recall |
The recall score for the BART approach; only returned if |
bart.f1 |
The F1 score for the BART approach; only returned if |
Chuji Luo: cjluo@ufl.edu and Michael J. Daniels: daniels@ufl.edu.
Chipman, H. A., George, E. I. and McCulloch, R. E. (2010). "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 266–298.
Linero, A. R. (2018). "Bayesian regression trees for high-dimensional prediction and variable selection." J. Amer. Statist. Assoc. 113 626–636.
Luo, C. and Daniels, M. J. (2021). "Variable Selection Using Bayesian Additive Regression Trees." arXiv preprint arXiv:2112.13998.
Rockova, V. and Saha, E. (2019). "On theory for BART." In The 22nd International Conference on Artificial Intelligence and Statistics 2839–2848. PMLR.
permute.vs, mc.backward.vs and abc.vs.
## simulate data (Scenario C.M.1. in Luo and Daniels (2021))
set.seed(123)
data = mixone(100, 10, 1, FALSE)

## test medianInclusion.vs() function
res = medianInclusion.vs(data$X, data$Y, probit=FALSE, vip.selection=TRUE,
                         true.idx=c(1, 2, 6:8), plot=FALSE, ntree=10,
                         ndpost=100, nskip=100, verbose=FALSE)