abart | R Documentation |
BART is a Bayesian “sum-of-trees” model.
For a numeric response y, we have y = f(x) + \epsilon, where \epsilon \sim N(0, \sigma^2).
The function f is the sum of many tree models.
The goal is to have very flexible inference for the unknown function f.
In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
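As a compact way to read the "sum-of-trees" description, the function f can be written in the notation commonly used for BART. The symbols m, T_j, M_j and g are assumptions borrowed from that convention and are not defined on this page; m plays the role of the ntree argument.

```latex
f(x) = \sum_{j=1}^{m} g(x; T_j, M_j)
```

Here T_j denotes the j-th tree, M_j its leaf values, and g(x; T_j, M_j) the leaf value that x is assigned to in tree j. The weak-learner prior keeps each term small relative to the sum.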
abart(
x.train, times, delta,
x.test=matrix(0,0,0), K=100,
type='abart', ntype=1,
sparse=FALSE, theta=0, omega=1,
a=0.5, b=1, augment=FALSE, rho=NULL,
xinfo=matrix(0,0,0), usequants=FALSE,
rm.const=TRUE,
sigest=NA, sigdf=3, sigquant=0.90,
k=2, power=2, base=0.95,
lambda=NA, tau.num=c(NA, 3, 6)[ntype],
offset=NULL, w=rep(1, length(times)),
ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
ndpost=1000L, nskip=100L,
keepevery=c(1L, 10L, 10L)[ntype],
printevery=100L, transposed=FALSE,
mc.cores = 1L, ## mc.abart only
nice = 19L, ## mc.abart only
seed = 99L ## mc.abart only
)
mc.abart(
x.train, times, delta,
x.test=matrix(0,0,0), K=100,
type='abart', ntype=1,
sparse=FALSE, theta=0, omega=1,
a=0.5, b=1, augment=FALSE, rho=NULL,
xinfo=matrix(0,0,0), usequants=FALSE,
rm.const=TRUE,
sigest=NA, sigdf=3, sigquant=0.90,
k=2, power=2, base=0.95,
lambda=NA, tau.num=c(NA, 3, 6)[ntype],
offset=NULL, w=rep(1, length(times)),
ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
ndpost=1000L, nskip=100L,
keepevery=c(1L, 10L, 10L)[ntype],
printevery=100L, transposed=FALSE,
mc.cores = 2L, nice = 19L, seed = 99L
)
x.train |
Explanatory variables for training (in sample)
data. |
times |
The time of event or right-censoring. |
delta |
The event indicator: 1 is an event while 0 is censored. |
x.test |
Explanatory variables for test (out of sample)
data. Should have same structure as |
K |
If provided, then coarsen |
type |
You can use this argument to specify the type of fit.
|
ntype |
The integer equivalent of |
sparse |
Whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform; see Linero 2016. |
theta |
Set |
omega |
Set |
a |
Sparse parameter for |
b |
Sparse parameter for |
rho |
Sparse parameter: typically |
augment |
Whether data augmentation is to be performed in sparse variable selection. |
xinfo |
You can provide the cutpoints to BART or let BART
choose them for you. To provide them, use the |
usequants |
If |
rm.const |
Whether or not to remove constant variables. |
sigest |
The prior for the error variance
( |
sigdf |
Degrees of freedom for error variance prior.
Not used if |
sigquant |
The quantile of the prior that the rough estimate
(see |
k |
For numeric |
power |
Power parameter for tree prior. |
base |
Base parameter for tree prior. |
lambda |
The scale of the prior for the variance. Not used if |
tau.num |
The numerator in the |
offset |
Continuous BART operates on |
w |
Vector of weights which multiply the standard deviation.
Not used if |
ntree |
The number of trees in the sum. |
numcut |
The number of possible values of |
ndpost |
The number of posterior draws returned. |
nskip |
Number of MCMC iterations to be treated as burn in. |
printevery |
As the MCMC runs, a message is printed every printevery draws. |
keepevery |
Every keepevery-th draw is kept and returned to the user. |
transposed |
When running |
seed |
Setting the seed required for reproducible MCMC. |
mc.cores |
Number of cores to employ in parallel. |
nice |
Set the job niceness. The default niceness is 19: niceness ranges from 0 (highest priority) to 19 (lowest priority). |
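The ndpost, nskip and keepevery arguments together determine how long the sampler runs. The sketch below is a rough budget calculation, assuming that burn-in runs first and that every keepevery-th draw is then retained until ndpost draws have been kept. The helper name mcmc_iterations is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the BART package): iteration count
# implied by the burn-in, thinning, and draw-count arguments.
mcmc_iterations <- function(ndpost = 1000L, nskip = 100L, keepevery = 1L) {
  stopifnot(ndpost >= 1, nskip >= 0, keepevery >= 1)
  # Assumption: nskip burn-in iterations, then keepevery iterations
  # per retained draw until ndpost draws have been kept.
  nskip + ndpost * keepevery
}

mcmc_iterations()                 # defaults shown in the usage for ntype = 1
mcmc_iterations(keepevery = 10L)  # thinning shown in the usage for ntype 2 or 3
```

Under this assumption, raising keepevery multiplies the post-burn-in run time while leaving the number of returned draws unchanged.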
BART is a Bayesian MCMC method.
At each MCMC iteration, we produce a draw from the joint posterior (f,\sigma) | (x,y) in the numeric y case and just f in the binary y case.
Thus, unlike a lot of other modelling methods in R, we do not produce a single model object from which fits and summaries may be extracted.
The output consists of values f^*(x) (and \sigma^* in the numeric case) where * denotes a particular draw.
The x is either a row from the training data, x.train, or the test data, x.test.
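Because the output is a collection of draws rather than a single fit, summaries are computed across draws. The sketch below uses a simulated matrix as a stand-in for the draws f^*(x) (rows are draws, columns are observations); it is not output from abart, and the names draws and summary_tab are illustrative only.

```r
set.seed(1)
ndpost <- 200  # number of retained draws (rows)
n <- 5         # number of observations (columns)

# Simulated stand-in for a matrix of draws f*(x); column j is centred at j.
draws <- matrix(rnorm(ndpost * n, mean = rep(1:n, each = ndpost)), ndpost, n)

# Posterior mean and 95% interval for each observation, taken over draws.
post_mean <- colMeans(draws)
post_ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

summary_tab <- data.frame(mean = post_mean,
                          lower = post_ci[1, ],
                          upper = post_ci[2, ])
print(summary_tab)
```

The same column-wise operations should apply to any matrix laid out with one row per draw.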
abart returns an object of type abart which is essentially a list.
In the numeric y case, the list has components:
yhat.train |
A matrix with ndpost rows and nrow(x.train) columns.
Each row corresponds to a draw |
yhat.test |
Same as yhat.train but now the x's are the rows of the test data. |
yhat.train.mean |
Training data fits: the column means of yhat.train. |
yhat.test.mean |
Test data fits: the column means of yhat.test. |
sigma |
Post-burn-in draws of sigma; length = ndpost. |
first.sigma |
Burn-in draws of sigma. |
varcount |
A matrix with ndpost rows and ncol(x.train) columns. Each row is for a draw. For each variable (corresponding to the columns), the total count of the number of times that variable is used in a tree decision rule (over all trees) is given. |
sigest |
The rough error standard deviation ( |
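One way to read the varcount component is as a rough measure of how often each variable is used. The sketch below works on a simulated count matrix with the layout described above (one row per draw, one column per variable); the matrix, the column names and the use_share summary are illustrative assumptions, not output from abart.

```r
set.seed(2)
ndpost <- 100  # number of draws (rows)
p <- 5         # number of variables (columns)

# Simulated stand-in for varcount: the first variable is used most often.
varcount <- matrix(rpois(ndpost * p, lambda = rep(c(20, 5, 5, 5, 5), each = ndpost)),
                   ndpost, p)
colnames(varcount) <- paste0("x", 1:p)

# Share of decision rules using each variable within a draw,
# then averaged over draws.
use_share <- colMeans(varcount / rowSums(varcount))
sort(use_share, decreasing = TRUE)
```

Normalizing within each draw before averaging keeps draws with many splits from dominating the summary.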
wbart
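N = 1000   # number of observations
P = 5      # number of covariates
M = 8      # number of cores passed to mc.abart
set.seed(12)
x.train=matrix(runif(N*P, -2, 2), N, P)
mu = x.train[ , 1]^3   # true mean of the log event time
y=rnorm(N, mu)
offset=mean(y)
T=exp(y)               # event times
C=rexp(N, 0.05)        # censoring times
delta=(T<C)*1          # 1 = event observed, 0 = censored
table(delta)/N         # proportions of censored and observed times
times=(T*delta+C*(1-delta))   # observed time: event or censoring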
##test BART with token run to ensure installation works
set.seed(99)
post1 = abart(x.train, times, delta, nskip=5, ndpost=10)
## Not run:
post1 = mc.abart(x.train, times, delta,
mc.cores=M, seed=99)
post2 = mc.abart(x.train, times, delta, offset=offset,
mc.cores=M, seed=99)
Z=8
plot(mu, post1$yhat.train.mean, asp=1,
xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)
plot(mu, post2$yhat.train.mean, asp=1,
xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)
plot(post1$yhat.train.mean, post2$yhat.train.mean, asp=1,
xlim=c(-Z, Z), ylim=c(-Z, Z))
abline(a=0, b=1)
## End(Not run)