Run bart at test observations constructed so that a plot can be created displaying the effect of a single variable (pdbart) or a pair of variables (pd2bart).
Note that if y is binary with P(Y=1 | x) = F(f(x)), where F is the standard normal cdf, then the plots are all on the f scale.
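For binary y, a value read off the plots lives on the latent f scale. The short sketch below shows how individual f values translate into probabilities through the standard normal cdf (`pnorm` in R). The f values are illustrative only. One assumption to keep in mind: a partial-dependence value f_s(x_s) is an average taken on the f scale, so pnorm(f_s(x_s)) is generally not the same as the average probability over the data.

```r
# P(Y = 1 | x) = pnorm(f(x)) when y is binary (probit link).
f_values <- c(-1.96, -1, 0, 1, 1.96)  # illustrative points on the f scale
probs <- pnorm(f_values)              # map each f value to a probability
round(probs, 3)                       # 0.025 0.159 0.500 0.841 0.975
```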
pdbart(
   x.train, y.train,
   xind=1:ncol(x.train), levs=NULL, levquants=c(.05,(1:9)/10,.95),
   pl=TRUE, plquants=c(.05,.95), ...)

## S3 method for class 'pdbart'
plot(
   x,
   xind = 1:length(x$fd),
   plquants =c(.05,.95),cols=c('black','blue'), ...)

pd2bart(
   x.train, y.train,
   xind=1:2, levs=NULL, levquants=c(.05,(1:9)/10,.95),
   pl=TRUE, plquants=c(.05,.95), ...)

## S3 method for class 'pd2bart'
plot(
   x,
   plquants =c(.05,.95), contour.color='white',
   justmedian=TRUE, ...)
x.train: Explanatory variables for training (in sample) data.
y.train: Dependent variable for training (in sample) data.
xind: Integer vector indicating which variables are to be plotted.
levs: List giving, for each variable to be plotted, the values at which the plot is to be constructed.
levquants: If levs is NULL, the values of each variable used in the plot are set to the quantiles (in x.train) indicated by levquants.
pl: For pdbart and pd2bart, if TRUE, a plot is made (by calling the corresponding plot method).
plquants: In the plots, beliefs about f(x) are indicated by plotting the posterior median and a lower and upper quantile. plquants is a double vector of length two giving the lower and upper quantiles.
...: Additional arguments.
x: For plot.*, the object returned from pdbart or pd2bart.
cols: Vector of two colors.
contour.color: Color for contours plotted on top of the image.
justmedian: Logical. If TRUE, just one plot is created, for the median of the f(x) draws. If FALSE, three plots are created: one for the median and two additional ones for the lower and upper quantiles. In this case, mfrow is set to c(1,3).
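When levs is NULL, the plotting values come from quantiles of each training column. The sketch below is a hypothetical reconstruction of that step using `stats::quantile`; it is not BayesTree's internal code. With the default levquants it yields 11 levels per variable. For discrete predictors some of these quantiles can coincide, and how pdbart handles such ties is not covered here.

```r
# Hypothetical sketch: derive plotting levels from levquants when levs = NULL.
levquants <- c(.05, (1:9) / 10, .95)            # default from the Usage section
set.seed(27)
x.train <- matrix(2 * runif(300) - 1, ncol = 3)  # 100 rows, 3 continuous predictors
xind <- 1:ncol(x.train)
# One component per plotted variable: the levquants quantiles of that column
levs <- lapply(xind, function(j) quantile(x.train[, j], probs = levquants))
lengths(levs)                                    # 11 11 11
```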
We divide the predictor vector x into a subgroup of interest, x_s, and the complement x_c = x \ x_s (the remaining variables). A prediction f(x) can then be written as f(x_s, x_c). To estimate the effect of x_s on the prediction, Friedman suggests the partial dependence function

$$f_s(x_s) = \frac{1}{n}\sum_{i=1}^{n} f(x_s, x_{ic})$$

where x_{ic} is the i-th observation of x_c in the data. Note that (x_s, x_{ic}) will generally not be one of the observed data points. Using BART, it is straightforward to estimate f_s(x_s) and even to obtain uncertainty bounds for it. A draw f*_s(x_s) from the induced BART posterior on f_s(x_s) is obtained by simply computing f*_s(x_s) as a byproduct of each MCMC draw f*. The median (or average) of these MCMC draws f*_s(x_s) then yields an estimate of f_s(x_s), and lower and upper quantiles can be used to obtain intervals for f_s(x_s).
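The partial dependence function above can be computed for any single prediction function by fixing x_s at a level in every row and averaging the predictions. In the minimal sketch below, `partial_dependence` is a hypothetical helper and `predict_fun` stands in for one posterior draw f*. It uses the known f from the Examples, for which the answer has a closed form: f_1(v) = 0.5 v + 2 mean(x_2 x_3).

```r
# Hypothetical helper: f_s(v) = (1/n) * sum_i f(v, x_ic) for one column `var`.
partial_dependence <- function(predict_fun, x, var, levels) {
  vapply(levels, function(v) {
    x_mod <- x
    x_mod[, var] <- v         # set x_s = v in every row, keep each x_ic
    mean(predict_fun(x_mod))  # average the predictions over the n rows
  }, numeric(1))
}

f <- function(x) 0.5 * x[, 1] + 2 * x[, 2] * x[, 3]  # true f from the Examples
set.seed(27)
x <- matrix(2 * runif(300) - 1, ncol = 3)
levs <- seq(-1, 1, 0.5)
pd1 <- partial_dependence(f, x, var = 1, levels = levs)
print(pd1)
```

With BART, the same averaging would be repeated for every kept MCMC draw f*, giving the draws f*_s(x_s) described above.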
In pdbart, x_s consists of a single variable in x, and in pd2bart it is a pair of variables.
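For a pair of variables, the same average is taken over every combination of the two variables' levels. The sketch below uses a hypothetical helper, `partial_dependence2`, and builds the grid with `expand.grid`, in which the first variable varies fastest. Whether pd2bart orders its columns the same way is not stated here and should be checked against the package. For the Examples' f and the pair (x_2, x_3), the exact answer is 0.5 mean(x_1) + 2 x_2 x_3.

```r
# Hypothetical helper: two-variable partial dependence on a grid of level pairs.
partial_dependence2 <- function(predict_fun, x, vars, levs1, levs2) {
  grid <- expand.grid(v1 = levs1, v2 = levs2)  # all pairs; v1 varies fastest
  grid$pd <- vapply(seq_len(nrow(grid)), function(r) {
    x_mod <- x
    x_mod[, vars[1]] <- grid$v1[r]  # fix the first variable of x_s
    x_mod[, vars[2]] <- grid$v2[r]  # fix the second variable of x_s
    mean(predict_fun(x_mod))        # average over the observed x_ic
  }, numeric(1))
  grid
}

f <- function(x) 0.5 * x[, 1] + 2 * x[, 2] * x[, 3]
set.seed(27)
x <- matrix(2 * runif(300) - 1, ncol = 3)
res <- partial_dependence2(f, x, vars = c(2, 3),
                           levs1 = seq(-1, 1, 0.5), levs2 = seq(-1, 1, 0.5))
head(res)
```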
This is a computationally intensive procedure. For example, in pdbart, to compute the partial dependence plot for 5 x_s values, we need to compute f(x_s, x_{ic}) for all possible (x_s, x_{ic}), and there are 5n of these, where n is the sample size. All of that computation is done for each kept BART draw. For this reason, running BART with keepevery larger than 1 (e.g. 10), so that fewer draws are kept, makes the procedure much faster.
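The cost can be seen by counting rows. If one modified copy of the n training rows is stacked for every (variable, level) pair, each kept draw must predict at (total number of levels) × n test points. The sketch below uses a hypothetical helper, `build_pd_test`, and the settings of the first example: 2 variables with 11 levels each and n = 100. It gives (11 + 11) × 100 = 2200 rows, which agrees with the "number of test observations: 2200" line in the example output. That BayesTree builds its test set in exactly this way is an assumption.

```r
# Hypothetical sketch: stack one copy of x.train per (variable, level) pair.
build_pd_test <- function(x.train, xind, levs) {
  blocks <- list()
  for (k in seq_along(xind)) {
    for (v in levs[[k]]) {
      x_mod <- x.train
      x_mod[, xind[k]] <- v                 # x_s = v, x_c kept as observed
      blocks[[length(blocks) + 1]] <- x_mod
    }
  }
  do.call(rbind, blocks)                    # every row needs a prediction per draw
}

set.seed(27)
x.train <- matrix(2 * runif(300) - 1, ncol = 3)  # n = 100
levs <- list(seq(-1, 1, .2), seq(-1, 1, .2))     # 11 levels each, as in the Examples
x_test <- build_pd_test(x.train, xind = c(1, 2), levs = levs)
nrow(x_test)                                     # (11 + 11) * 100 = 2200
```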
The plot methods produce the plots and don't return anything.
pdbart and pd2bart return lists with the components given below. The list returned by pdbart is assigned class ‘pdbart’ and the list returned by pd2bart is assigned class ‘pd2bart’.
fd: A matrix whose (i,j) value is the i-th draw of f_s(x_s) for the j-th value of x_s. “fd” is for “function draws”.
levs: The list of levels used, each component corresponding to a variable.
xlbs: Vector of character strings which are the plotting labels used for the variables.
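Given a draws-by-levels matrix like fd, the plotted summaries are column-wise posterior quantiles. The sketch below fills a hypothetical fd with simulated numbers rather than real BART output. It then takes, for each x_s value, the lower quantile, the median and the upper quantile, using the default plquants of c(.05, .95).

```r
# Simulated stand-in for fd: rows are MCMC draws, columns are x_s values.
set.seed(1)
ndraw <- 200
levels <- seq(-1, 1, 0.5)
fd <- sapply(levels, function(v) 0.5 * v + rnorm(ndraw, sd = 0.05))  # 200 x 5

# Lower quantile, median and upper quantile of f_s(x_s) at each level
summ <- apply(fd, 2, quantile, probs = c(.05, .5, .95))
dim(summ)  # 3 5
```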
The remaining components returned in the list are the same as in the value of bart. They are simply passed on from the BART run used to create the partial dependence plot. The function plot.bart can be applied to the object returned by pdbart or pd2bart to examine the BART run.
Hugh Chipman: hugh.chipman@gmail.com.
Robert McCulloch: robert.e.mcculloch@gmail.com.
Chipman, H., George, E., and McCulloch, R. (2010). Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4(1), 266-298.
library(BayesTree)  # assumes pdbart and pd2bart are provided by the BayesTree package

##simulate data
f = function(x) { return(.5*x[,1] + 2*x[,2]*x[,3]) }
sigma=.2 # y = f(x) + sigma*z
n=100 #number of observations
set.seed(27)
x = matrix(2*runif(n*3)-1,ncol=3) ; colnames(x) = c('rob','hugh','ed')
Ey = f(x)
y = Ey + sigma*rnorm(n)
lmFit = lm(y~.,data.frame(x,y)) #compare lm fit to BART later
par(mfrow=c(1,3)) #first two for pdbart, third for pd2bart

##pdbart: one dimensional partial dependence plot
set.seed(99)
pdb1 = pdbart(x,y,xind=c(1,2),
   levs=list(seq(-1,1,.2),seq(-1,1,.2)),pl=FALSE,
   keepevery=10,ntree=100,nskip=100,ndpost=200) #should run longer!
plot(pdb1,ylim=c(-.6,.6))

##pd2bart: two dimensional partial dependence plot
set.seed(99)
pdb2 = pd2bart(x,y,xind=c(2,3),
   levquants=c(.05,.1,.25,.5,.75,.9,.95),pl=FALSE,
   ntree=100,keepevery=10,verbose=FALSE,nskip=100,ndpost=200) #should run longer!
plot(pdb2)

##compare BART fit to linear model and truth = Ey
fitmat = cbind(y,Ey,fitted(lmFit),pdb1$yhat.train.mean)
colnames(fitmat) = c('y','Ey','lm','bart')
print(cor(fitmat))

## plot.bart(pdb1) displays the BART run used to get the plot.
Running BART with numeric y
number of trees: 100
Prior:
k: 2.000000
degrees of freedom in sigma prior: 3
quantile in sigma prior: 0.900000
power and base for tree prior: 2.000000 0.950000
use quantiles for rule cut points: 0
data:
number of training observations: 100
number of test observations: 2200
number of explanatory variables: 3
Cutoff rules c in x<=c vs x>c
Number of cutoffs: (var: number of possible c):
(1: 100) (2: 100) (3: 100)
Running mcmc loop:
iteration: 100 (of 300)
iteration: 200 (of 300)
iteration: 300 (of 300)
time for loop: 1
Tree sizes, last iteration:
2 2 2 5 2 2 3 2 3 2 3 3 2 2 3 2 2 5 6 2
3 2 3 3 3 2 2 4 1 4 3 2 5 2 3 2 2 4 3 2
4 2 6 3 3 2 3 3 2 2 1 4 3 2 4 2 3 5 2 3
3 3 2 3 3 3 3 2 2 3 2 3 5 2 3 2 3 2 3 3
3 1 3 2 4 2 3 2 4 3 2 2 2 3 2 4 3 2 2 3
Variable Usage, last iteration (var:count):
(1: 41) (2: 73) (3: 62)
DONE BART 11-2-2014
y Ey lm bart
y 1.0000000 0.9603886 0.4052732 0.9903473
Ey 0.9603886 1.0000000 0.4457354 0.9795244
lm 0.4052732 0.4457354 1.0000000 0.4255841
bart 0.9903473 0.9795244 0.4255841 1.0000000