Display the effect of a single variable (pdpgbart) or pair of variables (pd2pgbart).
Note that if the response y is binary with P(Y=1 | x) = F(f(x)), where F is the standard normal cdf, then the plots are all on the f scale.
pdpgbart(
x.train, y.train,
xind=1:ncol(x.train), levs=NULL, levquants=c(.05,(1:9)/10,0.95),
pl=TRUE, plquants=c(.05,.95),
...)
## S3 method for class 'pdpgbart'
plot(x,
xind = seq_len(length(x$fd)),
plquants = c(0.05, 0.95), cols = c('black', 'blue'),
...)
pd2pgbart(
x.train, y.train,
xind=1:2, levs=NULL, levquants=c(.05,(1:9)/10,.95),
pl=TRUE, plquants=c(.05,.95),
...)
## S3 method for class 'pd2pgbart'
plot(x,
plquants = c(0.05, 0.95), contour.color = 'white',
justmedian = TRUE,
...)
x.train |
Explanatory variables for training (in sample) data. |
y.train |
Dependent variable for training (in sample) data. |
xind |
Integer vector indicating which variables are to be plotted.
In |
levs |
Gives the values of a variable at which the plot is to be constructed.
Must be a list, where the ith component gives the values for the ith variable.
In |
levquants |
If |
pl |
For |
plquants |
In the plots, beliefs about f(x) are indicated by plotting the
posterior median and a lower and upper quantile.
|
... |
Additional arguments.
In |
x |
For |
cols |
Vector of two colors. The first color is for the median of f, while the second color is for the upper and lower quantiles. |
contour.color |
Color for contours plotted on top of the image. |
justmedian |
A logical where if |
We divide the predictor vector x into a subgroup of interest, x_s, and the complement x_c = x - x_s. A prediction f(x) can then be written as f(x_s, x_c). To estimate the effect of x_s on the prediction, Friedman suggests the partial dependence function
f_s(x_s) = (1/n) ∑_{i=1}^{n} f(x_s, x_{ic})
where x_{ic} is the ith observation of x_c in the data. Note that (x_s, x_{ic}) will generally not be one of the observed data points. Using pgbart it is straightforward to then estimate and even obtain uncertainty bounds for f_s(x_s). A draw of f*_s(x_s) from the induced pgbart posterior on f_s(x_s) is obtained by simply computing f*_s(x_s) as a byproduct of each MCMC draw f*. The median (or average) of these MCMC draws f*_s(x_s) then yields an estimate of f_s(x_s), and lower and upper quantiles can be used to obtain intervals for f_s(x_s).
In pdpgbart, x_s consists of a single variable in x, and in pd2pgbart it is a pair of variables.
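The calculation described above can be sketched in a few lines of base R. This is only an illustration of the partial dependence average and its posterior summary, not the package's implementation: the "draws" here are hypothetical stand-ins (a known function with a randomly perturbed slope) rather than output from pgbart, and the names fdraw, pd_one, and pd_summary are made up for this sketch.

```r
## Toy data: n observations on three predictors
set.seed(1)
n <- 50
x <- matrix(runif(n * 3, -1, 1), ncol = 3)

## Hypothetical stand-ins for posterior draws f* (NOT pgbart output):
## the function 0.5 * x1 + 2 * x2 * x3 with a perturbed slope on x1
ndraw <- 200
slopes <- rnorm(ndraw, mean = 0.5, sd = 0.05)
fdraw <- function(xmat, slope) slope * xmat[, 1] + 2 * xmat[, 2] * xmat[, 3]

## Partial dependence of one draw on variable s at value v:
## set x_s = v in every row, keep x_c as observed, average over rows
pd_one <- function(slope, s, v) {
  xs <- x
  xs[, s] <- v
  mean(fdraw(xs, slope))
}

levs <- seq(-1, 1, by = 0.5)

## fd[i, j] is the ith draw of f_s(x_s) at the jth level of x_s
fd <- sapply(levs, function(v) sapply(slopes, pd_one, s = 1, v = v))

## Median and 5%/95% quantiles of the draws at each level
pd_summary <- apply(fd, 2, quantile, probs = c(0.05, 0.5, 0.95))
```

Each column of pd_summary gives a lower bound, a median, and an upper bound for f_s at one level of x_s, which is the kind of information the plot methods display.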
This is a computationally intensive procedure. For example, in pdpgbart, to compute the partial dependence plot for 5 x_s values, we need to compute f(x_s, x_c) for all possible (x_s, x_{ic}), and there would be 5n of these, where n is the sample size. All of that computation would be done for each kept pgbart draw. For this reason, running pgbart with keepevery larger than 1 (e.g. 10) makes the procedure much faster.
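As a rough worked count of that cost, the sketch below uses the settings from the pdpgbart call in the Examples (n = 100, two variables, 11 levels each, ndpost = 200, keepevery = 10). It assumes that the number of kept draws is ndpost / keepevery, which is the convention in BayesTree; whether pgbart follows the same convention is an assumption here, not something stated on this page.

```r
## Settings taken from the pdpgbart example
n         <- 100                        # sample size
nlev      <- length(seq(-1, 1, 0.2))    # levels per variable
nvar      <- 2                          # variables in xind
ndpost    <- 200
keepevery <- 10

## Evaluations of f(x_s, x_ic) needed for a single draw
evals_per_draw <- nvar * nlev * n

## ASSUMPTION: kept draws = ndpost / keepevery (BayesTree convention)
kept_draws <- ndpost / keepevery

evals_thinned   <- evals_per_draw * kept_draws
evals_unthinned <- evals_per_draw * ndpost

c(per_draw = evals_per_draw, thinned = evals_thinned, unthinned = evals_unthinned)
```

Under that assumption, thinning by 10 cuts the number of function evaluations by a factor of 10.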
The plot methods produce the plots and don't return anything.
pdpgbart and pd2pgbart return lists with components given below. The list returned by pdpgbart is assigned class pdpgbart, and the list returned by pd2pgbart is assigned class pd2pgbart.
fd |
A matrix whose (i, j) value is the ith draw of f_s(x_s) for the jth value of x_s. “fd” is for “function draws”. For For |
levs |
The list of levels used, each component corresponding to a variable.
If argument |
xlbs |
A vector of character strings which are the plotting labels used for the variables. |
The remaining components returned in the list are the same as in the value of pgbart_train. They are simply passed on from the pgbart run used to create the partial dependence plot.
Lakshminarayanan, B., Roy, D., and Teh, Y. W. (2015) Particle Gibbs for Bayesian Additive Regression Trees. Artificial Intelligence and Statistics, 553-561.
Chipman, H., George, E., and McCulloch, R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4, 1, 266-298.
Friedman, J. H. (2001) Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29, 1189–1232.
## simulate data
f <- function(x) { return(0.5 * x[,1] + 2 * x[,2] * x[,3]) }
sigma <- 0.2
n <- 100
set.seed(27)
x <- matrix(2 * runif(n * 3) -1, ncol = 3)
colnames(x) <- c('rob', 'hugh', 'ed')
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
## first two plot regions are for pdpgbart, third for pd2pgbart
par(mfrow = c(1, 3))
## pdpgbart: one dimensional partial dependence plot
set.seed(99)
pdb1 <-
pdpgbart(
x, y, xind=c(1,2),
levs=list(seq(-1,1,.2), seq(-1,1,.2)), pl=FALSE,
keepevery=10, ntree=5, nskip=100, ndpost=200
)
plot(pdb1,ylim=c(-.6,.6))
## pd2pgbart: two dimensional partial dependence plot
set.seed(99)
pdb2 <-
pd2pgbart(x, y, xind = c(2, 3),
levquants = c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95),
pl = FALSE, ntree = 5, keepevery = 10, verbose = FALSE
)
plot(pdb2)