
View source: R/partialDependence.R

Run `bart` at test observations constructed so that a plot can be created displaying the effect of a single variable (`pdbart`) or a pair of variables (`pd2bart`).
Note that if *y* is binary with *P(Y=1 | x) = F(f(x))*, *F* the standard
normal cdf, then the plots are all on the *f* scale.

```
pdbart(x.train, y.train,
       xind = seq_len(ncol(x.train)),
       levs = NULL, levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
       pl = TRUE, plquants = c(0.05, 0.95),
       ...)

## S3 method for class 'pdbart'
plot(x,
     xind = seq_len(length(x$fd)),
     plquants = c(0.05, 0.95), cols = c('black', 'blue'),
     ...)

pd2bart(x.train, y.train,
        xind = c(1, 2),
        levs = NULL, levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
        pl = TRUE, plquants = c(0.05, 0.95),
        ...)

## S3 method for class 'pd2bart'
plot(x,
     plquants = c(0.05, 0.95), contour.color = 'white',
     justmedian = TRUE,
     ...)
```

`x.train`
Explanatory variables for training (in-sample) data. Must be a numeric matrix with rows corresponding to observations and columns to variables. Categorical variables/factors must be converted to dummies, with a full set of dummy columns present if there are more than two levels.

`y.train`
Dependent variable for training (in-sample) data. Must be a numeric vector with length
equal to the number of rows in `x.train`.

`xind`
Integer vector indicating which variables are to be plotted.
In `pdbart`, the columns of `x.train` for which partial dependence plots are constructed (by default, all columns); in `pd2bart`, the pair of columns to plot jointly; in the `pdbart` plot method, the components of the fitted object's `fd` list to plot.

`levs`
Gives the values of a variable at which the plot is to be constructed.
Must be a list whose *i*th component is a vector of values for the *i*th variable in `xind`.

`levquants`
If `levs` is `NULL`, the values at which each variable is evaluated are set to these quantiles of its observed values.

`pl`
For `pdbart` and `pd2bart`, if `TRUE` the plot is drawn when the function is called; if `FALSE`, the plot can be produced later by calling the plot method on the returned object.

`plquants`
In the plots, beliefs about *f_s(x_s)* are indicated by plotting the posterior median along with the interval given by these lower and upper quantiles.

`...`
Additional arguments.
In `pdbart` and `pd2bart`, passed on to `bart` (e.g. `keepevery`, `ntree`); in the plot methods, passed on to the underlying plotting functions.

`x`
For the plot methods, an object returned by `pdbart` or `pd2bart`.

`cols`
Vector of two colors. The first color is used for the median of *f_s(x_s)*; the second for the `plquants` quantiles.

`contour.color`
Color for the contours plotted on top of the image.

`justmedian`
A logical; if `TRUE`, only the posterior median of *f_s(x_s)* is plotted; if `FALSE`, additional images are produced for the lower and upper `plquants` quantiles.

We divide the predictor vector *x* into a subgroup of interest,
*x_s*, and its complement, *x_c = x - x_s*.
A prediction *f(x)* can
then be written as *f(x_s, x_c)*. To estimate the effect of *x_s*
on the prediction, Friedman suggests the partial dependence
function

*f_s(x_s) = (1/n) ∑_{i=1}^n f(x_s, x_{ic})*

where *x_{ic}* is the *i*th observation of *x_c* in the data. Note
that *(x_s, x_{ic})* will generally not be one of the observed data
points. Using BART it is straightforward to then estimate, and even
obtain uncertainty bounds for, *f_s(x_s)*. A draw of *f\*_s(x_s)*
from the induced BART posterior on *f_s(x_s)* is obtained by
simply computing *f\*_s(x_s)* as a byproduct of each MCMC draw
*f\**. The median (or average)
of these MCMC draws *f\*_s(x_s)* then yields an
estimate of *f_s(x_s)*, and lower and upper quantiles can be used
to obtain intervals for *f_s(x_s)*.
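As an illustration, the averaging above can be sketched directly for any fitted model. The function below is a hypothetical helper, not part of dbarts; `pdbart` performs the equivalent computation internally, once per kept MCMC draw.

```r
## Sketch of Friedman's partial dependence function for a generic
## prediction function `fhat` (takes a predictor matrix, returns a vector).
## `s` is the column index of x_s; `levs` gives the values at which to
## evaluate f_s.
partial_dependence <- function(fhat, x, s, levs) {
  sapply(levs, function(v) {
    xtmp <- x
    xtmp[, s] <- v       # set x_s to the value v for every observation
    mean(fhat(xtmp))     # average predictions over the observed x_c
  })
}

## check against a known linear function f(x) = 2 * x1 + x2,
## for which f_s(x1) = 2 * x1 + mean(x2)
fhat <- function(x) 2 * x[, 1] + x[, 2]
set.seed(1)
xdat <- cbind(runif(50), runif(50))
pd <- partial_dependence(fhat, xdat, s = 1, levs = c(0, 0.5, 1))
all.equal(pd, 2 * c(0, 0.5, 1) + mean(xdat[, 2]))  # TRUE
```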

In `pdbart`, *x_s* consists of a single variable in *x*; in `pd2bart`, it is a pair of variables.

This is a computationally intensive procedure.
For example, in `pdbart`, to compute the partial dependence plot
for 5 values of *x_s*, we need
to compute *f(x_s, x_c)* for all possible *(x_s, x_{ic})*, and there
are *5n* of these, where *n* is the sample size.
All of that computation is done for each kept BART draw.
For this reason, running BART with `keepevery` larger than 1 (e.g. 10)
makes the procedure much faster.

The plot methods produce the plots and don't return anything.

`pdbart` and `pd2bart` return lists with the components
given below. The list returned by `pdbart` is assigned class
`pdbart`, and the list returned by `pd2bart` is assigned
class `pd2bart`.

`fd`
The draws of *f_s(x_s)*. For `pdbart`, a list of matrices, one per variable in `xind`; the *(i, j)* element of each matrix is the *i*th MCMC draw of *f_s(x_s)* evaluated at the *j*th value of the variable. For `pd2bart`, a single such matrix whose columns correspond to the pairs of values of the two variables.

`levs`
The list of levels used, each component corresponding to a variable.
If the argument `levs` was supplied it is returned unchanged; otherwise the levels are the quantiles of the observed values given by `levquants`.

`xlbs`
A vector of character strings giving the plotting labels used for the variables.
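The intervals described in the Details can be recovered from `fd` by taking quantiles across the MCMC draws. A sketch, where the matrix is a stand-in for one component of a fitted object's `fd`:

```r
## Stand-in for one component of the `fd` value returned by pdbart:
## rows are kept MCMC draws, columns are the levels of the variable.
set.seed(2)
fd1 <- matrix(rnorm(1000 * 5), nrow = 1000)

## posterior lower quantile, median, and upper quantile at each level,
## matching the default plquants = c(0.05, 0.95)
summ <- apply(fd1, 2, quantile, probs = c(0.05, 0.5, 0.95))
dim(summ)  # 3 x 5: one row per summary, one column per level
```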

The remaining components returned in the list are the same as in the value of `bart`.
They are simply passed on from the BART run used to create the partial dependence plot.
The function `plot.bart` can be applied to the object returned by `pdbart` or
`pd2bart` to examine the BART run.

Hugh Chipman: [email protected].

Robert McCulloch: [email protected].

Chipman, H., George, E., and McCulloch, R. (2006) BART: Bayesian Additive Regression Trees.

Chipman, H., George, E., and McCulloch, R. (2006) Bayesian Ensemble Learning.

Both of the above are available at: http://www.rob-mcculloch.org/

Friedman, J.H. (2001) Greedy function approximation: A gradient boosting machine.
*The Annals of Statistics*, **29**, 1189–1232.

```
## Not run:
## simulate data
f <- function(x) 0.5 * x[, 1] + 2 * x[, 2] * x[, 3]
sigma <- 0.2
n <- 100
set.seed(27)
x <- matrix(2 * runif(n * 3) - 1, ncol = 3)
colnames(x) <- c('rob', 'hugh', 'ed')
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
## first two plot regions are for pdbart, third for pd2bart
par(mfrow = c(1, 3))
## pdbart: one-dimensional partial dependence plots
set.seed(99)
pdb1 <- pdbart(x, y, xind = c(1, 2),
               levs = list(seq(-1, 1, 0.2), seq(-1, 1, 0.2)),
               pl = FALSE, keepevery = 10, ntree = 100)
plot(pdb1, ylim = c(-0.6, 0.6))
## pd2bart: two-dimensional partial dependence plot
set.seed(99)
pdb2 <- pd2bart(x, y, xind = c(2, 3),
                levquants = c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95),
                pl = FALSE, ntree = 100, keepevery = 10, verbose = FALSE)
plot(pdb2)
## compare BART fit to linear model and truth = Ey
lmFit <- lm(y ~ ., data.frame(x, y))
fitmat <- cbind(y, Ey, lmFit$fitted, pdb1$yhat.train.mean)
colnames(fitmat) <- c('y', 'Ey', 'lm', 'bart')
print(cor(fitmat))
## End(Not run)
```

vdorie/dbarts documentation built on Dec. 7, 2018, 7:53 a.m.
