pdbart | R Documentation

Run `bart` at test observations constructed so that a plot can be created displaying the effect of a single variable (`pdbart`) or a pair of variables (`pd2bart`). Note that if *y* is binary with *P(Y=1 | x) = F(f(x))*, where *F* is the standard normal cdf, then the plots are all on the *f* scale.
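For binary outcomes, this means the plotted values live on the latent *f* scale and can be mapped back to probabilities with the normal cdf. A minimal sketch, where `fdraws` is an illustrative matrix standing in for the `fd` component returned by `pdbart` (it is not produced by an actual model fit here):

```r
## Hedged sketch: map f-scale draws back to the probability scale.
## `fdraws` is a made-up stand-in for the `fd` matrix of MCMC draws.
fdraws <- matrix(c(-1, 0, 1, 2), nrow = 2)  # pretend draws of f_s(x_s)
pdraws <- pnorm(fdraws)                     # P(Y = 1 | x) = F(f(x)), F the standard normal cdf
```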

    pdbart(
        x.train, y.train,
        xind = NULL, levs = NULL,
        levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
        pl = TRUE, plquants = c(0.05, 0.95),
        ...)

    ## S3 method for class 'pdbart'
    plot(
        x,
        xind = seq_len(length(x$fd)),
        plquants = c(0.05, 0.95), cols = c('black', 'blue'),
        ...)

    pd2bart(
        x.train, y.train,
        xind = NULL, levs = NULL,
        levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
        pl = TRUE, plquants = c(0.05, 0.95),
        ...)

    ## S3 method for class 'pd2bart'
    plot(
        x,
        plquants = c(0.05, 0.95), contour.color = 'white',
        justmedian = TRUE,
        ...)

`x.train`
Explanatory variables for training (in sample) data. Can be any valid input to `bart`.

`y.train`
Dependent variable for training (in sample) data. Can be a numeric vector or, when passing `x.train` as a formula, a data frame.

`xind`
Integer, character vector, or the right-hand side of a formula indicating which variables are to be plotted. In `pdbart`, the variables (columns of `x.train`) for which a plot is to be constructed; in `plot.pdbart`, the components of the fitted object for which plots are to be drawn; in `pd2bart`, the pair of variables for which a plot is to be constructed.

`levs`
Gives the values of a variable at which the plot is to be constructed. Must be a list, where the *i*th component gives the values for the *i*th variable. In `pdbart`, it should have the same length as `xind`; in `pd2bart`, it should have length 2. If `levs` is not supplied, the values are set to quantiles of the variables given by `levquants`.

`levquants`
If `levs` is not supplied, the values of each variable used in the plot are set to these quantiles of that variable.

`pl`
For `pdbart` and `pd2bart`, if `TRUE` the plot is drawn; if `FALSE`, the result can be plotted later with the corresponding `plot` method.

`plquants`
In the plots, beliefs about *f(x)* are indicated by plotting the posterior median and a lower and upper quantile. `plquants` is a double vector of length two giving the lower and upper quantiles.

`...`
Additional arguments. In `pdbart` and `pd2bart`, they are passed on to `bart`; in `plot.pdbart`, to `plot`; and in `plot.pd2bart`, to `image`.

`x`
For the `plot` methods, an object returned by `pdbart` or `pd2bart`.

`cols`
Vector of two colors. The first color is for the median of *f*; the second is for the upper and lower quantiles.

`contour.color`
Color for contours plotted on top of the image.

`justmedian`
A logical where, if `TRUE`, just one plot is created for the median of the *f(x)* draws. If `FALSE`, three plots are created: one for the median and one each for the lower and upper quantiles.

We divide the predictor vector *x* into a subgroup of interest, *x_s*, and the complement *x_c*, so that *x = (x_s, x_c)*. A prediction *f(x)* can then be written as *f(x_s, x_c)*. To estimate the effect of *x_s* on the prediction, Friedman suggests the partial dependence function

*f_s(x_s) = (1/n) ∑_{i=1}^n f(x_s, x_{ic})*

where *x_{ic}* is the *i*th observation of *x_c* in the data. Note that *(x_s, x_{ic})* will generally not be one of the observed data points. Using BART it is straightforward to then estimate and even obtain uncertainty bounds for *f_s(x_s)*. A draw of *f\*_s(x_s)* from the induced BART posterior on *f_s(x_s)* is obtained by simply computing *f\*_s(x_s)* as a byproduct of each MCMC draw *f\**. The median (or average) of these MCMC draws *f\*_s(x_s)* then yields an estimate of *f_s(x_s)*, and lower and upper quantiles can be used to obtain intervals for *f_s(x_s)*.

In `pdbart`, *x_s* consists of a single variable in *x*; in `pd2bart`, it is a pair of variables.
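The averaging above can be sketched directly in R. `partial_dependence` below is an illustrative helper, not part of the package, assuming only that `fit` is a fitted model whose `predict` method accepts new data:

```r
## Illustrative sketch of Friedman's partial dependence function for a
## single variable s, evaluated at the values in levs.
partial_dependence <- function(fit, x, s, levs) {
  sapply(levs, function(v) {
    xx <- x
    xx[, s] <- v             # fix x_s = v for every observation: (x_s, x_ic)
    mean(predict(fit, xx))   # average over the empirical distribution of x_c
  })
}
```

For a linear model, the slope recovered from the sketch matches the fitted coefficient, which is a quick sanity check of the averaging.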

This is a computationally intensive procedure. For example, in `pdbart`, to compute the partial dependence plot at 5 values of *x_s*, we must compute *f(x_s, x_c)* for every possible pair *(x_s, x_{ic})*, of which there are *5n*, where *n* is the sample size. All of that computation is repeated for each kept BART draw. For this reason, running BART with `keepevery` larger than 1 (e.g., 10) makes the procedure much faster.

The plot methods produce the plots and don't return anything.

`pdbart` and `pd2bart` return lists with the components given below. The list returned by `pdbart` is assigned class `pdbart` and the list returned by `pd2bart` is assigned class `pd2bart`.

`fd`
A matrix whose (*i*, *j*) element is the *i*th draw of *f_s* evaluated at the *j*th value of *x_s* ("fd" is short for function draws). For `pdbart`, `fd` is a list of such matrices, one for each variable in `xind`. For `pd2bart`, it is a single matrix whose columns correspond to all pairs of values of the two chosen variables.

`levs`
The list of levels used, each component corresponding to a variable. If argument `levs` was supplied, it is returned unchanged; otherwise, the levels are the quantiles of the variables given by `levquants`.

`xlbs`
A vector of character strings giving the plotting labels used for the variables.

The remaining components returned in the list are the same as in the value of `bart`. They are simply passed on from the BART run used to create the partial dependence plot. The function `plot.bart` can be applied to the object returned by `pdbart` or `pd2bart` to examine the BART run.

Hugh Chipman: hugh.chipman@acadiau.ca.

Robert McCulloch: robert.mcculloch@chicagogsb.edu.

Chipman, H., George, E., and McCulloch, R. (2006) BART: Bayesian Additive Regression Trees.

Chipman, H., George, E., and McCulloch, R. (2006) Bayesian Ensemble Learning.

both of the above at: https://www.rob-mcculloch.org/

Friedman, J.H. (2001)
Greedy function approximation: A gradient boosting machine.
*The Annals of Statistics*, **29**, 1189–1232.

    ## Not run:
    ## simulate data
    f <- function(x) 0.5 * x[, 1] + 2 * x[, 2] * x[, 3]
    sigma <- 0.2
    n <- 100
    set.seed(27)
    x <- matrix(2 * runif(n * 3) - 1, ncol = 3)
    colnames(x) <- c('rob', 'hugh', 'ed')
    Ey <- f(x)
    y <- rnorm(n, Ey, sigma)

    ## first two plot regions are for pdbart, third for pd2bart
    par(mfrow = c(1, 3))

    ## pdbart: one dimensional partial dependence plot
    set.seed(99)
    pdb1 <- pdbart(
        x, y,
        xind = c(1, 2),
        levs = list(seq(-1, 1, 0.2), seq(-1, 1, 0.2)),
        pl = FALSE, keepevery = 10, ntree = 100)
    plot(pdb1, ylim = c(-0.6, 0.6))

    ## pd2bart: two dimensional partial dependence plot
    set.seed(99)
    pdb2 <- pd2bart(
        x, y,
        xind = c(2, 3),
        levquants = c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95),
        pl = FALSE, ntree = 100, keepevery = 10, verbose = FALSE)
    plot(pdb2)

    ## compare BART fit to linear model and truth = Ey
    lmFit <- lm(y ~ ., data.frame(x, y))
    fitmat <- cbind(y, Ey, lmFit$fitted, pdb1$yhat.train.mean)
    colnames(fitmat) <- c('y', 'Ey', 'lm', 'bart')
    print(cor(fitmat))

    ## example showing the use of a pre-fitted model
    df <- data.frame(y, x)
    set.seed(99)
    bartFit <- bart(
        y ~ rob + hugh + ed, df,
        keepevery = 10, ntree = 100, keeptrees = TRUE)
    pdb1 <- pdbart(bartFit, xind = rob + ed, pl = FALSE)
    ## End(Not run)
