predict.rbart: Drawing Posterior Predictive Realizations for rbart models.
In rbart: Bayesian Trees for Conditional Mean and Variance

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/predict.rbart.R

The function predict.rbart() is the main function for drawing posterior predictive realizations at new inputs using a fitted model stored in a rbart object returned from rbart().

## S3 method for class 'rbart'
predict(
object,
x.test=object$x.train,
tc=1,
fmean=mean(object$y.train),
q.lower=0.025,
q.upper=0.975,...)

`object`	Object of type `rbart` from a previous call to `rbart()`
`x.test`	New input settings in the form of an `npred x p` matrix at which to construct predictions. Defaults to the training inputs.
`tc`	Number of OpenMP threads to use for parallel computing.
`fmean`	Mean-centering vector for the training data. Defaults to the value used by `rbart()` when fitting the model. Usually should be left to the default.
`q.lower`	Lower quantile to return.
`q.upper`	Upper quantile to return.
`...`	not used.

predict.rbart() is the main function for calculating posterior predictions and uncertainties once a model has been fit by rbart().

Returns an object of type rbart with the following entries.

`mdraws`	Posterior realizations of the mean function, f(x) stored in an `ndpost x npred` matrix, where ndpost is the number of kept MCMC draws in the `rbart` run.
`sdraws`	Posterior realizations of the standard deviation function, s(x) stored in an `ndpost x npred` matrix, where ndpost is the number of kept MCMC draws in the `rbart` run.
`mmean`	Posterior predictive mean of f(x).
`smean`	Posterior predictive mean of the standard deviation, s(x).
`msd`	Posterior standard deviation of the mean, f(x).
`ssd`	Posterior standard deviation of the standard devation, s(x).
`m.5`	Posterior median of the mean function realizations, f(x).
`m.lower`	Posterior `q.lower` quantile of the mean function realizations.
`m.upper`	Posterior `q.upper` quantile of the mean function realizations.
`s.5`	Posterior median of the standard deviation function realizations, s(x).
`s.lower`	Posterior `q.lower` quantile of the standard deviation function realizations.
`s.upper`	Posterior `q.upper` quantile of the standard deviation function realizations.
`q.lower`	Lower quantile used in constructing the above.
`q.upper`	Upper quantile used in constructing the above.

Matthew T. Pratola <mpratola@stat.osu.edu> [aut, cre, cph], Robert E. McCulloch <robert.e.mcculloch@gmail.com> [aut, cre, cph], Hugh Chipman <hugh.chipman@gmail.com> [aut, cph] Maintainer: Matthew T. Pratola <mpratola@stat.osu.edu>, Robert E. McCulloch <robert.e.mcculloch@gmail.com>

Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (1998) Bayesian CART model search. Journal of the American Statistical Association, 93, 935–948.

Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4, 266–298.

Pratola, Matthew T. (2016) Efficient Metropolis Hastings proposal mechanisms for Bayesian regression tree models. Bayesian analysis, 11, 885–911.

Pratola, Matthew T., Chipman, Hugh A., George, Edward I. and McCulloch, Robert E. (2017) Heteroscedastic BART Using Multiplicative Regression Trees. arXiv preprint, arXiv:1709.07542, 1–20.

rbart

##################################################
## This is just a stub (runs fast) example for testing.
##  For more realistic examples, please see:
##   (i) the vignette at www.rob-mcculloch.org
##   (ii) the example simulated data (see ?simdat)
##        and the longer run in ?rbartonsimd, 
##        where a saved run of rbart is run on simdat is plotted. 
##################################################

##simulate data
set.seed(99)

# train data
n=500 #train data sample size
p=1 #just one x
x = matrix(sort(runif(n*p)),ncol=p) #iid uniform x values
fx = 4*(x[,1]^2) #quadratric function f
sx = .2*exp(2*x[,1]) # exponential function s
y = fx + sx*rnorm(n) # y = f(x) + s(x) Z

#test data (the p added to the variable names is for predict)
np=500 #test data sample size
xp = matrix(sort(runif(np*p)),ncol=p)
fxp = 4*(xp[,1]^2)
sxp = .2*exp(2*xp[,1])
yp = fxp + sxp*rnorm(np)

##run rbart MCMC
# The number of interations is kept small to make example run,
##!!!!  REAL APPLICATIONS MAY NEED LONGER RUNS !!!!
#   nskip: burn in draws,
#   ndpost:kept draws,
#   nadapt: initial draws to tune MCMC,
#   numcut: number of cutpoints used for each x
#   k: bigger k gives smoother f (default is 2)
set.seed(19)
res = rbart(x,y,nskip=10,ndpost=20,nadapt=0,numcut=1000,k=5) #again, this is way too short a run!!!
## now predict to get inference
resp = predict(res,x.test=xp)

##check out of sample fit
cat("out of sample cor(f,fhat) is ",cor(fxp,resp$mmean),"\n")
cat("out of sample cor(s,shat) is ",cor(sxp,resp$smean),"\n")

##plot estimated vs. true
##plot the data
plot(xp,yp,cex.axis=1.5,cex.lab=1.5)
lines(xp,fxp,col="blue")
lines(xp,fx+2*sxp,col="blue",lty=2)
lines(xp,fxp-2*sxp,col="blue",lty=2)

## add the fit
lines(xp,resp$mmean) #estimate of f
lines(xp,resp$mmean+2*resp$smean) #estimate of sd
lines(xp,resp$mmean-2*resp$smean) #estimate of sd