pgbart_predict: Make Predictions Using Bayesian Additive Regression Trees


Description

Make predictions on a new test data set from a model built on training data by the function pgbart_train.

Usage

pgbart_predict(x.test, model)

Arguments

x.test

Explanatory variables for test (out-of-sample) data.
Must have the same structure as x.train in pgbart_train.

model

The path to the model file saved by pgbart_train, i.e. the same path passed as the model argument during training.

Details

PGBART is a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior (f, sigma) | (x, y) in the numeric y case, and just from f in the binary y case.

Thus, unlike many other modeling methods in R, we do not produce a single model object from which fits and summaries may be extracted. The output consists of values f*(x) (and sigma* in the numeric case), where * denotes a particular draw and x is a row of the test data (x.test).
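
For example, the full set of draws can be summarized directly. The sketch below is illustrative only; it assumes a numeric-outcome fit and a pgbartPredict object produced as in Example 1 of the Examples section:

## sketch, assuming pgbartPredict from Example 1 (numeric y)
draws <- pgbartPredict$yhat.test               # draws x test-rows matrix
post.mean <- colMeans(draws)                   # posterior mean fit per test row
post.interval <- apply(draws, 2, quantile,     # 95% posterior interval per test row
                       probs = c(0.025, 0.975))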

Value

pgbart_predict returns a list of class 'pgbart'. In the numeric y case, the list has components:

yhat.test

A matrix with ndpost/keepevery rows and nrow(x.test) columns. Each row corresponds to a draw f* from the posterior of f, and each column corresponds to a row of x.test. The (i, j) entry is f*(x) for the i-th kept draw of f and the j-th row of x.test. Burn-in draws are dropped.

yhat.test.mean

Test data fits: the mean of the columns of yhat.test. Present only when y is not binary.
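
In other words, yhat.test.mean is just the column means of the draw matrix; a minimal check, assuming a numeric-outcome pgbartPredict as in Example 1:

## yhat.test.mean is the column-wise mean of the posterior draws
all.equal(pgbartPredict$yhat.test.mean, colMeans(pgbartPredict$yhat.test))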

In the binary y case, the returned list has the components yhat.test and binaryOffset.

Note that in the binary y case, yhat.test contains draws of f(x) + binaryOffset. If you want draws of the probability P(Y = 1 | x), apply the normal CDF (pnorm) to these values.
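
A minimal sketch of that conversion, assuming a binary-outcome pgbartPredict as in Example 2:

## pnorm maps draws of f(x) + binaryOffset to draws of P(Y = 1 | x)
prob.draws <- pnorm(pgbartPredict$yhat.test)
phat <- colMeans(prob.draws)   # posterior mean probability per test row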

References

Chipman, H., George, E., and McCulloch, R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4(1), 266-298.

Lakshminarayanan, B., Roy, D., and Teh, Y. W. (2015) Particle Gibbs for Bayesian Additive Regression Trees. Artificial Intelligence and Statistics, 553-561.

Chipman, H., George, E., and McCulloch, R. (2006) Bayesian Ensemble Learning. Advances in Neural Information Processing Systems 19, Scholkopf, Platt and Hoffman, Eds., MIT Press, Cambridge, MA, 265-272.

Friedman, J.H. (1991) Multivariate Adaptive Regression Splines. The Annals of Statistics, 19, 1–67.

Breiman, L. (1996) Bias, Variance, and Arcing Classifiers. Tech. Rep. 460, Statistics Department, University of California, Berkeley, CA, USA.

See Also

pgbart_train, pdpgbart

Examples

## Example 1: simulated continuous outcome data (example from Section 4.3 of Friedman's MARS paper)
f = function(x){
    10*sin(pi*x[,1]*x[,2]) + 20*(x[,3]-.5)^2 + 10*x[,4] + 5*x[,5]
}
sigma = 1.0  # y = f(x) + sigma*z, z ~ N(0,1)
n = 100      #number of observations
set.seed(99)
x = matrix(runif(n*10), n, 10) 
Ey = f(x)
y = Ey+sigma*rnorm(n)
model_path = file.path(tempdir(),'pgbart.model')
pgbartFit = pgbart_train(x[1:(n*.75),], y[1:(n*.75)], 
                         model=model_path,
                         ndpost=200, ntree=5, usepg=TRUE)

pgbartPredict = pgbart_predict(x[(n*.75+1):n,], model=model_path)

## correlation between posterior mean predictions and held-out responses
cor(pgbartPredict$yhat.test.mean, y[(n*.75+1):n])

## Example 2: simulated binary outcome data (two-normal example from Breiman)
f <- function (n, d = 20) 
{
  x <- matrix(0, nrow = n, ncol = d)
  c1 <- sample.int(n, n/2)
  c2 <- (1:n)[-c1]
  a <- 2/sqrt(d)
  x[c1, ] <- matrix(rnorm(n = d * length(c1), mean = -a), ncol = d)
  x[c2, ] <- matrix(rnorm(n = d * length(c2), mean = a), ncol = d)
  
  x.train <- x
  y.train <- rep(0, n)
  y.train[c2] <- 1
  list(x.train=x.train, y.train=as.factor(y.train))
}


set.seed(99)
n <- 200
train <- f(n)
model_path = file.path(tempdir(),'pgbart.model')
pgbartFit = pgbart_train(train$x.train[1:(n*.75),], train$y.train[1:(n*.75)], 
                         model=model_path, ndpost=200, ntree=5, usepg=TRUE)
pgbartPredict = pgbart_predict(train$x.train[(n*.75+1):n,], model=model_path)
## posterior mean of P(Y = 1 | x): yhat.test already includes binaryOffset
phat = colMeans(pnorm(pgbartPredict$yhat.test))
class.pred = ifelse(phat <= 0.5, 0, 1)
table(class.pred, train$y.train[(n*.75+1):n])
