bark-deprecated | R Documentation |
BARK is a Bayesian sum-of-kernels model.
For a numeric response y, we have y = f(x) + \epsilon, where \epsilon \sim N(0, \sigma^2).
For a binary response y, P(Y=1 | x) = F(f(x)), where F denotes the standard normal cdf (probit link).
In both cases, f is the sum of many Gaussian kernel functions.
The goal is to have very flexible inference for the unknown function f.
BARK uses an approximation to a Cauchy process as the prior distribution for the unknown function f.
Feature selection can be achieved through inference on the scale parameters in the Gaussian kernels. BARK accepts four types of prior distribution for the scale parameters: e and d, which enable soft shrinkage, and se and sd, which enable hard shrinkage.
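To make the role of the scale parameters concrete, here is a minimal base-R sketch of a sum of Gaussian kernels. The parametrization exp(-sum_d lambda[d] * (x[d] - centers[j, d])^2) and the names kernel_sum, centers, beta and lambda are assumptions made for illustration; they are not taken from the package source. Under this assumed form, a scale of zero on a feature removes that feature from f, which is the sense in which shrinking the scales can select features.

```r
# Hypothetical helper (not part of bark): evaluates
#   f(x) = sum_j beta[j] * exp(-sum_d lambda[d] * (x[d] - centers[j, d])^2)
kernel_sum <- function(x, centers, beta, lambda) {
  # x: numeric vector of length p; centers: J x p matrix
  # beta: length-J coefficients; lambda: length-p non-negative scales
  diffs <- sweep(centers, 2, x)           # J x p matrix of centers[j, d] - x[d]
  sq <- sweep(diffs^2, 2, lambda, `*`)    # weight each squared difference by lambda[d]
  sum(beta * exp(-rowSums(sq)))
}

centers <- rbind(c(0, 0), c(1, 1))
beta <- c(2, -1)
lambda <- c(1, 0)   # the second feature has zero scale

# Changing only the second coordinate leaves f unchanged
f1 <- kernel_sum(c(0.5, -3), centers, beta, lambda)
f2 <- kernel_sum(c(0.5, 10), centers, beta, lambda)
```

Here f1 and f2 agree because the second feature contributes nothing to the exponent when its scale is zero.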
x.train | Explanatory variables for training (in sample) data. |
y.train | Dependent variable for training (in sample) data. |
x.test | Explanatory variables for test (out of sample) data. |
type | BARK type: e, d, se, or sd. The default is se. |
classification | Logical (TRUE/FALSE); TRUE for a classification problem, FALSE for a regression problem. |
keepevery | Every keepevery-th draw is kept and returned to the user. |
nburn | Number of MCMC iterations (nburn*keepevery) to be treated as burn-in. |
nkeep | Number of MCMC iterations kept for the posterior inference. |
printevery | As the MCMC runs, a message is printed every printevery draws. |
keeptrain | Logical; whether to keep results for training samples. |
fixed | A list of fixed hyperparameters; default values are used for any not specified. |
tune | A list of tuning parameters, not expected to change. |
theta | A list of starting values for the parameter theta; defaults are used if nothing is given. |
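The iteration counts implied by nburn, nkeep and keepevery can be worked out by hand. The sketch below uses the values from the regression example on this page. The burn-in length nburn*keepevery comes from the argument description; the total of (nburn + nkeep)*keepevery, and the positions of the retained draws, are assumptions about how thinning is applied and are not confirmed by this page.

```r
nburn <- 10
nkeep <- 10
keepevery <- 10

# Stated in the nburn description: burn-in covers nburn * keepevery iterations
burn_iters <- nburn * keepevery

# Assumption: kept draws are thinned at the same rate as the burn-in
total_iters <- (nburn + nkeep) * keepevery

# Assumption: after burn-in, every keepevery-th iteration is retained
kept <- seq(from = burn_iters + keepevery, to = total_iters, by = keepevery)
```

Under these assumptions, the settings above correspond to 100 burn-in iterations out of 200 in total, with nkeep retained draws.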
BARK is implemented using a Bayesian MCMC method. At each MCMC iteration, we produce a draw from the joint posterior distribution, i.e. a full configuration of regression coefficients, kernel locations, kernel parameters, etc.
Thus, unlike many other modelling methods in R, we do not produce a single model object from which fits and summaries may be extracted.
The output consists of values f^*(x) (and \sigma^* in the numeric case), where * denotes a particular draw.
The x is either a row from the training data (x.train) or a row from the test data (x.test).
bark returns a list, including:
fixed | Fixed hyperparameters |
tune | Tuning parameters used |
theta.last | The last set of parameters from the posterior draw |
theta.nvec | A matrix with nrow(x.train) |
theta.varphi | A matrix with nrow(x.train) |
theta.beta | A matrix with nrow(x.train) |
theta.lambda | A matrix with ncol(x.train) rows and nkeep columns, recording the kernel scale parameters |
theta.phi | A vector of length nkeep, recording the precision of the Gaussian noise in regression (1 in the classification case) |
yhat.train | A matrix with nrow(x.train) rows and nkeep columns. Each column corresponds to a draw |
yhat.test | Same as yhat.train, but the x's are the rows of the test data |
yhat.train.mean | Training data fits: row means of yhat.train |
yhat.test.mean | Test data fits: row means of yhat.test |
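Because the fitted values are returned as a matrix of posterior draws, summaries are computed across columns. The following base-R sketch uses a simulated matrix, yhat_test, as a stand-in for the yhat.test element; the simulated values and the names post_mean, post_int and post_prob are illustrative assumptions only. Applying pnorm to the draws in the classification case assumes the draws are on the scale of f, which this page does not state explicitly.

```r
set.seed(1)
n_test <- 5
nkeep <- 100

# Stand-in for fit$yhat.test: one row per test point, one column per kept draw
yhat_test <- matrix(
  rnorm(n_test * nkeep, mean = rep(1:n_test, times = nkeep)),
  nrow = n_test
)

# Posterior mean per test point (how yhat.test.mean is described above)
post_mean <- rowMeans(yhat_test)

# 95% posterior interval per test point: an n_test x 2 matrix
post_int <- t(apply(yhat_test, 1, quantile, probs = c(0.025, 0.975)))

# Classification sketch (assumption: draws are values of f):
# map each draw through the probit link, then average over draws
post_prob <- rowMeans(pnorm(yhat_test))
```

Since pnorm(0) is 0.5, thresholding a value of f at 0 is the same as thresholding F(f) at 0.5.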
Ouyang, Zhi (2008) Bayesian Additive Regression Kernels. Duke University. PhD dissertation, page 58.
Other bark deprecated functions: bark-package-deprecated, sim.Circle-deprecated, sim.Friedman1-deprecated, sim.Friedman2-deprecated, sim.Friedman3-deprecated
library(bark)
# Simulate regression example
# Friedman 2 data set, 200 noisy training, 1000 noise free testing
# Out of sample MSE in SVM (default RBF): 6500 (sd. 1600)
# Out of sample MSE in BART (default): 5300 (sd. 1000)
traindata <- sim_Friedman2(200, sd=125)
testdata <- sim_Friedman2(1000, sd=0)
# example with a very small number of iterations to illustrate the method
fit.bark.d <- bark_mat(traindata$x, traindata$y, testdata$x,
                       nburn=10, nkeep=10, keepevery=10,
                       classification=FALSE, type="d")
boxplot(data.frame(fit.bark.d$theta.lambda))
mean((fit.bark.d$yhat.test.mean-testdata$y)^2)
# Simulate classification example
# Circle 5 with 2 signals and three noisy dimensions
# Out of sample error rate in SVM (default RBF): 0.110 (sd. 0.02)
# Out of sample error rate in BART (default): 0.065 (sd. 0.02)
traindata <- sim_circle(200, dim=5)
testdata <- sim_circle(1000, dim=5)
fit.bark.se <- bark_mat(traindata$x, traindata$y, testdata$x,
                        classification=TRUE, type="se")
boxplot(data.frame(fit.bark.se$theta.lambda))
mean((fit.bark.se$yhat.test.mean>0)!=testdata$y)