dbarts: Discrete Bayesian Additive Regression Trees Sampler

dbarts R Documentation

Description

Creates a sampler object for a given problem that fits a Bayesian Additive Regression Trees model. The sampler stores its state internally in such a way that it is mutable.

Usage

dbarts(
    formula, data, test, subset, weights, offset, offset.test = offset,
    verbose = FALSE, n.samples = 800L,
    tree.prior = cgm, node.prior = normal, resid.prior = chisq,
    proposal.probs = c(
        birth_death = 0.5, swap = 0.1, change = 0.4, birth = 0.5),
    control = dbarts::dbartsControl(), sigma = NA_real_)

Arguments

formula

An object of class formula using a model description syntax analogous to that of lm. For backwards compatibility, can also be the bart matrix x.train.

data

An optional data frame, list, or environment containing predictors to be used with the model. For backwards compatibility, can also be the bart vector y.train.

test

An optional matrix or data frame with the same number of predictors as data, or as formula when in backwards compatibility mode. If column names are present, a matching algorithm is used.

subset

An optional vector specifying a subset of observations to be used in the fitting process.

weights

An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations y \mid x \sim N(f(x), \sigma^2 / w), where f(x) is the unknown function.
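
As a minimal sketch of what this weighted model implies, the following base R code simulates data in which the noise variance of each observation is \sigma^2 / w. The function f, the value of sigma, and the weights are made up for illustration and are not part of dbarts.

```r
set.seed(1)
n <- 200
x <- runif(n)

# Hypothetical stand-in for the unknown function f(x)
f <- function(x) sin(2 * pi * x)

sigma <- 0.5                              # assumed residual standard deviation
w <- sample(c(1, 4), n, replace = TRUE)   # illustrative weights

# Per-observation noise standard deviation implied by Var(y | x) = sigma^2 / w
noise.sd <- sigma / sqrt(w)

# y | x ~ N(f(x), sigma^2 / w)
y <- rnorm(n, mean = f(x), sd = noise.sd)
```

Observations with larger weights are treated as less noisy; here a weight of 4 halves the noise standard deviation relative to a weight of 1.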

offset

An optional vector specifying an offset from 0 for the relationship between the underlying function, f(x), and the response y. It is only useful for binary responses, in which case the fitted model assumes P(Y = 1 \mid X = x) = \Phi(f(x) + \mathrm{offset}), where \Phi is the standard normal cumulative distribution function.
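
The probit relationship above can be sketched in base R as follows. The latent function f and the offset values are hypothetical and chosen only to illustrate how the offset shifts the success probability.

```r
set.seed(2)
n <- 500
x <- rnorm(n)

# Hypothetical latent function; in practice f is unknown and estimated
f <- function(x) 0.8 * x

# Illustrative offset, one value per observation
offset.train <- rep(-0.5, n)

# P(Y = 1 | X = x) = Phi(f(x) + offset); pnorm is the standard normal CDF
p <- pnorm(f(x) + offset.train)

# Binary responses drawn from those probabilities
y <- rbinom(n, size = 1, prob = p)
```

A negative offset lowers every success probability relative to \Phi(f(x)), and a positive one raises it.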

offset.test

The equivalent of offset for test observations. Will attempt to use offset when applicable.

verbose

A logical determining if additional output is printed to the console. See dbartsControl.

n.samples

A positive integer setting the default number of posterior samples to be returned for each run of the sampler. Can be overridden at run-time. See dbartsControl.

tree.prior

An expression of the form cgm or cgm(power, base, split.probs) setting the tree prior used in fitting.
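
For intuition, the sketch below assumes that cgm denotes the tree prior of Chipman, George, and McCulloch, under which a node at depth d splits with probability base * (1 + d)^(-power). The helper split.prob is hypothetical, and the defaults power = 2 and base = 0.95 are the values commonly used for that prior, assumed here rather than taken from this page.

```r
# Hypothetical helper: prior probability that a node at a given depth
# is non-terminal, under the assumed CGM form base * (1 + depth)^(-power)
split.prob <- function(depth, power = 2, base = 0.95) {
  base * (1 + depth)^(-power)
}

depths <- 0:3
probs <- split.prob(depths)

# Pair each depth with its split probability for inspection
data.frame(depth = depths, split.prob = probs)
```

Under this form, a larger power penalizes deep trees more strongly, and a smaller base makes even the root less likely to split.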

node.prior

An expression of the form normal or normal(k) that sets the prior used on the averages within nodes.

resid.prior

An expression of the form chisq or chisq(df, quant) that sets the prior used on the residual/error variance.
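
One common reading of chisq(df, quant), assumed here from the standard BART formulation rather than stated on this page, is that \sigma^2 is distributed as df * lambda / \chi^2_{df}, with lambda chosen so that the prior places probability quant on \sigma falling below a rough estimate. The values df = 3, quant = 0.9, and sigma.hat = 1.5 are illustrative.

```r
df.prior  <- 3     # assumed degrees of freedom
quant     <- 0.9   # assumed prior probability that sigma < sigma.hat
sigma.hat <- 1.5   # illustrative rough estimate of the residual sd

# Solve P(sigma < sigma.hat) = quant for the scale lambda
lambda <- sigma.hat^2 * qchisq(1 - quant, df.prior) / df.prior

# Check by simulation: draw sigma from the implied prior
set.seed(3)
sigma.draws <- sqrt(df.prior * lambda / rchisq(1e5, df.prior))
prior.mass <- mean(sigma.draws < sigma.hat)   # should be close to quant
```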

proposal.probs

Named numeric vector or NULL, optionally specifying the proposal rules and their probabilities. Elements should be "birth_death", "change", and "swap" to control tree change proposals, and "birth" to give the relative frequency of birth/death in the "birth_death" step.
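
The sketch below builds the default vector from the Usage section and works out the step probabilities it would imply. It assumes that "birth_death", "swap", and "change" are probabilities of the three proposal types, and that "birth" is the conditional probability of a birth given that a birth/death step was chosen. That reading is an interpretation of the description above, not a statement of the package internals.

```r
proposal.probs <- c(birth_death = 0.5, swap = 0.1, change = 0.4, birth = 0.5)

# Probabilities of the three proposal types
step.probs <- proposal.probs[c("birth_death", "swap", "change")]

# Implied marginal probability of each move under the assumed reading
implied <- c(
  birth  = unname(step.probs["birth_death"] * proposal.probs["birth"]),
  death  = unname(step.probs["birth_death"] * (1 - proposal.probs["birth"])),
  swap   = unname(step.probs["swap"]),
  change = unname(step.probs["change"]))

sum(implied)   # the four moves together account for all proposals
```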

control

An object inheriting from dbartsControl, created by the dbartsControl function.

sigma

A positive numeric estimate of the residual standard deviation. If NA, a linear model is used with all of the predictors to obtain one.
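
To illustrate the kind of estimate described for the NA case, the sketch below fits a linear model on all predictors of a simulated data set and extracts its residual standard deviation. The data are made up, and the exact estimator used inside dbarts may differ from summary(fit)$sigma.

```r
set.seed(4)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.7)

# Linear model using all of the predictors
fit <- lm(y ~ ., data = dat)

# Residual standard deviation: sqrt(RSS / residual degrees of freedom)
sigma.hat <- summary(fit)$sigma
```

A value obtained this way could be supplied as sigma to skip the internal estimate.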

Details

“Discrete sampler” refers to the fact that dbarts is implemented using ReferenceClasses, so that there exists a mutable object constructed in C++ that is largely obscured from R. The dbarts function is the primary way of creating a dbartsSampler, for which a variety of methods exist.

Value

A reference object of class dbartsSampler.
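
A minimal usage sketch follows, assuming the dbarts package is installed. The simulated data, the chain and burn-in settings, and the variable names are illustrative. The run method with its numBurnIn and numSamples arguments is used here as recalled from the package's sampler interface, so check it against the dbartsSampler documentation.

```r
if (requireNamespace("dbarts", quietly = TRUE)) {
  set.seed(5)
  n <- 100
  dat <- data.frame(x = runif(n))
  dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)

  # Single chain and single thread keep the sketch small
  control <- dbarts::dbartsControl(n.chains = 1L, n.threads = 1L, n.burn = 50L)

  # Create the mutable sampler; n.samples sets the default draws per run
  sampler <- dbarts::dbarts(y ~ x, data = dat, n.samples = 50L,
                            control = control)

  # Run with the defaults stored in the sampler
  samples <- sampler$run()

  # Override the number of samples at run time
  fewer <- sampler$run(numBurnIn = 0L, numSamples = 20L)
}
```

Because the sampler is a reference object, repeated calls to run operate on the same underlying state rather than on copies.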


dbarts documentation built on May 29, 2024, 3:31 a.m.