xbart | R Documentation |

Fits the BART model against varying `k`, `power`, `base`, and `n.trees` parameters using *K*-fold or repeated random subsampling crossvalidation, sharing burn-in between parameter settings. Results are given as an array of evaluations of a loss function on the held-out sets.

    xbart(formula, data, subset, weights, offset, verbose = FALSE,
          n.samples = 200L, method = c("k-fold", "random subsample"),
          n.test = c(5, 0.2), n.reps = 40L, n.burn = c(200L, 150L, 50L),
          loss = c("rmse", "log", "mcr"), n.threads = dbarts::guessNumCores(),
          n.trees = 75L, k = NULL, power = 2, base = 0.95, drop = TRUE,
          resid.prior = chisq, control = dbarts::dbartsControl(),
          sigma = NA_real_, seed = NA_integer_)

`formula` |
An object of class |

`data` |
An optional data frame, list, or environment containing predictors to be used with the model. For backwards compatibility, can also be the |

`subset` |
An optional vector specifying a subset of observations to be used in the fitting process. |

`weights` |
An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations |

`offset` |
An optional vector specifying an offset from 0 for the relationship between the underlying function, |

`verbose` |
A logical determining if additional output is printed to the console. |

`n.samples` |
A positive integer, setting the number of posterior samples drawn for each fit of training data and used by the loss function. |

`method` |
Character string, either |

`n.test` |
For each fit, the test sample size or proportion. For method |

`n.reps` |
A positive integer setting the number of cross validation steps that will be taken. For |

`n.burn` |
Between one and three positive integers, specifying 1) the initial burn-in, 2) the burn-in when moving from one parameter setting to another, and 3) the burn-in between each random subsample replication. The third parameter is also the burn-in when moving between folds in |

`loss` |
Either one of the pre-set loss functions as character-strings ( |

`n.threads` |
Across different sets of parameters ( |

`n.trees` |
A vector of positive integers setting the BART hyperparameter for the number of trees in the sum-of-trees formulation. See |

`k` |
A vector of positive real numbers, setting the BART hyperparameter for the node-mean prior standard deviation. If |

`power` |
A vector of real numbers greater than one, setting the BART hyperparameter for the tree prior's growth probability, given by |

`base` |
A vector of real numbers in |

`drop` |
Logical, determining if dimensions with a single value are dropped from the result. |

`resid.prior` |
An expression of the form |

`control` |
An object inheriting from |

`sigma` |
A positive numeric estimate of the residual standard deviation. If |

`seed` |
Optional integer specifying the desired pRNG seed. It should not be needed when running single-threaded - |

Crossvalidates `n.reps` replications against the crossproduct of given hyperparameter vectors `n.trees` × `k` × `power` × `base`. For each fit, either one fold is withheld as test data and `n.test - 1` folds are used as training data, or `n * n.test` observations are withheld as test data and `n * (1 - n.test)` are used as training data. A replication corresponds to fitting all *K* folds in `"k-fold"` crossvalidation or a single fit with `"random subsample"`. The training data is used to fit a model and make predictions on the test data, which are used together with the test data itself to evaluate the `loss` function.
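The two splitting schemes described above can be sketched in base R. This only illustrates how observations might be partitioned; it is not the internal implementation used by `xbart`, and the helper names `kfold_split` and `random_subsample_split` are hypothetical.

```r
# Hypothetical helpers illustrating the two resampling schemes; not dbarts code.

# K-fold: assign each of n observations to one of n.test folds. Each fold is
# held out once while the remaining n.test - 1 folds form the training set.
kfold_split <- function(n, n.test) {
  fold <- sample(rep(seq_len(n.test), length.out = n))
  lapply(seq_len(n.test), function(i) {
    list(train = which(fold != i), test = which(fold == i))
  })
}

# Random subsample: withhold roughly n * n.test observations as test data
# and train on the remaining n * (1 - n.test).
random_subsample_split <- function(n, n.test) {
  test <- sort(sample.int(n, size = round(n * n.test)))
  list(train = setdiff(seq_len(n), test), test = test)
}

set.seed(1)
folds <- kfold_split(100L, 5L)              # one replication = all 5 fits
sub   <- random_subsample_split(100L, 0.2)  # one replication = a single fit
```

In this sketch a `"k-fold"` replication covers every observation exactly once as test data, while a `"random subsample"` replication draws a fresh test set each time.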

`loss` functions are either the default of average negative log-loss for binary outcomes and root-mean-squared error for continuous outcomes, misclassification rates for binary outcomes, or a `function` with arguments `y.test` and `y.test.hat`. `y.test.hat` is of dimensions equal to `length(y.test)` × `n.samples`. A third option is to pass a list of `list(function, evaluationEnvironment)`, so as to provide default bindings. RMSE is a monotonic transformation of the average log-loss for continuous outcomes, so specifying log-loss in that case calculates RMSE instead.
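As a rough sketch of a custom loss, the base R code below defines a function taking `y.test` and a `length(y.test)` × `n.samples` matrix `y.test.hat`, and checks it on simulated stand-in data rather than through `xbart`. The names `abs_loss`, `trimmed_loss`, and `trim` are hypothetical, the third `weights` argument only mirrors the signature used in the example on this page, and how `xbart` evaluates a function against a supplied environment is assumed here, not confirmed.

```r
# Custom loss: mean absolute error of the posterior-mean prediction.
# y.test.hat is assumed to be a length(y.test) x n.samples matrix;
# weights is accepted but ignored.
abs_loss <- function(y.test, y.test.hat, weights) {
  mean(abs(y.test - rowMeans(y.test.hat)))
}

# Simulated stand-ins for held-out outcomes and posterior predictive draws.
set.seed(2)
n.test.obs <- 20L
n.samples  <- 15L
y.test     <- rnorm(n.test.obs)
y.test.hat <- matrix(y.test + rnorm(n.test.obs * n.samples, sd = 0.5),
                     n.test.obs, n.samples)

loss_value <- abs_loss(y.test, y.test.hat)

# List form: a function paired with an environment that supplies bindings,
# here a hypothetical `trim` value referenced in the function body.
loss_env <- new.env()
loss_env$trim <- 0.1
trimmed_loss <- function(y.test, y.test.hat, weights) {
  mean(abs(y.test - rowMeans(y.test.hat)), trim = trim)
}

# Local check only (an assumption about evaluation): resolve `trim`
# by making loss_env the function's enclosing environment.
environment(trimmed_loss) <- loss_env
trimmed_value <- trimmed_loss(y.test, y.test.hat)

loss_spec <- list(trimmed_loss, loss_env)
```

Either `abs_loss` or `loss_spec` would then be a candidate for the `loss` argument.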

An array of dimensions `n.reps` × `length(n.trees)` × `length(k)` × `length(power)` × `length(base)`. If `drop` is `TRUE`, dimensions of length 1 are omitted. If all hyperparameters are of length 1, then the result will be a vector of length `n.reps`. When the result is an array, the `dimnames` of the result shall be set to the corresponding hyperparameters.

For method `"k-fold"`, each element is an average across the *K* fits. For `"random subsample"`, each element represents a single fit.
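One way to work with a result of this shape is to average over replications and pick the setting with the smallest mean loss. The sketch below uses a mock array filled with random numbers in place of real `xbart` output; the `dimnames` labels are assumptions made for illustration.

```r
# Mock result with the documented layout:
# n.reps x length(n.trees) x length(k) x length(power) x length(base).
n.reps  <- 4L
n.trees <- c(5L, 7L)
k       <- c(1, 2, 4)
power   <- c(1.5, 2)
base    <- c(0.75, 0.8, 0.95)

dims <- c(n.reps, length(n.trees), length(k), length(power), length(base))

set.seed(3)
xval <- array(runif(prod(dims)), dim = dims,
              dimnames = list(rep     = NULL,
                              n.trees = as.character(n.trees),
                              k       = as.character(k),
                              power   = as.character(power),
                              base    = as.character(base)))

# Average the loss over replications (dimension 1) for each setting.
mean_loss <- apply(xval, 2:5, mean)

# Locate the setting with the smallest average loss.
best <- arrayInd(which.min(mean_loss), dim(mean_loss))
best_setting <- c(n.trees = n.trees[best[1]], k = k[best[2]],
                  power = power[best[3]], base = base[best[4]])
```

With real output and `drop = TRUE`, any hyperparameter of length 1 would not appear as a dimension, so the margins passed to `apply` would need adjusting.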

Vincent Dorie: vdorie@gmail.com

`bart`, `dbarts`

    f <- function(x) {
        10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
          10 * x[,4] + 5 * x[,5]
    }

    set.seed(99)
    sigma <- 1.0
    n     <- 100

    x  <- matrix(runif(n * 10), n, 10)
    Ey <- f(x)
    y  <- rnorm(n, Ey, sigma)

    mad <- function(y.train, y.train.hat, weights) {
        # note, weights are ignored
        mean(abs(y.train - apply(y.train.hat, 1L, mean)))
    }

    ## low iteration numbers to run quickly
    xval <- xbart(x, y, n.samples = 15L, n.reps = 4L, n.burn = c(10L, 3L, 1L),
                  n.trees = c(5L, 7L),
                  k = c(1, 2, 4),
                  power = c(1.5, 2),
                  base = c(0.75, 0.8, 0.95),
                  n.threads = 1L,
                  loss = mad)
