| ForestKernelComputation | R Documentation |
Decision tree ensembles can be represented in part by a "kernel" function whose distance metric is based on the extent to which two observations are mapped to the same leaf nodes. This function group offers utilities for evaluating this kernel.
computeForestLeafIndices computes and returns a vector representation of a forest's
leaf predictions for every observation in a dataset.
The resulting vector has a "row-major" format that can be easily re-represented
as a CSR sparse matrix: elements are organized so that the first n elements
correspond to leaf predictions for all n observations in a dataset for the
first tree in an ensemble, the next n elements correspond to predictions for
the second tree, and so on. The "data" for each element is a uniquely
mapped column index that identifies a single leaf of a single tree (i.e.
if tree 1 has 3 leaves, its column indices range from 0 to 2, and then tree 2's
leaf indices begin at 3, etc.).
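The layout described above can be illustrated with a small base-R sketch. The leaf index vector here is made up by hand for a single forest sample (it is not output from a fitted model), and the names n, num_trees, W, and K are illustrative assumptions rather than part of the stochtree API. The sketch builds a dense indicator matrix instead of a true sparse matrix to avoid extra dependencies.

```r
# Hypothetical leaf index vector for n = 4 observations and 2 trees.
# Tree 1 has 3 leaves (column indices 0-2); tree 2 has 2 leaves (3-4).
n <- 4
num_trees <- 2
leaf_indices <- c(0, 2, 1, 0,   # tree 1, observations 1..4
                  3, 3, 4, 4)   # tree 2, observations 1..4

# Number of columns needed: largest 0-based leaf index plus one
num_cols <- max(leaf_indices) + 1

# Observation (row) index for each element: 1..n repeated once per tree
obs_index <- rep(seq_len(n), times = num_trees)

# Dense 0/1 indicator matrix; shift 0-based leaf indices to 1-based columns
W <- matrix(0, nrow = n, ncol = num_cols)
W[cbind(obs_index, leaf_indices + 1)] <- 1

# Kernel: fraction of trees in which two observations land in the same leaf
K <- tcrossprod(W) / num_trees
```

Each row of W has exactly one nonzero entry per tree, so the diagonal of K is 1 and off-diagonal entries lie between 0 and 1. For larger datasets, a sparse representation of W would likely be preferable to the dense matrix used here.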
computeForestLeafVariances returns the leaf node scale parameter for each requested forest sample.
If the leaf scale is not sampled for the forest in question, the function throws an error stating that the
leaf model does not have a stochastic scale parameter.
computeForestMaxLeafIndex computes and returns the largest possible leaf index computable by computeForestLeafIndices for the forests in a designated forest sample container.
These functions are intended for advanced use cases in which users require detailed control of sampling algorithms and data structures. Minimal input validation and error checks are performed; users are responsible for providing the correct inputs. For tutorials on the "proper" usage of stochtree's advanced workflow, see the vignettes at https://stochtree.ai/
computeForestLeafIndices(
model_object,
covariates,
forest_type = NULL,
propensity = NULL,
forest_inds = NULL
)
computeForestLeafVariances(model_object, forest_type, forest_inds = NULL)
computeForestMaxLeafIndex(model_object, forest_type = NULL, forest_inds = NULL)
| model_object | Object of type |
| covariates | Covariates to use for prediction. Must have the same dimensions / column types as the data used to train a forest. |
| forest_type | Which forest to use from 1. BART 2. BCF 3. ForestSamples |
| propensity | (Optional) Propensities used for prediction (BCF-only). |
| forest_inds | (Optional) Indices of the forest sample(s) for which to compute max leaf indices. If not provided, this function will return max leaf indices for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to |
computeForestLeafIndices returns a vector of size num_obs * num_trees, where num_obs = nrow(covariates)
and num_trees is the number of trees in the relevant forest of model_object.
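As a minimal sketch of how a vector of this size can be handled, assuming a single forest sample and the tree-by-tree ordering described earlier, the vector can be reshaped into a num_obs by num_trees matrix. The values below are invented for illustration and do not come from a fitted model; leaf_matrix and shared_1_3 are hypothetical names.

```r
# Hypothetical values standing in for computeForestLeafIndices() output
num_obs <- 3
num_trees <- 2
leaf_indices <- c(0, 1, 1,   # tree 1, observations 1..3
                  2, 3, 2)   # tree 2, observations 1..3

# R fills matrices column by column, so column t holds tree t's leaf indices
leaf_matrix <- matrix(leaf_indices, nrow = num_obs, ncol = num_trees)

# Number of trees in which observations 1 and 3 fall in the same leaf
shared_1_3 <- sum(leaf_matrix[1, ] == leaf_matrix[3, ])
```

In this toy example, row i of leaf_matrix lists the leaf that observation i reaches in each tree, which makes pairwise comparisons between observations straightforward.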
computeForestLeafVariances returns a vector of size length(forest_inds) with the leaf scale parameter for each requested forest.
computeForestMaxLeafIndex returns a vector containing the largest possible leaf index computable by computeForestLeafIndices for the forests in a designated forest sample container.
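library(stochtree)

# Simulate a small regression dataset and fit a BART model
X <- matrix(runif(10*100), ncol = 10)
y <- -5 + 10*(X[,1] > 0.5) + rnorm(100)
bart_model <- bart(X, y, num_gfr=0, num_mcmc=10)

# Leaf indices: all samples, the first sample, and selected samples (0-indexed).
# forest_inds is passed by name because the fourth positional argument is propensity.
leaf_indices <- computeForestLeafIndices(bart_model, X, "mean")
leaf_indices <- computeForestLeafIndices(bart_model, X, "mean", forest_inds = 0)
leaf_indices <- computeForestLeafIndices(bart_model, X, "mean", forest_inds = c(1,3,9))

# Leaf scale parameters for the same kinds of selections
leaf_variances <- computeForestLeafVariances(bart_model, "mean")
leaf_variances <- computeForestLeafVariances(bart_model, "mean", forest_inds = 0)
leaf_variances <- computeForestLeafVariances(bart_model, "mean", forest_inds = c(1,3,5))

# Largest possible leaf index for each requested forest sample
max_leaf_index <- computeForestMaxLeafIndex(bart_model, "mean")
max_leaf_index <- computeForestMaxLeafIndex(bart_model, "mean", forest_inds = 0)
max_leaf_index <- computeForestMaxLeafIndex(bart_model, "mean", forest_inds = c(1,3,9))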