| ForestSamples | R Documentation |
Wrapper around a C++ class that stores draws from a random ensemble of decision trees.
This class is intended for advanced use cases in which users require detailed control of sampling algorithms and data structures. Minimal input validation and error checking are performed; users are responsible for providing correct inputs. For tutorials on the "proper" usage of stochtree's advanced workflow, we provide several vignettes at https://stochtree.ai/
forest_container_ptr: External pointer to a C++ ForestContainer class
new(): Create a new ForestContainer object.
ForestSamples$new(num_trees, leaf_dimension = 1, is_leaf_constant = FALSE, is_exponentiated = FALSE)
num_trees: Number of trees
leaf_dimension: Dimensionality of the outcome model
is_leaf_constant: Whether leaf is constant
is_exponentiated: Whether forest predictions should be exponentiated before being returned
A new ForestContainer object.
collapse(): Collapse forests in this container by a pre-specified batch size.
For example, if we have a container of twenty 10-tree forests, and we
specify a batch_size of 5, then this method will yield four 50-tree
forests. "Excess" forests remaining after the size of a forest container
is divided by batch_size will be pruned from the beginning of the
container (i.e. earlier sampled forests will be deleted). This method
has no effect if batch_size is larger than the number of forests
in a container.
ForestSamples$collapse(batch_size)
batch_size: Number of forests to be collapsed into a single forest
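The batching rule described for collapse() can be sketched in base R without touching a ForestSamples object. The helper name collapse_plan is hypothetical; it only mirrors the bookkeeping stated in the description and is not part of stochtree.

```r
# Hypothetical helper (not a stochtree function): reproduces the documented
# collapse() bookkeeping for a container of equally sized forests.
collapse_plan <- function(num_forests, trees_per_forest, batch_size) {
  # Documented no-op case: batch_size larger than the number of forests
  if (batch_size > num_forests) {
    return(list(num_forests = num_forests,
                trees_per_forest = trees_per_forest,
                num_pruned = 0))
  }
  list(
    num_forests = num_forests %/% batch_size,         # forests after collapsing
    trees_per_forest = trees_per_forest * batch_size, # trees in each merged forest
    num_pruned = num_forests %% batch_size            # "excess" forests dropped from the start
  )
}

plan_a <- collapse_plan(20, 10, 5)  # the example from the description
plan_b <- collapse_plan(22, 10, 5)  # the two earliest forests would be pruned
plan_c <- collapse_plan(3, 10, 5)   # batch_size exceeds the container size
```

Under this reading, twenty 10-tree forests with batch_size = 5 give four 50-tree forests, matching the example in the description.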
combine_forests(): Merge specified forests into a single forest
ForestSamples$combine_forests(forest_inds)
forest_inds: Indices of forests to be combined (0-indexed)
add_to_forest(): Add a constant value to every leaf of every tree of a given forest
ForestSamples$add_to_forest(forest_index, constant_value)
forest_index: Index of forest whose leaves will be modified (0-indexed)
constant_value: Value to add to every leaf of every tree of the forest at forest_index
multiply_forest(): Multiply every leaf of every tree of a given forest by a constant value
ForestSamples$multiply_forest(forest_index, constant_multiple)
forest_index: Index of forest whose leaves will be modified (0-indexed)
constant_multiple: Value by which to multiply every leaf of every tree of the forest at forest_index
load_from_json(): Create a new ForestContainer object from a JSON object
ForestSamples$load_from_json(json_object, json_forest_label)
json_object: Object of class CppJson
json_forest_label: Label referring to a particular forest (e.g. "forest_0") in the overall JSON hierarchy
A new ForestContainer object.
append_from_json(): Append to a ForestContainer object from a JSON object
ForestSamples$append_from_json(json_object, json_forest_label)
json_object: Object of class CppJson
json_forest_label: Label referring to a particular forest (e.g. "forest_0") in the overall JSON hierarchy
None
load_from_json_string(): Create a new ForestContainer object from a JSON string
ForestSamples$load_from_json_string(json_string, json_forest_label)
json_string: JSON string which parses into an object of class CppJson
json_forest_label: Label referring to a particular forest (e.g. "forest_0") in the overall JSON hierarchy
A new ForestContainer object.
append_from_json_string(): Append to a ForestContainer object from a JSON string
ForestSamples$append_from_json_string(json_string, json_forest_label)
json_string: JSON string which parses into an object of class CppJson
json_forest_label: Label referring to a particular forest (e.g. "forest_0") in the overall JSON hierarchy
None
predict(): Predict every tree ensemble on every sample in forest_dataset
ForestSamples$predict(forest_dataset)
forest_dataset: ForestDataset R class
Matrix of predictions with as many rows as in forest_dataset
and as many columns as samples in the ForestContainer
predict_raw(): Predict "raw" leaf values (without being multiplied by basis) for every tree ensemble on every sample in forest_dataset
ForestSamples$predict_raw(forest_dataset)
forest_dataset: ForestDataset R class
Array of predictions for each observation in forest_dataset and
each sample in the ForestSamples class, with each prediction having the
dimensionality of the forests' leaf model. In the case of a constant leaf model
or univariate leaf regression, this array is two-dimensional (number of observations,
number of forest samples). In the case of a multivariate leaf regression,
this array is three-dimensional (number of observations, leaf model dimension,
number of samples).
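The two output shapes described for predict_raw() can be illustrated with zero-filled placeholders in base R. The sizes n_obs, leaf_dim and n_samples are hypothetical, and no stochtree call is made here; the sketch only shows how the documented dimensions are laid out and indexed.

```r
# Hypothetical sizes, chosen only for illustration
n_obs <- 100      # observations in forest_dataset
leaf_dim <- 3     # dimension of a multivariate leaf model
n_samples <- 20   # forest samples in the container

# Constant leaf or univariate leaf regression: (observations, forest samples)
raw_univariate <- matrix(0, nrow = n_obs, ncol = n_samples)

# Multivariate leaf regression: (observations, leaf model dimension, samples)
raw_multivariate <- array(0, dim = c(n_obs, leaf_dim, n_samples))

# Raw leaf vector for the first observation in the fifth forest sample
# (R arrays are 1-indexed, unlike the 0-indexed forest arguments)
first_obs_fifth_sample <- raw_multivariate[1, , 5]
```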
predict_raw_single_forest(): Predict "raw" leaf values (without being multiplied by basis) for a specific forest on every sample in forest_dataset
ForestSamples$predict_raw_single_forest(forest_dataset, forest_num)
forest_dataset: ForestDataset R class
forest_num: Index of the forest sample within the container
Matrix of predictions with as many rows as in forest_dataset
and as many columns as dimensions in the leaves of trees in ForestContainer
predict_raw_single_tree(): Predict "raw" leaf values (without being multiplied by basis) for a specific tree in a specific forest on every observation in forest_dataset
ForestSamples$predict_raw_single_tree(forest_dataset, forest_num, tree_num)
forest_dataset: ForestDataset R class
forest_num: Index of the forest sample within the container
tree_num: Index of the tree to be queried
Matrix of predictions with as many rows as in forest_dataset
and as many columns as dimensions in the leaves of trees in ForestContainer
set_root_leaves(): Set a constant predicted value for every tree in the ensemble. Stops program if any tree is more than a root node.
ForestSamples$set_root_leaves(forest_num, leaf_value)
forest_num: Index of the forest sample within the container.
leaf_value: Constant leaf value(s) to be fixed for each tree in the ensemble indexed by forest_num. Can be either a single number or a vector, depending on the forest's leaf dimension.
prepare_for_sampler(): Set a constant predicted value for every tree in the ensemble. Stops program if any tree is more than a root node.
ForestSamples$prepare_for_sampler(dataset, outcome, forest_model, leaf_model_int, leaf_value)
dataset: ForestDataset class (covariates, basis, etc.)
outcome: Outcome class (residual / partial residual)
forest_model: ForestModel object storing tracking structures used in training / sampling
leaf_model_int: Integer value encoding the leaf model type (0 = constant gaussian, 1 = univariate gaussian, 2 = multivariate gaussian, 3 = log linear variance).
leaf_value: Constant leaf value(s) to be fixed for each tree in the ensemble. Can be either a single number or a vector, depending on the forest's leaf dimension.
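Since leaf_model_int is a bare integer code, a named lookup can make calling code easier to read. The sketch below uses only the four codes listed for prepare_for_sampler(); the vector name leaf_model_codes and its element names are hypothetical labels, not stochtree identifiers.

```r
# Hypothetical lookup table for the documented leaf_model_int codes
leaf_model_codes <- c(
  constant_gaussian     = 0L,
  univariate_gaussian   = 1L,
  multivariate_gaussian = 2L,
  log_linear_variance   = 3L
)

# Integer code to pass as leaf_model_int for a univariate gaussian leaf model
leaf_model_int <- leaf_model_codes[["univariate_gaussian"]]

# Reverse lookup: label for a given integer code
label_for_3 <- names(leaf_model_codes)[leaf_model_codes == 3L]
```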
adjust_residual(): Adjusts residual based on the predictions of a forest
This is typically run just once at the beginning of a forest sampling algorithm. After trees are initialized with constant root node predictions, their root predictions are subtracted out of the residual.
ForestSamples$adjust_residual(dataset, outcome, forest_model, requires_basis, forest_num, add)
dataset: ForestDataset object storing the covariates and bases for a given forest
outcome: Outcome object storing the residuals to be updated based on forest predictions
forest_model: ForestModel object storing tracking structures used in training / sampling
requires_basis: Whether or not a forest requires a basis for prediction
forest_num: Index of forest used to update residuals
add: Whether forest predictions should be added to or subtracted from residuals
save_json(): Store the trees and metadata of the ForestSamples class in a JSON file
ForestSamples$save_json(json_filename)
json_filename: Name of output JSON file (must end in ".json")
load_json(): Load trees and metadata for an ensemble from a JSON file. Note that
any trees and metadata already present in the ForestSamples class will
be overwritten.
ForestSamples$load_json(json_filename)
json_filename: Name of model input JSON file (must end in ".json")
num_samples(): Return number of samples in a ForestContainer object
ForestSamples$num_samples()
Sample count
num_trees(): Return number of trees in each ensemble of a ForestContainer object
ForestSamples$num_trees()
Tree count
leaf_dimension(): Return output dimension of trees in a ForestContainer object
ForestSamples$leaf_dimension()
Leaf node parameter size
is_leaf_constant(): Return constant leaf status of trees in a ForestContainer object
ForestSamples$is_leaf_constant()
TRUE if leaves are constant, FALSE otherwise
is_exponentiated(): Return exponentiation status of trees in a ForestContainer object
ForestSamples$is_exponentiated()
TRUE if leaf predictions must be exponentiated, FALSE otherwise
add_forest_with_constant_leaves(): Add a new all-root ensemble to the container, with all of the leaves set to the value / vector provided
ForestSamples$add_forest_with_constant_leaves(leaf_value)
leaf_value: Value (or vector of values) to initialize root nodes in tree
add_numeric_split_tree(): Add a numeric (i.e. X[,i] <= c) split to a given tree in the ensemble
ForestSamples$add_numeric_split_tree(forest_num, tree_num, leaf_num, feature_num, split_threshold, left_leaf_value, right_leaf_value)
forest_num: Index of the forest which contains the tree to be split
tree_num: Index of the tree to be split
leaf_num: Leaf to be split
feature_num: Feature that defines the new split
split_threshold: Value that defines the cutoff of the new split
left_leaf_value: Value (or vector of values) to assign to the newly created left node
right_leaf_value: Value (or vector of values) to assign to the newly created right node
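A minimal sketch of building a forest by hand with the methods documented on this page, assuming the stochtree package is installed (the block is skipped otherwise). Argument names follow the usage lines above. Two points are assumptions rather than documented facts: that tree_num, leaf_num and feature_num are 0-indexed like forest_inds, and that the root node of a new tree has index 0.

```r
# Only run the stochtree calls when the package is available
has_stochtree <- requireNamespace("stochtree", quietly = TRUE)

if (has_stochtree) {
  # Container for 2-tree forests with constant, one-dimensional leaves
  forest_samples <- stochtree::ForestSamples$new(
    num_trees = 2, leaf_dimension = 1, is_leaf_constant = TRUE
  )

  # Add one forest whose trees are all root nodes with leaf value 0
  forest_samples$add_forest_with_constant_leaves(0)

  # Split the root of the first tree in the first forest on the first feature.
  # ASSUMPTION: all indices here are 0-based and the root node has index 0.
  forest_samples$add_numeric_split_tree(
    forest_num = 0, tree_num = 0, leaf_num = 0, feature_num = 0,
    split_threshold = 0.5, left_leaf_value = -1, right_leaf_value = 1
  )

  # Inspect the result with the query methods documented on this page
  print(forest_samples$num_samples())
  print(forest_samples$num_leaves(0, 0))
}
```

Because the class performs minimal input validation, it may be worth checking counts such as num_samples() and num_leaves() after each manual modification.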
get_tree_leaves(): Retrieve a vector of indices of leaf nodes for a given tree in a given forest
ForestSamples$get_tree_leaves(forest_num, tree_num)
forest_num: Index of the forest which contains tree tree_num
tree_num: Index of the tree for which leaf indices will be retrieved
get_tree_split_counts(): Retrieve a vector of split counts for every training set variable in a given tree in a given forest
ForestSamples$get_tree_split_counts(forest_num, tree_num, num_features)
forest_num: Index of the forest which contains tree tree_num
tree_num: Index of the tree for which split counts will be retrieved
num_features: Total number of features in the training set
get_forest_split_counts(): Retrieve a vector of split counts for every training set variable in a given forest
ForestSamples$get_forest_split_counts(forest_num, num_features)
forest_num: Index of the forest for which split counts will be retrieved
num_features: Total number of features in the training set
get_aggregate_split_counts(): Retrieve a vector of split counts for every training set variable, aggregated across ensembles and trees
ForestSamples$get_aggregate_split_counts(num_features)
num_features: Total number of features in the training set
get_granular_split_counts(): Retrieve split counts for every training set variable, reported separately for each ensemble and tree
ForestSamples$get_granular_split_counts(num_features)
num_features: Total number of features in the training set
ensemble_tree_max_depth(): Maximum depth of a specific tree in a specific ensemble in a ForestSamples object
ForestSamples$ensemble_tree_max_depth(ensemble_num, tree_num)
ensemble_num: Ensemble number
tree_num: Tree index within ensemble ensemble_num
Maximum leaf depth
average_ensemble_max_depth(): Average the maximum depth of each tree in a given ensemble in a ForestSamples object
ForestSamples$average_ensemble_max_depth(ensemble_num)
ensemble_num: Ensemble number
Average maximum depth
average_max_depth(): Average the maximum depth of each tree in each ensemble in a ForestContainer object
ForestSamples$average_max_depth()
Average maximum depth
num_forest_leaves(): Number of leaves in a given ensemble in a ForestSamples object
ForestSamples$num_forest_leaves(forest_num)
forest_num: Index of the ensemble to be queried
Count of leaves in the ensemble stored at forest_num
sum_leaves_squared(): Sum of squared (raw) leaf values in a given ensemble in a ForestSamples object
ForestSamples$sum_leaves_squared(forest_num)
forest_num: Index of the ensemble to be queried
Sum of squared leaf values in the ensemble stored at forest_num
is_leaf_node(): Whether or not a given node of a given tree in a given forest in the ForestSamples is a leaf
ForestSamples$is_leaf_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
TRUE if node is a leaf, FALSE otherwise
is_numeric_split_node(): Whether or not a given node of a given tree in a given forest in the ForestSamples is a numeric split node
ForestSamples$is_numeric_split_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
TRUE if node is a numeric split node, FALSE otherwise
is_categorical_split_node(): Whether or not a given node of a given tree in a given forest in the ForestSamples is a categorical split node
ForestSamples$is_categorical_split_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
TRUE if node is a categorical split node, FALSE otherwise
parent_node(): Parent node of a given node of a given tree in a given forest in a ForestSamples object
ForestSamples$parent_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Integer ID of the parent node
left_child_node(): Left child node of a given node of a given tree in a given forest in a ForestSamples object
ForestSamples$left_child_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Integer ID of the left child node
right_child_node(): Right child node of a given node of a given tree in a given forest in a ForestSamples object
ForestSamples$right_child_node(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Integer ID of the right child node
node_depth(): Depth of a given node of a given tree in a given forest in a ForestSamples object, with 0 depth for the root node.
ForestSamples$node_depth(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Integer valued depth of the node
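The node accessors above (is_leaf_node, left_child_node, right_child_node, node_depth) are enough to walk a tree recursively. The sketch below defines a hypothetical helper, leaf_depths, that only calls those four methods. So that it runs without stochtree, it is demonstrated on a hand-built stand-in list exposing the same method names; the stand-in, its -1 "no child" marker, and the assumption that the root has node_id 0 are illustrative and not documented stochtree behavior.

```r
# Hypothetical helper: depth of every leaf reachable from node_id.
# ASSUMPTION: the root node has index 0 (hence the default).
leaf_depths <- function(forest, forest_num, tree_num, node_id = 0) {
  if (forest$is_leaf_node(forest_num, tree_num, node_id)) {
    depth <- forest$node_depth(forest_num, tree_num, node_id)
    return(stats::setNames(depth, node_id))  # name each depth by its node id
  }
  left <- forest$left_child_node(forest_num, tree_num, node_id)
  right <- forest$right_child_node(forest_num, tree_num, node_id)
  c(leaf_depths(forest, forest_num, tree_num, left),
    leaf_depths(forest, forest_num, tree_num, right))
}

# Stand-in object (NOT a ForestSamples): node 0 splits into 1 (leaf) and 2;
# node 2 splits into leaves 3 and 4. A value of -1 marks "no child" here only.
left_ids  <- c(1, -1, 3, -1, -1)
right_ids <- c(2, -1, 4, -1, -1)
depths    <- c(0, 1, 1, 2, 2)
mock_forest <- list(
  is_leaf_node     = function(forest_num, tree_num, node_id) left_ids[node_id + 1] == -1,
  left_child_node  = function(forest_num, tree_num, node_id) left_ids[node_id + 1],
  right_child_node = function(forest_num, tree_num, node_id) right_ids[node_id + 1],
  node_depth       = function(forest_num, tree_num, node_id) depths[node_id + 1]
)

result <- leaf_depths(mock_forest, forest_num = 0, tree_num = 0)
```

The same helper could be passed a real ForestSamples object in place of mock_forest, provided the root-index assumption holds for that tree.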
node_split_index(): Split index of a given node of a given tree in a given forest in a ForestSamples object. Returns -1 if the node is a leaf.
ForestSamples$node_split_index(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Integer valued split index of the node
node_split_threshold(): Threshold that defines a numeric split for a given node of a given tree in a given forest in a ForestSamples object.
Returns Inf if the node is a leaf or a categorical split node.
ForestSamples$node_split_threshold(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Threshold defining a split for the node
node_split_categories(): Array of category indices that define a categorical split for a given node of a given tree in a given forest in a ForestSamples object.
Returns c(Inf) if the node is a leaf or a numeric split node.
ForestSamples$node_split_categories(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Categories defining a split for the node
node_leaf_values(): Leaf node value(s) for a given node of a given tree in a given forest in a ForestSamples object.
Values are stale if the node is a split node.
ForestSamples$node_leaf_values(forest_num, tree_num, node_id)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
node_id: Index of the node to be queried
Vector (often univariate) of leaf values
num_nodes(): Number of nodes in a given tree in a given forest in a ForestSamples object.
ForestSamples$num_nodes(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Count of total tree nodes
num_leaves(): Number of leaves in a given tree in a given forest in a ForestSamples object.
ForestSamples$num_leaves(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Count of total tree leaves
num_leaf_parents(): Number of leaf parents (split nodes with two leaves as children) in a given tree in a given forest in a ForestSamples object.
ForestSamples$num_leaf_parents(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Count of total tree leaf parents
num_split_nodes(): Number of split nodes in a given tree in a given forest in a ForestSamples object.
ForestSamples$num_split_nodes(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Count of total tree split nodes
nodes(): Array of node indices in a given tree in a given forest in a ForestSamples object.
ForestSamples$nodes(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Indices of tree nodes
leaves(): Array of leaf indices in a given tree in a given forest in a ForestSamples object.
ForestSamples$leaves(forest_num, tree_num)
forest_num: Index of the forest to be queried
tree_num: Index of the tree to be queried
Indices of leaf nodes
delete_sample(): Modify the ForestSamples object by removing the forest sample indexed by forest_num
ForestSamples$delete_sample(forest_num)
forest_num: Index of the forest to be removed