Gets OOB Loss for each observation in a passed data frame.

Description

Uses bootstrap sampling to get average loss for each observation. We do this by averaging out-of-bag loss over a number of runs for each data point. Each in-sample tree is fit using the same settings as model_tree.obj.

Usage

getOOBLoss(model_tree.obj,data,nboot=100,
		sampleFcn = function(idx_vec){sample(idx_vec,replace=TRUE)},
		minsplit, minbucket,lossfcn)

Arguments

model_tree.obj

An itree object fit using the same variable names as are present in data. In each bootstrap sample, the parameters of the tree grown to the learning sample are taken to match those of model_tree.obj.

data

A data frame. We construct risk estimates for each row of this frame by looking at average out-of-bag loss.

nboot

Number of bootstrap/cross-validation runs.

sampleFcn

Any function that takes a vector of indices (1,...,N) where N=nrow(data) and returns the indices of a learning sample. The default function does bootstrap sampling (size N with replacement). The holdout sample is figured out automatically by finding the set difference between 1:N and the learning sample.
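
As a rough illustration of this contract, a custom sampleFcn that draws a half-sample without replacement might look like the sketch below. The name halfSample is hypothetical, and the setdiff line only mirrors the description above of how the holdout set is derived; it is not taken from the package source.

```r
# Hypothetical sampler: the learning sample is a random half of the rows,
# drawn without replacement (the default draws N rows with replacement).
halfSample <- function(idx_vec) {
  sample(idx_vec, size = floor(length(idx_vec) / 2), replace = FALSE)
}

set.seed(1)
N <- 10
learn_idx <- halfSample(1:N)

# Holdout rows are whatever the learning sample left out.
holdout_idx <- setdiff(1:N, learn_idx)
```

Such a function would be supplied as sampleFcn = halfSample in the call to getOOBLoss.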

minsplit

Specifies the minsplit argument to use for each tree. If missing, we take the minsplit value from model_tree.obj. If minsplit is numeric and greater than 1, it is taken as a number of observations. If minsplit is numeric and less than 1, it is treated as a fraction of N.

minbucket

Same as minsplit but for the minbucket argument.

lossfcn

A function that takes two vectors, trueY and predY, and outputs a vector of losses. If lossfcn is missing, we default to misclassification rate and squared-error-loss for classification and regression respectively.
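
For illustration only, loss functions with this shape could be written as below. The names absLoss, sqLoss and misclassLoss are hypothetical. The last two are sketches of what the documented defaults are described as computing, not the package's own code.

```r
# Hypothetical custom loss: elementwise absolute error for regression.
absLoss <- function(trueY, predY) {
  abs(trueY - predY)
}

# Sketch of squared-error loss (described default for regression).
sqLoss <- function(trueY, predY) {
  (trueY - predY)^2
}

# Sketch of 0/1 misclassification loss (described default for classification).
misclassLoss <- function(trueY, predY) {
  as.numeric(as.character(trueY) != as.character(predY))
}

abs_example <- absLoss(c(1, 2, 3), c(1.5, 2, 1))
mis_example <- misclassLoss(c("a", "b", "a"), c("a", "a", "a"))
```

Each function returns one loss per observation, which is what lets the losses be averaged per row across runs.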

Value

A list with elements:

bagpred = Average predictions for all N observations over all nboot runs. For classification it is the most common class.

holdout.predictions = An N x nboot matrix with holdout predictions for each run along the columns. If observation i is in the learning sample for run j, then holdout.predictions[i,j] is NA.

avgOOBloss = N x 1 vector of average OOB loss.
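
To make the relationship between these elements concrete, the sketch below computes a per-observation average loss from a toy holdout matrix. It assumes squared-error loss and that the average is taken over only those runs in which the observation was held out. The data are invented stand-ins, not output from getOOBLoss, and the package's internal computation may differ.

```r
# Toy stand-ins: 3 observations, 4 runs. NA means the observation was in
# the learning sample for that run, so it has no holdout prediction.
trueY <- c(10, 20, 30)
holdout_pred <- matrix(c(NA, 12, 11, NA,
                         19, NA, NA, 22,
                         NA, NA, 33, 30),
                       nrow = 3, byrow = TRUE)

# Squared-error loss per (observation, run); trueY is recycled down columns,
# so row i is compared against trueY[i]. NA entries stay NA.
loss_mat <- (trueY - holdout_pred)^2

# Average over the runs in which each observation was held out.
avg_loss <- rowMeans(loss_mat, na.rm = TRUE)
```

An observation that is never held out has no finite average under this scheme, which is one reason to use a reasonably large nboot.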

Examples

require(itree); require(mlbench); data(BostonHousing)
bh <- BostonHousing  # short alias used in the calls below
#fit a tree:
cart.bh <- itree(medv~.,bh,minsplit=25,minbucket=25,cp=0)

#generate theta-hat values by computing average out-of-bag loss:
## Not run: 
theta_hats <- getOOBLoss(model_tree.obj=cart.bh,data=bh,nboot=100)

# Then for each leaf we estimate local risk by the mean in-node theta-hat.
lre <- estNodeRisk(tree.obj=cart.bh,est_observation_loss=theta_hats$avgOOBloss)

# to add the lre to the plot:
plot(cart.bh, do_node_re= TRUE, uniform=TRUE)
text(cart.bh, est_node_risk = lre)

## End(Not run)
