This function creates point and probability forecasts from the trees in a random forest using the trimmed opinion pool of Jose et al. (2014), a trimmed average of the trees' empirical cumulative distribution functions (cdfs). For tuning purposes, the user can set the trimming level used in this trimmed average and then compare the scores of the trimmed and untrimmed opinion pools, or ensembles.

```
trimTrees(xtrain, ytrain, xtest, ytest = NULL, ntree = 500,
          mtry = if (!is.null(ytrain) && !is.factor(ytrain))
            max(floor(ncol(xtrain)/3), 1) else floor(sqrt(ncol(xtrain))),
          nodesize = if (!is.null(ytrain) && !is.factor(ytrain)) 5 else 1,
          trim = 0, trimIsExterior = TRUE,
          uQuantiles = seq(0.05, 0.95, 0.05), methodIsCDF = TRUE)
```
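The default expressions for `mtry` and `nodesize` in the signature above depend on whether `ytrain` is a factor. The sketch below only restates those two defaults as standalone helpers so they can be evaluated by hand; the names `defaultMtry` and `defaultNodesize` are hypothetical and are not part of the package.

```r
# Hypothetical helpers that restate the default expressions from the usage
# signature; they are illustrative only and not exported by trimTrees.
defaultMtry <- function(xtrain, ytrain) {
  if (!is.null(ytrain) && !is.factor(ytrain)) {
    max(floor(ncol(xtrain) / 3), 1)   # regression: one third of the predictors
  } else {
    floor(sqrt(ncol(xtrain)))         # classification: square root of the predictors
  }
}

defaultNodesize <- function(ytrain) {
  if (!is.null(ytrain) && !is.factor(ytrain)) 5 else 1
}

# Toy inputs (made up): 16 predictors, numeric vs. factor response.
xToy <- matrix(0, nrow = 20, ncol = 16)
yNumeric <- rnorm(20)
yFactor <- factor(rep(c("a", "b"), 10))

mtryRegression <- defaultMtry(xToy, yNumeric)
mtryClassification <- defaultMtry(xToy, yFactor)
nodesizeRegression <- defaultNodesize(yNumeric)
nodesizeClassification <- defaultNodesize(yFactor)
```

With 16 predictors the regression default is `floor(16/3)` and the classification default is `floor(sqrt(16))`, so the two cases give different values of `mtry`.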

| Argument | Description |
| --- | --- |
| `xtrain` | A data frame or a matrix of predictors for the training set. |
| `ytrain` | A response vector for the training set. If a factor, classification is assumed, otherwise regression is assumed. |
| `xtest` | A data frame or a matrix of predictors for the testing set. |
| `ytest` | A response vector for the testing set. If no testing set is passed, probability integral transform (PIT) values and scores will be returned as |
| `ntree` | Number of trees to grow. |
| `mtry` | Number of variables randomly sampled as candidates at each split. |
| `nodesize` | Minimum size of terminal nodes. |
| `trim` | The trimming level used in the trimmed average of the trees' empirical cdfs. For the cdf approach, the trimming level is the fraction of cdf values to be trimmed from each end of the ordered vector of cdf values (for each support point) before the average is computed. For the moment approach, the trees' means are computed, ordered, and trimmed. The trimmed opinion pool using the moment approach is an average of the remaining trees. |
| `trimIsExterior` | If |
| `uQuantiles` | A vector of probabilities in a strictly increasing order and between 0 and 1. For instance, if |
| `methodIsCDF` | If |
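The two trimming approaches described for `trim` can be illustrated on a toy forest. The sketch below is not the package's implementation: the cdf values are made up, all variable names are hypothetical, and it covers only trimming from each end of the ordered values. It relies on base R's `mean(x, trim = )`, which drops the same fraction of observations from each end before averaging.

```r
# Toy example: 5 trees (rows) and 4 support points (columns); values are made up.
supportPoints <- c(1, 2, 3, 4)
toyTreeCDFs <- rbind(
  c(0.1, 0.4, 0.8, 1.0),
  c(0.0, 0.2, 0.5, 1.0),
  c(0.3, 0.6, 0.9, 1.0),
  c(0.4, 0.5, 0.6, 1.0),
  c(0.0, 0.1, 0.4, 1.0)
)
trimLevel <- 0.2
nTrees <- nrow(toyTreeCDFs)
nTrimmed <- floor(nTrees * trimLevel)  # trees dropped from each end

# cdf approach: at each support point, order the trees' cdf values,
# trim from each end, and average what remains.
cdfApproach <- apply(toyTreeCDFs, 2, mean, trim = trimLevel)

# Moment approach: compute each tree's mean from its pmf, order the trees by
# that mean, drop trees from each end, and average the remaining trees' cdfs.
toyTreePMFs <- t(apply(toyTreeCDFs, 1, function(cdf) diff(c(0, cdf))))
toyTreeMeans <- as.vector(toyTreePMFs %*% supportPoints)
keptTrees <- order(toyTreeMeans)[(nTrimmed + 1):(nTrees - nTrimmed)]
momentApproach <- colMeans(toyTreeCDFs[keptTrees, , drop = FALSE])

rbind(cdfApproach, momentApproach)
```

In this toy forest the fourth tree's cdf crosses the others, so the two approaches agree at most support points but differ at the first one: the cdf approach trims value by value, while the moment approach removes whole trees.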

An object of class `trimTrees`, which is a list with the following components:

| Component | Description |
| --- | --- |
| `forestSupport` | Possible points of support for the trees and ensembles. |
| `treeValues` | For the last testing set row, this component outputs each tree's |
| `treeCounts` | For the last testing set row, each tree's counts of |
| `treeCumCounts` | Cumulative tally of |
| `treeCDFs` | Each tree's empirical cdf based on |
| `treePMFs` | Each tree's empirical probability mass function (pmf) for the last testing set row. This component is an |
| `treeMeans` | For each testing set row, each tree's mean according to its empirical pmf. This component is an |
| `treeVars` | For each testing set row, each tree's variance according to its empirical pmf. This component is an |
| `treePITs` | For each testing set row, each tree's probability integral transform (PIT), the empirical cdf evaluated at the realized |
| `treeQuantiles` | For the last testing set row, each tree's quantiles – one for each element in |
| `treeFirstPMFValues` | For each testing set row, this component outputs the pmf value on the minimum (or first) support point in the forest. For binary classification, this corresponds to the probability that the minimum (or first) support point will occur. This component's dimension is |
| `bracketingRate` | For each testing set row, the bracketing rate from Larrick et al. (2012) is computed as |
| `bracketingRateAllPairs` | The average bracketing rate across all testing set rows for each pair of trees. This component is a symmetric |
| `trimmedEnsembleCDFs` | For each testing set row, the trimmed ensemble's forecast of |
| `trimmedEnsemblePMFs` | For each testing set row, the trimmed ensemble's pmf. This component is an |
| `trimmedEnsembleMeans` | For each testing set row, the trimmed ensemble's mean. This component is an |
| `trimmedEnsembleVars` | For each testing set row, the trimmed ensemble's variance. |
| `trimmedEnsemblePITs` | For each testing set row, the trimmed ensemble's probability integral transform (PIT), the empirical cdf evaluated at the realized |
| `trimmedEnsembleQuantiles` | For the last testing set row, the trimmed ensemble's quantiles – one for each element in |
| `trimmedEnsembleComponentScores` | For the last testing set row, the components of the trimmed ensemble's linear and log quantile scores. If |
| `trimmedEnsembleScores` | For each testing set row, the trimmed ensemble's linear and log quantile scores, ranked probability score, and two-moment score. See Jose and Winkler (2009) for a description of the linear and log quantile scores. See Gneiting and Raftery (2007) for a description of the ranked probability score. The two-moment score is the score in Equation 27 of Gneiting and Raftery (2007). If |
| `untrimmedEnsembleCDFs` | For each testing set row, the linear opinion pool's, or untrimmed ensemble's, forecast of |
| `untrimmedEnsemblePMFs` | For each testing set row, the untrimmed ensemble's pmf. |
| `untrimmedEnsembleMeans` | For each testing set row, the untrimmed ensemble's mean. |
| `untrimmedEnsembleVars` | For each testing set row, the untrimmed ensemble's variance. |
| `untrimmedEnsemblePITs` | For each testing set row, the untrimmed ensemble's probability integral transform (PIT), the empirical cdf evaluated at the realized |
| `untrimmedEnsembleQuantiles` | For the last testing set row, the untrimmed ensemble's quantiles – one for each element in |
| `untrimmedEnsembleComponentScores` | For the last testing set row, the components of the untrimmed ensemble's linear and log quantile scores. If |
| `untrimmedEnsembleScores` | For each testing set row, the untrimmed ensemble's linear and log quantile scores, ranked probability score, and two-moment score. If |
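Several of the components above (pmfs, means, variances, and PITs) are simple functions of a cdf defined on the support points. The sketch below shows one way those quantities relate for a single discrete forecast; the numbers and variable names are made up for illustration, and the code is a sketch under those assumptions rather than the package's internal computation.

```r
# Toy discrete forecast on four support points (made-up values).
toySupport <- c(1, 2, 3, 4)
toyCDF <- c(0.1, 0.4, 0.8, 1.0)

# pmf: successive differences of the cdf.
toyPMF <- diff(c(0, toyCDF))

# Mean and variance according to the pmf.
toyMean <- sum(toySupport * toyPMF)
toyVar <- sum((toySupport - toyMean)^2 * toyPMF)

# PIT: the cdf evaluated at the realized response. stepfun() builds a
# right-continuous step function, so the cdf jumps at each support point.
toyCDFFunction <- stepfun(toySupport, c(0, toyCDF))
realizedY <- 3
toyPIT <- toyCDFFunction(realizedY)

c(mean = toyMean, var = toyVar, pit = toyPIT)
```

A realized value that falls between two support points takes the cdf value of the support point just below it, and a value below the smallest support point gives a PIT of 0.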

Yael Grushka-Cockayne, Victor Richmond R. Jose, Kenneth C. Lichtendahl Jr., and Huanghui Zeng.

Gneiting T, Raftery AE (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102 359-378.

Jose VRR, Grushka-Cockayne Y, Lichtendahl KC Jr. (2014). Trimmed opinion pools and the crowd's calibration problem. Management Science 60 463-475.

Jose VRR, Winkler RL (2009). Evaluating quantile assessments. Operations Research 57 1287-1297.

Grushka-Cockayne Y, Jose VRR, Lichtendahl KC Jr. (2014). Ensembles of overfit and overconfident forecasts, working paper.

Larrick RP, Mannes AE, Soll JB (2011). The social psychology of the wisdom of crowds. In J.I. Krueger, ed., Frontiers in Social Psychology: Social Judgment and Decision Making. New York: Psychology Press, 227-242.

```
# Load the packages and the data
library(trimTrees)
library(mlbench) # Provides mlbench.friedman1
set.seed(201) # Can be removed; useful for replication
data <- as.data.frame(mlbench.friedman1(500, sd=1))
summary(data)

# Prepare data for trimming (column 11 is the response)
train <- data[1:400, ]
test <- data[401:500, ]
xtrain <- train[,-11]
ytrain <- train[,11]
xtest <- test[,-11]
ytest <- test[,11]

# Option 1. Run trimTrees with responses in testing set.
set.seed(201) # Can be removed; useful for replication
tt1 <- trimTrees(xtrain, ytrain, xtest, ytest, trim=0.15)

# Some outputs from trimTrees: scores, hit rates, PIT densities.
colMeans(tt1$trimmedEnsembleScores)
colMeans(tt1$untrimmedEnsembleScores)
mean(hitRate(tt1$treePITs))
hitRate(tt1$trimmedEnsemblePITs)
hitRate(tt1$untrimmedEnsemblePITs)
hist(tt1$trimmedEnsemblePITs, prob=TRUE)
hist(tt1$untrimmedEnsemblePITs, prob=TRUE)

# Option 2. Run trimTrees without responses in testing set.
# In this case, scores, PITs, and hit rates will not be available.
set.seed(201) # Can be removed; useful for replication
tt2 <- trimTrees(xtrain, ytrain, xtest, trim=0.15)

# Some outputs from trimTrees: cdfs for last test value.
plot(tt2$trimmedEnsembleCDFs[100,], type="l", col="red", ylab="cdf", xlab="y")
lines(tt2$untrimmedEnsembleCDFs[100,])
legend(275, 0.2, c("trimmed", "untrimmed"), col=c("red","black"), lty=c(1, 1))
title("CDFs of Trimmed and Untrimmed Ensembles")

# Compare the CDF and moment approaches to trimming the trees.
set.seed(201) # Can be removed; makes both runs start from the same seed
ttCDF <- trimTrees(xtrain, ytrain, xtest, trim=0.15, methodIsCDF=TRUE)
set.seed(201) # Can be removed; makes both runs start from the same seed
ttMA <- trimTrees(xtrain, ytrain, xtest, trim=0.15, methodIsCDF=FALSE)
plot(ttCDF$trimmedEnsembleCDFs[100,], type="l", col="red", ylab="cdf", xlab="y")
lines(ttMA$trimmedEnsembleCDFs[100,])
legend(275, 0.2, c("CDF Approach", "Moment Approach"), col=c("red","black"), lty=c(1, 1))
title("CDFs of Trimmed Ensembles")
```
