t.test: Paired t-Tests for Model Comparisons

Description

Paired t-test comparisons of resampled performance metrics from different models.

Usage

## S3 method for class 'PerformanceDiff'
t.test(x, adjust = "holm", ...)

Arguments

x

performance difference result (a PerformanceDiff object), such as that returned by diff.

adjust

method of p-value adjustment for multiple statistical comparisons as implemented by p.adjust.

...

arguments passed to other methods.
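The adjust argument refers to the adjustment methods of p.adjust in the R stats package. As a hedged illustration of what the default Holm adjustment does, the sketch below applies p.adjust directly to three hypothetical pairwise p-values. The pair names and values are made up, and this does not show how MachineShop groups comparisons before adjusting.

```r
## Made-up raw p-values for three hypothetical model pairs
p_raw <- c(GBM1_GBM2 = 0.010, GBM1_GBM3 = 0.040, GBM2_GBM3 = 0.030)

## Holm step-down adjustment (the default for adjust)
p_holm <- p.adjust(p_raw, method = "holm")        # 0.03 0.06 0.06

## Bonferroni adjustment for comparison
p_bonf <- p.adjust(p_raw, method = "bonferroni")  # 0.03 0.12 0.09
```

Both methods control the family-wise error rate, but Holm's adjusted p-values are never larger than Bonferroni's, which is why it is a common default.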

Details

The t-test statistic for pairwise model differences of R resampled performance metric values is calculated as

t = \frac{\bar{x}_R}{\sqrt{F s^2_R / R}},

where \bar{x}_R and s^2_R are the sample mean and variance of the R differences. Statistical testing for a mean difference is then performed by comparing t to a t_{R-1} null distribution.

Because resampled training sets overlap, the R differences are not independent, and the sample variance in the t statistic is known to underestimate the true variance of cross-validation mean estimators. This underestimation inflates the probability of false-positive statistical conclusions. An additional factor F is therefore included in the t statistic to correct the variance. Nadeau and Bengio (2003) found F = 1 + K / (K - 1) to be a good choice for cross-validation with K folds, so it is used for that resampling method. The extension of this correction by Bouckaert and Frank (2004), F = 1 + T K / (K - 1), is used for cross-validation with K folds repeated T times. In both cases F / R = 1 / R + 1 / (K - 1), where 1 / (K - 1) is the ratio of test-set to training-set size. For other resampling methods, F = 1, which gives the ordinary paired t-test.
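A minimal sketch of the corrected statistic in base R follows, assuming a two-sided test and a single vector of resampled differences for one model pair. The helpers corrected_t_test and cv_correction are hypothetical and are not MachineShop functions, and the differences are made-up values. With correction = 1 the result matches the one-sample t-test on the differences reported by stats::t.test, which is the ordinary paired t-test.

```r
## Hypothetical helpers sketching the variance-corrected paired t-test;
## they are not MachineShop functions.

## Correction factor F for K-fold cross-validation repeated `repeats` times:
## repeats = 1 gives Nadeau and Bengio (2003), repeats > 1 gives
## Bouckaert and Frank (2004).
cv_correction <- function(folds, repeats = 1) {
  1 + repeats * folds / (folds - 1)
}

## x: R resampled differences in a performance metric between two models
## correction: variance correction factor F (1 gives the ordinary paired t-test)
corrected_t_test <- function(x, correction = 1) {
  R <- length(x)
  t_stat <- mean(x) / sqrt(correction * var(x) / R)
  ## Two-sided p-value (an assumption) from a t distribution with R - 1 df
  p_value <- 2 * pt(-abs(t_stat), df = R - 1)
  c(t = t_stat, p = p_value)
}

## Made-up differences standing in for 10-fold cross-validation results
x <- c(0.8, -0.2, 1.1, 0.5, 0.3, 0.9, -0.1, 0.6, 0.4, 0.7)

corrected_t_test(x)                                  # uncorrected, F = 1
corrected_t_test(x, correction = cv_correction(10))  # F = 1 + 10 / 9
```

Since F > 1 for cross-validation, the corrected t statistic is the uncorrected one divided by sqrt(F), so its p-value is larger, which is the intended conservative adjustment.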

Value

PerformanceDiffTest class object that inherits from array. p-values and mean differences are contained in the lower and upper triangular portions, respectively, of the first two dimensions. Model pairs are contained in the third dimension.
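The triangular layout described above can be read with base R's upper.tri and lower.tri. The sketch below uses a hypothetical 3 x 3 matrix standing in for the first two dimensions at one index of the third. Model names and values are made up, and the sign convention of the mean differences is not specified here.

```r
## Hypothetical slice for three models (made-up values)
models <- c("GBM1", "GBM2", "GBM3")
slice <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(models, models))
slice[upper.tri(slice)] <- c(-1.5, -2.0, -0.5)  # mean differences
slice[lower.tri(slice)] <- c(0.03, 0.06, 0.06)  # p-values

## Index each model pair once via the upper triangle; the matching p-value
## sits at the transposed position in the lower triangle
idx <- which(upper.tri(slice), arr.ind = TRUE)
pairs <- data.frame(
  model1 = models[idx[, "row"]],
  model2 = models[idx[, "col"]],
  mean_diff = slice[idx],
  p_value = slice[idx[, c("col", "row")]],
  stringsAsFactors = FALSE
)
pairs
```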

References

Nadeau, C., & Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52, 239–281.

Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant, & C. Zhang (Eds.), Advances in knowledge discovery and data mining (pp. 3–12). Springer.

Examples


## Requires prior installation of suggested package gbm to run
library(MachineShop)

## Numeric response example
fo <- sale_amount ~ .
## K-fold cross-validation with package default settings
control <- CVControl()

## Resample gradient boosted models with 25, 50, and 100 trees
gbm_res1 <- resample(fo, ICHomes, GBMModel(n.trees = 25), control)
gbm_res2 <- resample(fo, ICHomes, GBMModel(n.trees = 50), control)
gbm_res3 <- resample(fo, ICHomes, GBMModel(n.trees = 100), control)

## Combine resampled results, compute pairwise differences, and test them
res <- c(GBM1 = gbm_res1, GBM2 = gbm_res2, GBM3 = gbm_res3)
res_diff <- diff(res)
t.test(res_diff)



MachineShop documentation built on Sept. 11, 2024, 6:28 p.m.