Description
This function analyses and displays the statistical significance of the
differences between the estimated average evaluation scores of a set of
learners. When you run the experimentalComparison() function to compare
a set of learners over a set of problems, you obtain estimates of their
performance across these problems. This function allows you to test
whether the observed differences in these estimated performances are
statistically significant with a certain confidence level.
Usage

compAnalysis(comp, against, stats, datasets, show = TRUE)
Arguments

comp
    This is a compExp object containing the results of the experimental
    comparison you wish to analyse, as obtained with the
    experimentalComparison() function.

against
    When you carry out this type of analysis you have to select the
    learner against which all others will be compared. By default this
    is the first system in the alternatives you supplied when running
    the experiments. This parameter allows you to specify the
    identifier of any other learner as the one to compare against (see
    the sketch after this list).

stats
    By default the analysis is carried out across all evaluation
    statistics estimated in the experimental comparison. This parameter
    allows you to supply a vector with the names of the subset of
    statistics you wish to analyse.

datasets
    By default the analysis is carried out across all problems you have
    used in the experimental comparison. This parameter allows you to
    supply a vector with the names of the subset of problems you wish
    to analyse.

show
    By default this function shows a table with the results of the
    analysis and silently returns a data structure (see section Value)
    with these results. If you set this parameter to FALSE, the table
    is not printed and only the data structure is returned.
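As a quick illustration of these parameters, here is a minimal sketch
(the learner identifier 'cv.rpartXse.v2' is hypothetical, and res is
assumed to be a compExp object from a previous call to
experimentalComparison()):

## Hypothetical call: compare all learners against 'cv.rpartXse.v2',
## restricted to the 'nmse' statistic, without printing the table
res.tests <- compAnalysis(res, against = 'cv.rpartXse.v2',
                          stats = 'nmse', show = FALSE)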
Details

Independently of the experimental methodology you select (e.g. cross
validation), all results you obtain with the experimentalComparison()
function are estimates of the (unknown) true scores of the learners you
are comparing. This function allows you to carry out a statistical test
to check the significance of the observed differences among the
learners. Namely, the function carries out a Wilcoxon paired test to
check the significance of the differences among the estimated average
scores. The function prints the results of these tests using a set of
symbols that correspond to a set of pre-defined confidence levels
(essentially the standard 95% and 99% thresholds).

All tests are carried out between two learners: the one indicated in
the against parameter, which defaults to the first learner in the
experiments (named Learn.1 in the tables), and each of the other
learners. For each of the competitors the function prints a symbol
beside its average score representing the result of the comparison
against the baseline learner. If there is no symbol, the difference
between the two learners cannot be considered statistically significant
with 95% confidence. If there is one symbol (either a "+" or a "-"),
the statistical confidence in the difference is between 95% and 99%. A
"+" means the competitor has a larger estimated value than the baseline
(which can be good or bad depending on the statistic being estimated),
whilst a "-" means the opposite. Finally, two symbols (either "++" or
"--") mean that the difference is significant with more than 99%
confidence.
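For intuition on what the function computes, here is a minimal sketch
of the underlying test: a paired Wilcoxon test on the per-fold scores
of two learners (the NMSE values below are hypothetical, one per fold
of a 10-fold CV run):

## Hypothetical per-fold NMSE scores of the baseline and one competitor
baseline   <- c(0.61, 0.58, 0.70, 0.64, 0.66, 0.59, 0.72, 0.63, 0.60, 0.68)
competitor <- c(0.55, 0.52, 0.66, 0.60, 0.61, 0.54, 0.69, 0.58, 0.57, 0.62)
## A paired Wilcoxon test of the significance of their differences
wilcox.test(baseline, competitor, paired = TRUE)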
Value

Usually this function is used to print the tables with the results of
the statistical significance tests. However, the function also silently
returns the information in these tables, so that you may further
process it if you want. This means that if you assign the result of the
function to a variable, you will get a list with as many components as
there are evaluation statistics in your experiment. Each of these
components is a data frame with the results of the comparison,
following the same schema as the printed version.
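For example, a minimal sketch of post-processing the returned structure
(assuming res is a compExp object and that an 'nmse' statistic was
estimated in the experiment):

## Capture the tables silently and inspect them
sig <- compAnalysis(res, show = FALSE)
names(sig)      # one component per evaluation statistic, e.g. "nmse"
sig[['nmse']]   # data frame with the comparison results for that statistic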
Author(s)

Luis Torgo ltorgo@dcc.fc.up.pt
References

Torgo, L. (2010) Data Mining using R: learning with case studies. CRC
Press (ISBN: 9781439810187). http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR
See Also

experimentalComparison, compExp
Examples

## Estimating several evaluation metrics on different variants of a
## regression tree on a data set, using one repetition of 10-fold CV
library(DMwR)
data(swiss)

## First the user-defined learning/evaluation function
cv.rpartXse <- function(form, train, test, ...) {
    require(DMwR)
    t <- rpartXse(form, train, ...)        # learn the tree model
    p <- predict(t, test)                  # predictions on the test set
    mse <- mean((p - resp(form, test))^2)  # mean squared error
    c(nmse = mse/mean((mean(resp(form, train)) - resp(form, test))^2),
      mse = mse)
}
results <- experimentalComparison(
    c(dataset(Infant.Mortality ~ ., swiss)),
    c(variants('cv.rpartXse', se = c(0, 0.5, 1))),
    cvSettings(1, 10, 1234)
)
## Testing the statistical significance of the differences
compAnalysis(results)
## Comparing against the learner with best NMSE, and only on that statistic
compAnalysis(results,
             against = bestScores(results)$swiss['nmse', 'system'],
             stats = 'nmse')