SRCSranks: Computes the ranks of all the algorithms from their (repeated) results measurements



Description

Computes the ranks of all the algorithms from their (repeated) results measurements, after grouping them by several factors combined simultaneously.


Usage

SRCSranks(data, params, target, performance, pairing.col = NULL,
  test = c("wilcoxon", "t", "tukeyHSD", "custom"), fun = NULL,
  correction = p.adjust.methods, alpha = 0.05, maximize = TRUE,
  ncores = 1, paired = FALSE, ...)



Arguments

data: A data frame containing (at least) two columns: one for the target factor and one for the performance measure. Additional columns are aimed at grouping the problem configurations by (at most) 3 different factors.
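As a sketch of the expected layout (all column names, factor names and values below are illustrative, not prescribed by the package), a suitable data frame might be built like this:

```r
# Hypothetical results table: two grouping factors ("NoiseLevel",
# "DataSize"), one target column ("Algorithm") and one performance
# column ("Accuracy"), with 5 repeated measurements per problem
# configuration.
set.seed(1)
dat <- expand.grid(
  Repetition = 1:5,                      # repeated measurements
  Algorithm  = c("A1", "A2", "A3"),      # target factor
  NoiseLevel = c("low", "high"),         # grouping factor 1
  DataSize   = c(100, 500)               # grouping factor 2
)
dat$Repetition <- NULL                   # repetitions need no column of their own
dat$Accuracy <- runif(nrow(dat))         # performance measure

# A call could then look like this (requires the SRCS package):
# ranks <- SRCSranks(dat, params = c("NoiseLevel", "DataSize"),
#                    target = "Algorithm", performance = "Accuracy")
```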


params: A vector with the column names in data that define a problem configuration. If not already factor objects, those columns will be converted to factors inside the function (note this does not alter the ordering of the levels in case it was explicitly set before the call). Although an arbitrary number of columns can be passed, at most three should be used if the user intends to plot the ranks computed by this function.


target: Name of the target column of data. For each combination of the values of params, the ranks are obtained by comparing the repeated measurements of performance associated with each level of the target column.


performance: Name of the column of data containing the repeated performance measurements. If given a vector of strings, a separate ranking will be computed for each element, and no p-value, mean or stdev columns will be returned; only the rankings, together with the factor columns that indicate which problem configuration each rank corresponds to.


pairing.col: Name of the column that links together the paired samples when paired = TRUE. Otherwise, this argument is ignored.


test: The statistical test used to compare the performance of every level of the target variable at each problem configuration.


fun: Function performing a custom statistical test when test = "custom"; otherwise, this argument is ignored. The function must receive exactly two vectors (the first a vector of real numbers and the second a factor giving the level to which each number corresponds) and must return a pairwise.htest object with a p.value field. This field must be an (N-1)x(N-1) lower-triangular matrix, with exactly the same structure as the p.value field returned by a call to pairwise.wilcox.test or pairwise.t.test.
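A minimal sketch of such a custom function, here simply wrapping pairwise.t.test with unpooled (Welch-style) variances; the function name and the choice of pool.sd = FALSE are illustrative, not part of the SRCS API:

```r
# Custom test for test = "custom": takes a numeric vector and a factor,
# returns a pairwise.htest object whose p.value field is the usual
# (N-1)x(N-1) lower-triangular matrix. Extra arguments coming from
# SRCSranks' "..." are forwarded.
myCustomTest <- function(values, groups, ...) {
  pairwise.t.test(values, groups, pool.sd = FALSE, ...)
}

# Quick structural check with N = 3 levels:
res <- myCustomTest(c(rnorm(10), rnorm(10, 2), rnorm(10, 4)),
                    factor(rep(c("A1", "A2", "A3"), each = 10)))
# res$p.value is then a 2x2 matrix with an NA upper triangle
```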


correction: The p-value adjustment method. Must be one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" (defaults to "holm"). This parameter is ignored when test = "tukeyHSD", since Tukey's HSD incorporates its own correction procedure.


alpha: Significance threshold for the pairwise comparisons. Defaults to 0.05.


maximize: Boolean indicating whether the higher the performance measure, the better (default), or vice versa.


ncores: Number of physical CPUs available for the computations. If ncores > 1, parallelization is achieved through the parallel package and is applied to the computation of ranks for more than one problem configuration at a time. Defaults to 1 (sequential).


paired: Boolean indicating whether samples in the same problem configuration that differ only in the target value, and occupy the same relative position (row) within their respective target values, are paired. Defaults to FALSE. This should be set to TRUE, for instance, in machine learning problems in which, for a fixed problem configuration, the target variable (usually the algorithms being compared) is associated with a number of result samples coming from a cross-validation process. In a K-fold CV, a given problem configuration has K rows for each of the algorithms being compared, identical in all columns except the performance column. The performance values in the i-th rows (1 <= i <= K) of those batches (groups of K rows) are then related, so every pairwise comparison should treat the samples as paired.
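The layout described above can be sketched as follows (dataset name, algorithm names and accuracy values are made up for illustration): a 5-fold CV for one problem configuration yields K = 5 rows per algorithm, and row i of one algorithm is paired with row i of every other algorithm because they come from the same fold.

```r
# One problem configuration, two algorithms, K = 5 CV folds.
K <- 5
cv <- data.frame(
  Dataset   = "iris",                          # fixed problem configuration
  Fold      = rep(1:K, times = 2),             # candidate for pairing.col
  Algorithm = rep(c("A1", "A2"), each = K),    # target variable
  Accuracy  = c(0.91, 0.88, 0.93, 0.90, 0.92,  # A1, folds 1..5
                0.89, 0.85, 0.94, 0.87, 0.90)  # A2, folds 1..5
)
# With paired = TRUE, the comparison of A1 vs A2 would match fold-wise
# samples: Accuracy[1] with Accuracy[6], Accuracy[2] with Accuracy[7],
# and so on, with the Fold column passed through pairing.col.
```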


...: Further arguments passed on to the function fun that is called for every pairwise comparison.


Value

If length(performance) equals 1, an object of classes c("SRCS", "data.frame") with the following columns:

- A set of columns with the same names as the params and target arguments.
- Two columns, "mean" and "sd", containing the mean and the standard deviation of the repeated performance measurements for each problem configuration.
- One column, "rank", with the actual rank of each level of the target variable within that problem configuration. The lower the rank, the better the algorithm.
- |target| additional columns containing the p-values of the comparisons between that algorithm and the rest for the same problem configuration, where |target| is the number of levels of the target variable.

If length(performance) > 1 (let P = length(performance) for the explanation that follows), an object of classes c("SRCS", "data.frame") with the following columns:

- A set of columns with the same names as the params and target arguments.
- One column per element of the performance vector, named "rank1", ..., "rankP", containing, for each performance measure, the rank of each level of the target variable within that problem configuration. The higher the rank, the better the algorithm.


Note

Although it has no effect on the results of SRCSranks, the user should preferably set the order of the factor levels explicitly, by calling the levels function before calling this function, especially if they intend to subsequently apply plot to the results, because the level order does affect the way graphics are arranged in the plot.

See Also

plot.SRCS for a full working example of SRCSranks and the plotting facilities. See also pairwise.wilcox.test, t.test, pairwise.t.test, TukeyHSD, p.adjust.methods.

SRCS documentation built on May 2, 2019, 8:34 a.m.
