SRCSranks: Computes the ranks of all the algorithms from their (repeated) results measurements



Description

Computes the ranks of all the algorithms from their (repeated) results measurements, after grouping them by several factors combined simultaneously.


Usage

SRCSranks(data, params, target, performance, pairing.col = NULL,
  test = c("wilcoxon", "t", "tukeyHSD", "custom"), fun = NULL,
  correction = p.adjust.methods, alpha = 0.05, maximize = TRUE,
  ncores = 1, paired = FALSE, ...)



Arguments

data: A data frame containing (at least) two columns: one for the target factor and one for the performance measure. Additional columns are aimed at grouping the problem configurations by (at most) 3 different factors.
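As a sketch of the expected layout (all column names, factor names and values below are illustrative, not prescribed by the package), a suitable data frame might be built like this:

```r
# Hypothetical results table: two grouping factors ("NoiseLevel",
# "DataSize"), one target column ("Algorithm") and one performance
# column ("Accuracy"), with 5 repeated measurements per problem
# configuration.
set.seed(1)
dat <- expand.grid(
  Repetition = 1:5,                      # repeated measurements
  Algorithm  = c("A1", "A2", "A3"),      # target factor
  NoiseLevel = c("low", "high"),         # grouping factor 1
  DataSize   = c(100, 500)               # grouping factor 2
)
dat$Repetition <- NULL                   # repetitions need no column of their own
dat$Accuracy <- runif(nrow(dat))         # performance measure

# A call could then look like this (requires the SRCS package):
# ranks <- SRCSranks(dat, params = c("NoiseLevel", "DataSize"),
#                    target = "Algorithm", performance = "Accuracy")
```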


params: A vector with the column names in data that define a problem configuration. If not already factor objects, those columns will be converted to factors inside the function (note this does not alter the ordering of the levels in case it was explicitly set before the call). Although an arbitrary number of columns can be passed, at most three should be used if the user intends to plot the ranks computed by this function.


target: Name of the target column of data. For each combination of the values of params, the ranks are obtained by comparing the repeated measurements of performance associated with each level of the target column.


performance: Name of the column of data containing the repeated performance measurements. If given a vector of strings, a separate ranking will be computed for each element, and no p-value, mean or stdev columns will be returned; only the rankings, together with the factor columns that indicate which problem configuration each rank corresponds to.


pairing.col: Name of the column that links together the paired samples when paired = TRUE. Otherwise, this argument is ignored.


test: The statistical test used to compare the performance of every level of the target variable at each problem configuration.


fun: Function performing a custom statistical test when test = "custom"; otherwise, this argument is ignored. The function must receive exactly two vectors (the first a vector of real numbers and the second a factor giving the level to which each number corresponds) and must return a pairwise.htest object with a p.value field. This field must be an (N-1)x(N-1) lower-triangular matrix, with exactly the same structure as the p.value field returned by a call to pairwise.wilcox.test or pairwise.t.test.
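A minimal sketch of such a custom function, here simply wrapping pairwise.t.test with unpooled (Welch-style) variances; the function name and the choice of pool.sd = FALSE are illustrative, not part of the SRCS API:

```r
# Custom test for test = "custom": takes a numeric vector and a factor,
# returns a pairwise.htest object whose p.value field is the usual
# (N-1)x(N-1) lower-triangular matrix. Extra arguments coming from
# SRCSranks' "..." are forwarded.
myCustomTest <- function(values, groups, ...) {
  pairwise.t.test(values, groups, pool.sd = FALSE, ...)
}

# Quick structural check with N = 3 levels:
res <- myCustomTest(c(rnorm(10), rnorm(10, 2), rnorm(10, 4)),
                    factor(rep(c("A1", "A2", "A3"), each = 10)))
# res$p.value is then a 2x2 matrix with an NA upper triangle
```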


correction: The p-value adjustment method. Must be one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" (defaults to "holm"). This parameter is ignored when test = "tukeyHSD", since Tukey's HSD incorporates its own correction procedure.


alpha: Significance threshold for the pairwise comparisons. Defaults to 0.05.


maximize: Boolean indicating whether the higher the performance measure, the better (default), or vice versa.


ncores: Number of physical CPUs available for the computations. If ncores > 1, parallelization is achieved through the parallel package and is applied to the computation of ranks for more than one problem configuration at a time. Defaults to 1 (sequential).


paired: Boolean indicating whether samples in the same problem configuration that differ only in the target value, and occupy the same relative position (row) within their respective target values, are paired. Defaults to FALSE. This should be set to TRUE, for instance, in machine learning problems in which, for a fixed problem configuration, the target variable (usually the algorithms being compared) is associated with a number of result samples coming from a cross-validation process. In a K-fold CV, a given problem configuration has K rows for each of the algorithms being compared, identical in all columns except the performance column. The performance values in the i-th rows (1 <= i <= K) of those batches (groups of K rows) are then related, so every pairwise comparison should treat the samples as paired.
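The layout described above can be sketched as follows (dataset name, algorithm names and accuracy values are made up for illustration): a 5-fold CV for one problem configuration yields K = 5 rows per algorithm, and row i of one algorithm is paired with row i of every other algorithm because they come from the same fold.

```r
# One problem configuration, two algorithms, K = 5 CV folds.
K <- 5
cv <- data.frame(
  Dataset   = "iris",                          # fixed problem configuration
  Fold      = rep(1:K, times = 2),             # candidate for pairing.col
  Algorithm = rep(c("A1", "A2"), each = K),    # target variable
  Accuracy  = c(0.91, 0.88, 0.93, 0.90, 0.92,  # A1, folds 1..5
                0.89, 0.85, 0.94, 0.87, 0.90)  # A2, folds 1..5
)
# With paired = TRUE, the comparison of A1 vs A2 would match fold-wise
# samples: Accuracy[1] with Accuracy[6], Accuracy[2] with Accuracy[7],
# and so on, with the Fold column passed through pairing.col.
```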


...: Further arguments passed on to the function fun that is called for every pairwise comparison.


Value

If length(performance) equals 1, an object of classes c("SRCS", "data.frame") with the following columns:

- A set of columns with the same names as the params and target arguments.
- Two columns, "mean" and "sd", containing the mean and the standard deviation of the repeated performance measurements for each problem configuration.
- One column, "rank", with the actual rank of each level of the target variable within that problem configuration. The lower the rank, the better the algorithm.
- |target| additional columns containing the p-values of the comparisons between that algorithm and the rest for the same problem configuration, where |target| is the number of levels of the target variable.

If length(performance) > 1 (let P = length(performance) for the explanation that follows), an object of classes c("SRCS", "data.frame") with the following columns:

- A set of columns with the same names as the params and target arguments.
- One column per element of the performance vector, named "rank1", ..., "rankP", containing, for each performance measure, the rank of each level of the target variable within that problem configuration. The higher the rank, the better the algorithm.


Note

Although it has no effect on the results of SRCSranks, the user should preferably set the order of the factor levels explicitly, by calling the levels function before calling this function, especially if they intend to subsequently apply plot to the results, because the level order does affect the way graphics are arranged in the plot.

See Also

plot.SRCS for a full working example of SRCSranks and the plotting facilities. See also pairwise.wilcox.test, t.test, pairwise.t.test, TukeyHSD, p.adjust.methods.

SRCS documentation built on May 2, 2019, 8:34 a.m.
