rankingPlot | R Documentation |
Pair-wise overlaps of feature rankings can be calculated for two types of analysis. Firstly, the cross-validation iterations within a single classification can be compared to each other, which explores the stability of the feature ranking. Secondly, different classification results can be compared, which explores the commonality of feature rankings between results. Two summaries of commonality are available. One is the average pair-wise overlap between all possible pairs of results. The other is the pair-wise overlap of each non-reference level of the comparison factor against the reference level. The overlaps are converted to percentages and plotted as line plots.
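The reference-level summary can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper overlapPercent, the level names and the toy rankings are all hypothetical, and each level is simplified to a single ranking rather than a set of cross-validation iterations.

```r
# Hypothetical helper (not part of ClassifyR): percentage of the top-n
# features that two rankings have in common.
overlapPercent <- function(rankingA, rankingB, n) {
  100 * length(intersect(rankingA[1:n], rankingB[1:n])) / n
}

# Toy rankings for three made-up levels of a comparison factor.
features <- paste0("gene", 1:20)
rankings <- list(
  "Method A" = features,                    # reference ranking
  "Method B" = features[c(2, 1, 3:20)],     # top two features swapped
  "Method C" = rev(features)                # completely reversed ranking
)
referenceLevel <- "Method A"
others <- setdiff(names(rankings), referenceLevel)

# Each non-reference level is compared only against the reference level.
versusReference <- sapply(others, function(level) {
  overlapPercent(rankings[[referenceLevel]], rankings[[level]], n = 10)
})
versusReference
```

With no reference level, the alternative summary would instead average the overlap over every possible pair of levels.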
## S4 method for signature 'ClassifyResult'
rankingPlot(results, ...)
## S4 method for signature 'list'
rankingPlot(
results,
topRanked = seq(10, 100, 10),
comparison = "within",
referenceLevel = NULL,
characteristicsList = list(),
orderingList = list(),
sizesList = list(lineWidth = 1, pointSize = 2, legendLinesPointsSize = 1, fonts = c(24,
16, 12, 12, 12, 16)),
lineColours = NULL,
xLabelPositions = seq(10, 100, 10),
yMax = 100,
title = if (comparison[1] == "within") "Feature Ranking Stability" else
"Feature Ranking Commonality",
yLabel = if (is.null(referenceLevel)) "Average Common Features (%)" else
paste("Average Common Features with", referenceLevel, "(%)"),
margin = grid::unit(c(1, 1, 1, 1), "lines"),
showLegend = TRUE,
parallelParams = bpparam()
)
results |
A list of |
... |
Not used by end user. |
topRanked |
A sequence of thresholds of number of the best features to use for overlapping. |
comparison |
Default: |
referenceLevel |
The level of the comparison factor to use as the
reference to compare each non-reference level to. If |
characteristicsList |
A named list of characteristics. The name must be
one of |
orderingList |
An optional named list. Any of the variables specified
to |
sizesList |
Default: |
lineColours |
A vector of colours for different levels of the line
colouring parameter, if one is specified by
|
xLabelPositions |
Locations where to put labels on the x-axis. |
yMax |
The maximum value of the percentage to plot. |
title |
An overall title for the plot. |
yLabel |
Label to be used for the y-axis of overlap percentages. |
margin |
The margin to have around the plot. |
showLegend |
If |
parallelParams |
An object of class |
If comparison is "within", then the feature selection overlaps are
compared within a particular analysis. The result indicates how stable the
selections are between different iterations of cross-validation for that
analysis. Otherwise, the comparison is between different cross-validation
runs, and it indicates how common the selected features are across
different classifications.
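As a rough illustration of a "within" comparison, the following base R sketch averages the pair-wise overlap of the top-ranked features over all pairs of rankings, at several thresholds. The helper averageOverlap and the toy rankings are assumptions made for this example, not code taken from the package.

```r
# Hypothetical helper (not part of ClassifyR): average pair-wise overlap,
# as a percentage, of the top-n features across all pairs of rankings.
averageOverlap <- function(rankings, n) {
  pairs <- combn(length(rankings), 2)   # one column per pair of iterations
  overlaps <- apply(pairs, 2, function(pair) {
    common <- intersect(rankings[[pair[1]]][1:n], rankings[[pair[2]]][1:n])
    100 * length(common) / n
  })
  mean(overlaps)
}

# Four toy rankings standing in for four cross-validation iterations.
features <- paste0("feature", 1:100)
rankings <- list(features,
                 features[c(15:6, 1:5, 16:100)],
                 features[c(1:9, 11, 10, 12:100)],
                 features[c(1:50, 61:100, 60:51)])

# One stability value per threshold, playing the role of topRanked.
topRanked <- c(10, 20, 50)
stability <- sapply(topRanked, function(n) averageOverlap(rankings, n))
names(stability) <- topRanked
stability
```

The rankings disagree only in the order of their first fifteen features, so the overlap is below 100% at a threshold of 10 and complete at the larger thresholds.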
Calculating all pair-wise set overlaps for a large cross-validation result
can be time-consuming. This stage can be done on multiple CPUs by providing
the relevant options to parallelParams.
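The sketch below shows how independent pair-wise overlap calculations could be spread over worker processes with BiocParallel. It assumes that the BiocParallel package is installed and that two workers are available. The toy rankings are made up, and the way rankingPlot divides its own work internally may differ.

```r
library(BiocParallel)  # Bioconductor package, assumed to be installed

# Assumption: two worker processes can be started on this machine.
params <- SnowParam(workers = 2)

# Six random toy rankings of the same 100 features.
set.seed(1)
features <- paste0("feature", 1:100)
rankings <- replicate(6, sample(features), simplify = FALSE)
pairs <- combn(length(rankings), 2, simplify = FALSE)  # list of index pairs

# Each pair is an independent task, so the pairs can be shared among workers.
overlaps <- bplapply(pairs, function(pair, rankings, n) {
  common <- intersect(rankings[[pair[1]]][1:n], rankings[[pair[2]]][1:n])
  100 * length(common) / n
}, rankings = rankings, n = 10, BPPARAM = params)

mean(unlist(overlaps))  # average pair-wise overlap percentage
```

A parameter object such as params is the kind of value that parallelParams accepts in place of its default, bpparam().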
An object of class ggplot and a plot on the current graphics device, if
plot is TRUE.
Dario Strbenac
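library(ClassifyR)  # provides ClassifyResult and rankingPlot

# Simulated predictions for 10 samples over 2 permutations.
predicted <- DataFrame(sample = sample(10, 100, replace = TRUE),
permutation = rep(1:2, each = 50),
class = rep(c("Healthy", "Cancer"), each = 50))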
actual <- factor(rep(c("Healthy", "Cancer"), each = 5))
allFeatures <- sapply(1:100, function(index) paste(sample(LETTERS, 3), collapse = ''))
rankList <- list(allFeatures[1:100], allFeatures[c(15:6, 1:5, 16:100)],
allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 61:100, 60:51)])
result1 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name", "Cross-validation"),
value = c("Melanoma", "t-test", "Diagonal LDA", "2 Permutations, 2 Folds")),
LETTERS[1:10], allFeatures, rankList,
list(rankList[[1]][1:15], rankList[[2]][1:15],
rankList[[3]][1:10], rankList[[4]][1:10]),
list(function(oracle){}), NULL,
predicted, actual)
predicted[, "class"] <- sample(predicted[, "class"])
rankList <- list(allFeatures[1:100], allFeatures[c(sample(20), 21:100)],
allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 60:51, 61:100)])
result2 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name",
"Cross-validation"),
value = c("Melanoma", "t-test", "Random Forest", "2 Permutations, 2 Folds")),
LETTERS[1:10], allFeatures, rankList,
list(rankList[[1]][1:15], rankList[[2]][1:15],
rankList[[3]][1:10], rankList[[4]][1:10]),
list(function(oracle){}), NULL,
predicted, actual)
rankingPlot(list(result1, result2), characteristicsList = list(pointType = "Classifier Name"))