tdmModConfmat: Calculate confusion matrix, gain and RGain measure.

Calculate confusion matrix, gain and RGain measure.


tdmModConfmat(d, colreal, colpred, opts, predProb = NULL)



d: data frame


colreal: name of the column in d which contains the real class


colpred: name of the column in d which contains the predicted class


opts: a list from which we use the elements:

  • gainmat: the gain matrix for each possible outcome, same size as cm$mat (see below).
    gainmat[R1,P2] is the gain for a record of real class R1 that is predicted as class P2 (the gain matrix is the negative of the cost matrix).

  • rgain.type: one out of {"rgain" | "meanCA" | "minCA" | "bYouden" | "arROC" | "arLIFT" | "arPRE" }, affects output cm$mat and cm$rgain, see below.
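
As an illustration of the two elements above, the following sketch builds a minimal opts list by hand. The class labels and gain values are hypothetical, and the package may expect further elements in opts that are not described here.

```r
# Hypothetical class labels, for illustration only
classes <- c("neg", "pos")

# Gain matrix: rows = real class, columns = predicted class.
# A correct prediction earns 1; a real "pos" predicted as "neg" costs 2 (gain -2).
gainmat <- matrix(c( 1, 0,
                    -2, 1),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(real = classes, predicted = classes))

# Plain list carrying only the two elements described above
opts <- list(gainmat = gainmat, rgain.type = "rgain")
```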


predProb: if not NULL, a data frame with as many rows as d, containing the columns (index, true label, predicted label, prediction score). It is only needed if opts$rgain.type is one of the "ar*" measures ("arROC", "arLIFT", "arPRE").


cm, a list containing:


mat: matrix with real class levels as rows and predicted class levels as columns.
If opts$rgain.type=="rgain", mat[R1,P2] is the number of records with real class R1 predicted as class P2. If opts$rgain.type=="meanCA" or "minCA", this number is shown as a percentage of the records with real class R1 (row percentage). CAUTION: If column colpred contains NAs, those cases are missing from mat, but the class errors are still correct as long as column colreal contains no NAs.
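
The layout of the matrix and the NA caveat can be reproduced with base R alone. This is only a sketch on a hypothetical data frame, not the package's own code; in particular, how the package forms the row percentages when predictions are NA is not stated here, so the sketch simply divides by the row sums of the counted records.

```r
# Hypothetical data: six records, one of them with a missing prediction
d <- data.frame(
  real = factor(c("a", "a", "a", "b", "b", "b"), levels = c("a", "b")),
  pred = factor(c("a", "a", "b", "b", "b", NA),  levels = c("a", "b"))
)

# Counts: rows = real class, columns = predicted class.
# table() drops the NA prediction by default, so only 5 of 6 records appear.
mat <- table(real = d$real, predicted = d$pred)

# Row percentages (each row sums to 100), the form described for "meanCA"/"minCA"
mat.pct <- 100 * prop.table(mat, margin = 1)
```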


cerr: class error rates, a vector of length nlevels(colreal)+1.
cerr[X] is the misclassification rate for real class X.
cerr["Total"] is the total classification error rate.
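
A vector of this shape can be derived from a count matrix as follows. The sketch uses hypothetical labels and base R only; it assumes there are no NAs, so it does not show how the function handles that case.

```r
# Hypothetical real and predicted labels (no NAs)
real <- factor(c("a", "a", "a", "b", "b", "b"), levels = c("a", "b"))
pred <- factor(c("a", "a", "b", "b", "b", "b"), levels = c("a", "b"))

mat <- table(real, pred)

# Per-class misclassification rate: 1 - (correct in row) / (records in row)
cerr <- 1 - diag(mat) / rowSums(mat)

# Append the overall error rate under the name "Total"
cerr <- c(cerr, Total = 1 - sum(diag(mat)) / sum(mat))
```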


gain: the total gain (sum of the elementwise product opts$gainmat*cm$mat)


gain.vector: gain.vector[X] is the gain attributed to real class label X; gain.vector["Total"] is the total gain.


gainmax: the maximum achievable gain, assuming perfect prediction
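
The three gain quantities can be worked through on a small example. The count matrix and gain matrix below are hypothetical, and the sketch reads "perfect prediction" as every record landing on the diagonal of the gain matrix; it is an assumption that the package computes gainmax in exactly this way.

```r
classes <- c("a", "b")

# Hypothetical confusion counts: rows = real class, columns = predicted class
mat <- matrix(c(2, 1,
                0, 3),
              nrow = 2, byrow = TRUE, dimnames = list(classes, classes))

# Hypothetical gain matrix of the same shape
gainmat <- matrix(c( 1, 0,
                    -2, 1),
                  nrow = 2, byrow = TRUE, dimnames = list(classes, classes))

# Gain per real class: row sums of the elementwise product
gain.per.class <- rowSums(gainmat * mat)

# Total gain and the vector with the "Total" entry appended
gain <- sum(gain.per.class)
gain.vector <- c(gain.per.class, Total = gain)

# Gain if every record were predicted as its real class
gainmax <- sum(diag(gainmat) * rowSums(mat))
```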


rgain: depending on the value of opts$rgain.type:
"rgain": ratio gain/gainmax in percent,
"meanCA": mean class accuracy in percent, i.e. mean(diag(cm$mat)),
"minCA": minimum class accuracy in percent, i.e. min(diag(cm$mat)),
"bYouden": balanced Youden index: min(sensitivity,specificity),
"arROC": area under ROC curve (a number in [0,1]),
"arLIFT": area between lift curve and horizontal line 1.0,
"arPRE": area under precision-recall curve (a number in [0,1])

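The first three measures in the list can be compared on one hypothetical count matrix. This is a base R sketch under the assumption of a unit gain for correct predictions; the row-percentage matrix mat.pct stands in for what cm$mat holds in the "meanCA" and "minCA" cases.

```r
classes <- c("a", "b")

# Hypothetical confusion counts with unequal class sizes (4 and 6 records)
mat <- matrix(c(3, 1,
                1, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(real = classes, predicted = classes))

# Assumed gain matrix: 1 for a correct prediction, 0 otherwise
gainmat <- diag(2)

# "rgain": ratio gain/gainmax in percent
gain    <- sum(gainmat * mat)
gainmax <- sum(diag(gainmat) * rowSums(mat))
rgain   <- 100 * gain / gainmax

# Row percentages: element [i, j] is divided by the size of real class i
mat.pct <- 100 * mat / rowSums(mat)

# "meanCA" and "minCA": mean and minimum of the per-class accuracies
meanCA <- mean(diag(mat.pct))
minCA  <- min(diag(mat.pct))
```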

For every rgain measure, higher is better.
The last four values of opts$rgain.type ("bYouden", "arROC", "arLIFT", "arPRE") are only available for binary classification.
For case "bYouden":
sensitivity = TP/(TP+FN)
specificity = TN/(TN+FP)
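
A short numeric sketch of these two formulas and the resulting balanced Youden index, using hypothetical counts for a binary problem:

```r
# Hypothetical binary confusion counts
TP <- 40; FN <- 10   # real positives: predicted positive / negative
TN <- 30; FP <- 20   # real negatives: predicted negative / positive

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

# Balanced Youden index as defined above: the smaller of the two rates
bYouden <- min(sensitivity, specificity)
```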


Wolfgang Konen, Patrick Koch

