tdmModConfmat: Calculate confusion matrix, gain and RGain measure.
In TDMR: Tuned Data Mining in R


Calculate confusion matrix, gain and RGain measure.

tdmModConfmat(d, colreal, colpred, opts, predProb = NULL)

`d`	data frame
`colreal`	name of column in d which contains the real class
`colpred`	name of column in d which contains the predicted class
`opts`	a list from which the following elements are used: `gainmat`: the gain matrix for each possible outcome, same size as `cm$mat` (see below). `gainmat[R1,P2]` is the gain associated with a record of real class R1 that is predicted as class P2 (gain matrix = - cost matrix). `rgain.type`: one of "rgain", "meanCA", "minCA", "bYouden", "arROC", "arLIFT", "arPRE"; affects the outputs `cm$mat` and `cm$rgain`, see below.
`predProb`	if not NULL, a data frame with as many rows as data frame `d`, containing the columns (index, true label, predicted label, prediction score). Only needed when `opts$rgain.type` is one of the "ar*" types ("arROC", "arLIFT", "arPRE").
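
To illustrate the `gainmat[R1,P2]` convention, here is a minimal base-R sketch of how an `opts` list might be prepared. The class labels "no"/"yes" and all gain values are made-up assumptions, and the list holds only the two elements described above; TDMR may expect further elements in `opts`.

```r
# Hypothetical two-class problem; labels and gain values are assumptions.
classes <- c("no", "yes")

# Gain matrix: rows = real class, columns = predicted class.
gainmat <- matrix(c( 1, -1,
                    -5,  2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(real = classes, predicted = classes))

# Gain for a record of real class "yes" that is predicted as "no".
missed_yes <- gainmat["yes", "no"]

# The corresponding cost matrix is the negative of the gain matrix.
costmat <- -gainmat

# Only the two elements described above; not a complete TDMR opts list.
opts <- list(gainmat = gainmat, rgain.type = "rgain")
```

Indexing by dimnames rather than by position keeps the real/predicted orientation explicit.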

cm, a list containing:

`mat`	matrix with real class levels as rows and predicted class levels as columns. If `opts$rgain.type=="rgain"`, `mat[R1,P2]` is the number of records with real class R1 predicted as class P2. If `opts$rgain.type` is "meanCA" or "minCA", this number is shown as a percentage of the records with real class R1 (percentage of each row). CAUTION: If there are NAs in column `colpred`, those cases are missing from `mat` (but the class errors are correct as long as there are no NAs in column `colreal`).
`cerr`	class error rates, vector of size nlevels(colreal)+1. `cerr[X]` is the misclassification rate for real class X. `cerr["Total"]` is the total classification error rate.
`gain`	the total gain (sum of pointwise product `opts$gainmat*cm$mat`)
`gain.vector`	gain.vector[X] is the gain attributed to real class label X. gain.vector["Total"] is again the total gain.
`gainmax`	the maximum achievable gain, assuming perfect prediction
`rgain`	Depending on the value of `opts$rgain.type`: `"rgain"`: ratio gain/gainmax in percent, `"meanCA"`: mean class accuracy percentage (i.e. `mean(diag(cm$mat))`), `"minCA"`: min class accuracy percentage (i.e. `min(diag(cm$mat))`), `"bYouden"`: balanced Youden index: min(sensitivity,specificity), `"arROC"`: area under ROC curve (a number in [0,1]), `"arLIFT"`: area between lift curve and horizontal line 1.0, `"arPRE"`: area under precision-recall curve (a number in [0,1])
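
The quantities listed above can be reproduced by hand in base R. The following is a sketch on made-up data, not the TDMR implementation: the column names `real` and `pred`, the gain values, and the reading of `gainmax` as "every record earns the diagonal gain of its real class" are all assumptions.

```r
# Made-up data: 5 records of real class "no", 3 of real class "yes".
lev <- c("no", "yes")
d <- data.frame(
  real = factor(c("no", "no", "no",  "no", "yes", "yes", "yes", "no"), levels = lev),
  pred = factor(c("no", "no", "yes", "no", "yes", "no",  "yes", "no"), levels = lev)
)

# Hypothetical gain matrix (rows = real class, columns = predicted class).
gainmat <- matrix(c(1, -1, -5, 2), nrow = 2, byrow = TRUE,
                  dimnames = list(real = lev, predicted = lev))

# Confusion matrix of counts; table() silently drops NA rows by default.
mat <- table(real = d$real, predicted = d$pred)

# Class error rates plus the total error rate.
cerr <- c(1 - diag(mat) / rowSums(mat),
          Total = 1 - sum(diag(mat)) / sum(mat))

# Total gain and gain per real class (pointwise product gainmat * mat).
gain        <- sum(gainmat * mat)
gain.vector <- c(rowSums(gainmat * mat), Total = gain)

# Assumed meaning of "perfect prediction": all records on the diagonal.
gainmax <- sum(rowSums(mat) * diag(gainmat))
rgain   <- 100 * gain / gainmax

# Row percentages, as used for "meanCA" and "minCA".
mat_pct <- 100 * prop.table(mat, margin = 1)
meanCA  <- mean(diag(mat_pct))
minCA   <- min(diag(mat_pct))
```

With this data the total gain is 2 and the assumed `gainmax` is 11, so the "rgain" measure is about 18.2 percent.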

For every rgain measure: the higher, the better.
The last four values of `opts$rgain.type` ("bYouden", "arROC", "arLIFT", "arPRE") are only available for binary classification.
For case "bYouden":
sensitivity = TP/(TP+FN)
specificity = TN/(TN+FP)
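
As a worked example of the two formulas above, the balanced Youden index can be computed from the four cells of a binary confusion matrix. The counts below are made up, and any scaling TDMR applies to the result (for example to percent) is not covered by this sketch.

```r
# Hypothetical counts for a binary classification problem.
TP <- 40  # positives predicted as positive
FN <- 10  # positives predicted as negative
TN <- 30  # negatives predicted as negative
FP <- 20  # negatives predicted as positive

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

# Balanced Youden index as defined above: the weaker of the two rates.
bYouden <- min(sensitivity, specificity)
```

Here sensitivity is 0.8 and specificity is 0.6, so the index is limited by the specificity.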