tdmModConfmat: Calculate confusion matrix, gain and RGain measure.


View source: R/tdmModelingUtils.r

Description

Calculate confusion matrix, gain and RGain measure.

Usage

tdmModConfmat(d, colreal, colpred, opts, predProb = NULL)

Arguments

d

data frame

colreal

name of the column in d that contains the real class

colpred

name of the column in d that contains the predicted class

opts

a list from which we use the elements:

  • gainmat: the gain matrix for each possible outcome, of the same size as cm$mat (see below).
    gainmat[R1,P2] is the gain associated with a record of real class R1 that is predicted as class P2. The gain matrix is the negative of the cost matrix.

  • rgain.type: one of {"rgain" | "meanCA" | "minCA" | "bYouden" | "arROC" | "arLIFT" | "arPRE"}; affects the outputs cm$mat and cm$rgain, see below.
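To illustrate the indexing convention of gainmat and its relation to a cost matrix, here is a minimal base-R sketch. The class levels "A" and "B" and all gain values are hypothetical; attaching dimnames is done only for readability and is not a documented requirement of tdmModConfmat.

```r
# Hypothetical two-class problem; level names and gain values are made up.
lev <- c("A", "B")

# Gain matrix: rows = real class, columns = predicted class.
# Correct predictions earn a positive gain, errors a negative one.
gainmat <- matrix(c( 1, -5,
                    -1,  2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(real = lev, pred = lev))

# Equivalent cost matrix: the gain matrix is its negative.
costmat <- -gainmat

# Gain for a record of real class "A" that is predicted as class "B".
gainmat["A", "B"]
```

A matrix built this way is the kind of object that would be stored in opts$gainmat.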

predProb

if not NULL, a data frame with as many rows as d, containing the columns (index, true label, predicted label, prediction score). Only needed if opts$rgain.type is one of "arROC", "arLIFT" or "arPRE".
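The following sketch shows how a prediction-score column like the one in predProb can yield an area-type measure such as arROC. The column names (index, label, pred, score) and the data are hypothetical, and the area under the ROC curve is computed here with the rank-sum (Mann-Whitney) identity; this is not necessarily how TDMR computes it internally.

```r
# Hypothetical predProb-like data frame for a binary task; the column
# names are assumptions, only their meaning follows the documentation.
predProb <- data.frame(
  index = 1:6,
  label = factor(c("pos", "neg", "pos", "neg", "pos", "neg")),
  pred  = factor(c("pos", "neg", "neg", "neg", "pos", "pos")),
  score = c(0.9, 0.2, 0.4, 0.3, 0.8, 0.6)
)

# Area under the ROC curve via the rank-sum identity:
# AUC = P(score of a random positive > score of a random negative),
# with ties counted as 1/2 thanks to average ranks.
auc_rank <- function(score, is_pos) {
  r <- rank(score)                     # ties.method = "average" by default
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(predProb$score, predProb$label == "pos")
```

Here 8 of the 9 (positive, negative) score pairs are ordered correctly, so the area is 8/9.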

Value

cm, a list containing:

mat

matrix with the real class levels as rows and the predicted class levels as columns.
If opts$rgain.type=="rgain", mat[R1,P2] is the number of records with real class R1 predicted as class P2. If opts$rgain.type=="meanCA" or "minCA", this number is instead shown as a percentage of the records with real class R1 (percentage of each row). CAUTION: If column colpred contains NAs, those cases are missing from mat (!), but the class errors remain correct as long as column colreal contains no NAs.
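A minimal base-R sketch of both forms of mat, assuming hypothetical column names "real" and "pred" for colreal and colpred. It uses table(), which by default drops records with an NA in either column; this mirrors the CAUTION above but is not taken from the TDMR source.

```r
# Hypothetical data: 9 records, the last one has no prediction (NA).
d <- data.frame(
  real = factor(c("A", "A", "A", "B", "B", "B", "B", "A", "B"), levels = c("A", "B")),
  pred = factor(c("A", "B", "A", "B", "B", "A", "B", "A", NA),  levels = c("A", "B"))
)
colreal <- "real"
colpred <- "pred"

# Counts: rows = real class, columns = predicted class (rgain.type "rgain").
# table() uses useNA = "no" by default, so the NA record is not counted.
mat <- table(d[[colreal]], d[[colpred]], dnn = c("real", "pred"))

# Row percentages (rgain.type "meanCA" or "minCA").
mat_pct <- 100 * prop.table(mat, margin = 1)
```

Here mat counts only 8 of the 9 records, which is exactly the effect the CAUTION warns about.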

cerr

class error rates, vector of size nlevels(colreal)+1.
cerr[X] is the misclassification rate for real class X.
cerr["Total"] is the total classification error rate.
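The class error rates follow directly from a count matrix. The sketch below uses a hypothetical 2x2 count matrix and expresses the rates as fractions; it illustrates the definition and is not the package's own code.

```r
# Hypothetical count matrix: rows = real class, columns = predicted class.
mat <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(real = c("A", "B"), pred = c("A", "B")))

# Per-class error: share of records of real class X not predicted as X.
cerr_class <- 1 - diag(mat) / rowSums(mat)

# Total error: all off-diagonal records divided by all records.
cerr_total <- 1 - sum(diag(mat)) / sum(mat)

# Vector of size nlevels + 1, with the total appended as "Total".
cerr <- c(cerr_class, Total = cerr_total)
```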

gain

the total gain (sum of the element-wise product opts$gainmat*cm$mat)

gain.vector

gain.vector[X] is the gain attributed to real class label X. gain.vector["Total"] is again the total gain.
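The total gain and its split by real class can be sketched as follows, using a hypothetical count matrix and a hypothetical gain matrix of the same size. Row sums of the element-wise product give the per-class gain; their sum is the total gain.

```r
lev <- c("A", "B")

# Hypothetical counts: rows = real class, columns = predicted class.
mat <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE,
              dimnames = list(real = lev, pred = lev))

# Hypothetical gain matrix of the same size.
gainmat <- matrix(c( 1, -5,
                    -1,  2), nrow = 2, byrow = TRUE,
                  dimnames = list(real = lev, pred = lev))

# Element-wise product: gain contributed by each (real, predicted) cell.
cellgain <- gainmat * mat
gain <- sum(cellgain)

# Gain attributed to each real class, with the total appended.
gain.vector <- c(rowSums(cellgain), Total = gain)
```

With these numbers class "A" contributes -2 (three correct records, one costly error), class "B" contributes 5, and the total gain is 3.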

gainmax

the maximum achievable gain, assuming perfect prediction
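Under perfect prediction every record lands on the diagonal, so one plausible way to obtain gainmax is to weight the diagonal gains by the number of records per real class. The sketch below assumes exactly that reading (it is not taken from the TDMR source) and then forms the ratio used for rgain.type "rgain".

```r
lev <- c("A", "B")

# Hypothetical counts and gains, as in a two-class example.
mat <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE,
              dimnames = list(real = lev, pred = lev))
gainmat <- matrix(c( 1, -5,
                    -1,  2), nrow = 2, byrow = TRUE,
                  dimnames = list(real = lev, pred = lev))

gain <- sum(gainmat * mat)

# Assumption: perfect prediction puts all records of class R in cell [R, R].
gainmax <- sum(rowSums(mat) * diag(gainmat))

# rgain.type "rgain": ratio gain/gainmax in percent.
rgain <- 100 * gain / gainmax
```

Note that this reading presumes the diagonal entries are the largest gains in each row, which is the usual setup.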

rgain

Depending on the value of opts$rgain.type:
"rgain": ratio gain/gainmax in percent,
"meanCA": mean class accuracy in percent (i.e. mean(diag(cm$mat))),
"minCA": minimum class accuracy in percent (i.e. min(diag(cm$mat))),
"bYouden": balanced Youden index, min(sensitivity, specificity),
"arROC": area under the ROC curve (a number in [0,1]),
"arLIFT": area between the lift curve and the horizontal line 1.0,
"arPRE": area under the precision-recall curve (a number in [0,1]).
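For "meanCA" and "minCA", mat holds row percentages, so the class accuracies are simply its diagonal. A minimal sketch with a hypothetical, deliberately unbalanced count matrix:

```r
lev <- c("A", "B")

# Hypothetical counts: class A is predicted better than class B.
mat <- matrix(c(3, 1,
                2, 2), nrow = 2, byrow = TRUE,
              dimnames = list(real = lev, pred = lev))

# Row percentages, as mat is shown for "meanCA" / "minCA".
mat_pct <- 100 * prop.table(mat, margin = 1)

# Class accuracies are the diagonal of the row-percentage matrix.
meanCA <- mean(diag(mat_pct))
minCA  <- min(diag(mat_pct))
```

Class "A" reaches 75 % accuracy and class "B" 50 %, giving meanCA = 62.5 and minCA = 50; minCA is the stricter measure because it rewards only the worst-predicted class.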

Note

For all measures, a higher rgain is better.
The last four values of opts$rgain.type ("bYouden", "arROC", "arLIFT", "arPRE") are only available for binary classification.
For case "bYouden":
sensitivity = TP/(TP+FN)
specificity = TN/(TN+FP)
where TP, FN, TN and FP are the numbers of true positives, false negatives, true negatives and false positives.
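The balanced Youden index can be read off a binary count matrix. In this sketch the level names "neg" and "pos" and the counts are hypothetical, and "pos" is assumed to be the positive class.

```r
lev <- c("neg", "pos")

# Hypothetical counts: rows = real class, columns = predicted class.
mat <- matrix(c(8, 2,
                3, 7), nrow = 2, byrow = TRUE,
              dimnames = list(real = lev, pred = lev))

TP <- mat["pos", "pos"]; FN <- mat["pos", "neg"]
TN <- mat["neg", "neg"]; FP <- mat["neg", "pos"]

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)

# Balanced Youden index as defined above: the worse of the two rates.
bYouden <- min(sensitivity, specificity)
```

This gives sensitivity 0.7, specificity 0.8 and bYouden 0.7. Note that this differs from the classic Youden index J = sensitivity + specificity - 1.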

Author(s)

Wolfgang Konen (wolfgang.konen@th-koeln.de), Patrick Koch

See Also

tdmClassify, tdmROCRbase


TDMR documentation built on March 3, 2020, 1:06 a.m.