# confMat: Confusion Matrix In GMDH2: Binary Classification via GMDH-Type Neural Network Algorithms

## Description

confMat constructs a 2\times2 confusion matrix and returns several statistics computed from it.

## Usage

```r
confMat(data, ...)

## Default S3 method:
confMat(data, reference, positive = NULL, verbose = TRUE, ...)

## S3 method for class 'table'
confMat(data, positive = NULL, verbose = TRUE, ...)
```

## Arguments

- `data`: a factor of predicted classes (for the default method) or an object of class `table`.
- `reference`: a factor of classes to be used as the true results.
- `positive`: an optional character string for the factor level that corresponds to a "positive" result.
- `verbose`: a logical for printing output to the R console.
- `...`: options to be passed to `table`. Note: do not include `reference` here.

## Details

The confMat function requires that the factors data and reference have exactly the same levels. The function constructs a 2\times2 confusion matrix and calculates accuracy, the no information rate (NIR), the unweighted Kappa statistic, the Matthews correlation coefficient (MCC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), prevalence, balanced accuracy, the Youden index, detection rate, detection prevalence, precision, recall and the F1 measure.

Suppose a 2\times2 table with the notation

|                          | Reference: Event | Reference: No Event |
|--------------------------|------------------|---------------------|
| **Predicted: Event**     | TP               | FP                  |
| **Predicted: No Event**  | FN               | TN                  |

TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives and TN is the number of true negatives.
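These four counts can be tallied directly from paired predicted/reference labels. As a minimal plain-Python sketch (not the package's R code; the function name `cell_counts` and the `positive` argument are illustrative):

```python
def cell_counts(predicted, reference, positive):
    """Tally TP, FP, FN, TN for binary labels, given the positive level."""
    tp = sum(1 for p, r in zip(predicted, reference) if p == positive and r == positive)
    fp = sum(1 for p, r in zip(predicted, reference) if p == positive and r != positive)
    fn = sum(1 for p, r in zip(predicted, reference) if p != positive and r == positive)
    tn = sum(1 for p, r in zip(predicted, reference) if p != positive and r != positive)
    return tp, fp, fn, tn

pred = ["event", "event", "no", "no", "event"]
ref  = ["event", "no",    "no", "event", "event"]
print(cell_counts(pred, ref, positive="event"))  # (2, 1, 1, 1)
```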

Accuracy = \frac{TP + TN}{TP + FP + FN + TN}

NIR = max(Prevalence, 1 - Prevalence)

Kappa = \frac{Accuracy - \frac{(TP + FP)(TP + FN)+(FN + TN)(FP + TN)}{(TP + FP + FN + TN)^2}}{1 - \frac{(TP + FP)(TP + FN)+(FN + TN)(FP + TN)}{(TP + FP + FN + TN)^2}}

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP) \times (FN+TN) \times (TP+FN) \times (FP+TN)}}

Sensitivity = \frac{TP}{TP+FN}

Specificity = \frac{TN}{TN+FP}

PPV = \frac{TP}{TP+FP}

NPV = \frac{TN}{TN+FN}

Prevalence = \frac{TP + FN}{TP + FP + FN + TN}

Balanced\ accuracy = \frac{Sensitivity + Specificity}{2}

Youden\ index = Sensitivity + Specificity - 1

Detection\ rate = \frac{TP}{TP + FP + FN + TN}

Detection\ prevalence = \frac{TP+FP}{TP + FP + FN + TN}

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP+FN}

F1 = \frac{2}{\frac{1}{Recall}+\frac{1}{Precision}}
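The formulas above can be computed together from the four cell counts. Below is a plain-Python sketch of these definitions (not the package's R implementation; the function name `conf_stats` is illustrative, and degenerate tables with an empty margin, which would divide by zero, are not handled):

```python
import math

def conf_stats(tp, fp, fn, tn):
    """Compute the listed statistics from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)            # sensitivity = recall
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)             # PPV = precision
    npv = tn / (tn + fn)
    prev = (tp + fn) / n
    acc = (tp + tn) / n
    nir = max(prev, 1 - prev)
    # chance agreement term used by the unweighted Kappa
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (acc - expected) / (1 - expected)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return {
        "accuracy": acc, "NIR": nir, "kappa": kappa, "MCC": mcc,
        "sensitivity": sens, "specificity": spec, "PPV": ppv, "NPV": npv,
        "prevalence": prev, "baccuracy": (sens + spec) / 2,
        "youden": sens + spec - 1, "detectRate": tp / n,
        "detectPrev": (tp + fp) / n, "precision": ppv, "recall": sens,
        "F1": 2 / (1 / sens + 1 / ppv),
    }

stats = conf_stats(tp=50, fp=10, fn=5, tn=35)
print(round(stats["accuracy"], 3), round(stats["F1"], 3))
```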

## Value

Returns a list containing the following elements:

- `table`: confusion matrix
- `accuracy`: accuracy
- `NIR`: no information rate
- `kappa`: unweighted Kappa
- `MCC`: Matthews correlation coefficient
- `sensitivity`: sensitivity
- `specificity`: specificity
- `PPV`: positive predictive value
- `NPV`: negative predictive value
- `prevalence`: prevalence
- `baccuracy`: balanced accuracy
- `youden`: Youden index
- `detectRate`: detection rate
- `detectPrev`: detection prevalence
- `precision`: precision
- `recall`: recall
- `F1`: F1 measure
- `all`: a matrix containing all statistics

## Note

If the factors reference and data have the same levels, but in a different order, confMat reorders data to match the level order of reference.

## Author(s)

Osman Dag

## See Also

confusionMatrix

## Examples

```r
library(GMDH2)
library(mlbench)

data(BreastCancer)
data <- BreastCancer

# to obtain complete observations
completeObs <- complete.cases(data)
data <- data[completeObs,]

x <- data.matrix(data[, 2:10])
y <- data[, 11]

seed <- 12345
set.seed(seed)
nobs <- length(y)

# to split train, validation and test sets
indices <- sample(1:nobs)

ntrain <- round(nobs * 0.6, 0)
nvalid <- round(nobs * 0.2, 0)
ntest <- nobs - (ntrain + nvalid)

train.indices <- sort(indices[1:ntrain])
valid.indices <- sort(indices[(ntrain + 1):(ntrain + nvalid)])
test.indices <- sort(indices[(ntrain + nvalid + 1):nobs])

x.train <- x[train.indices,]
y.train <- y[train.indices]
x.valid <- x[valid.indices,]
y.valid <- y[valid.indices]
x.test <- x[test.indices,]
y.test <- y[test.indices]

set.seed(seed)

# to construct model via dce-GMDH algorithm
model <- dceGMDH(x.train, y.train, x.valid, y.valid)

# to obtain predicted classes for test set
y.test_pred <- predict(model, x.test, type = "class")

# to obtain confusion matrix and some statistics for test set
confMat(y.test_pred, y.test, positive = "malignant")

# to obtain statistics from table
result <- table(y.test_pred, y.test)
confMat(result, positive = "malignant")
```