evaluate_Weka_classifier: Model Statistics for R/Weka Classifiers

Description

Compute model performance statistics for a fitted Weka classifier.

Usage

evaluate_Weka_classifier(object, newdata = NULL, cost = NULL, 
                         numFolds = 0, complexity = FALSE,
                         class = FALSE, seed = NULL, ...)

Arguments

object

a Weka_classifier object.

newdata

an optional data frame in which to look for variables with which to evaluate. If omitted or NULL, the training instances are used.

cost

a square matrix of (mis)classification costs.

numFolds

the number of folds to use in cross-validation.

complexity

option to include entropy-based statistics.

class

option to include class statistics.

seed

optional seed for cross-validation.

...

further arguments passed to other methods (see Details).

Details

The function computes and extracts a non-redundant set of performance statistics that is suitable for model interpretation. By default the statistics are computed on the training data.
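
Because training-set statistics tend to be optimistic, a held-out set can be supplied through newdata. The following is a minimal sketch, assuming RWeka and a working Java installation are available; the use of iris and the 100/50 split are illustrative choices, not part of the package documentation.

```r
library(RWeka)

# Hold out one third of iris as a test set (the split itself is illustrative).
set.seed(42)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit on the training portion only.
m <- J48(Species ~ ., data = train)

# Without newdata the training instances are used; with newdata the
# statistics are computed on the supplied data frame instead.
e_train <- evaluate_Weka_classifier(m)
e_test  <- evaluate_Weka_classifier(m, newdata = test)

# Compare the percentage of correctly classified instances.
c(train = e_train$details[["pctCorrect"]],
  test  = e_test$details[["pctCorrect"]])
```

The data frame passed as newdata must contain the response variable as well as the predictors, since the true classes are needed to score the predictions.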

Currently, the ... argument supports only the logical variable normalize, which tells Weka to normalize the cost matrix so that the cost of a correct classification is zero.
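
To make the effect of normalize concrete, here is a base R sketch of one way such a normalization can be done. It assumes that the diagonal entry of each column is subtracted from that column; this convention is an assumption for illustration and is not stated on this page. The cost values are hypothetical.

```r
# Hypothetical cost matrix with non-zero costs for correct classifications.
# matrix() fills column-wise, so the rows are (1, 5) and (4, 2).
cost <- matrix(c(1, 4, 5, 2), ncol = 2)

# Subtract each column's diagonal entry from that column (assumed convention),
# so that every correct classification ends up with cost zero.
normalized <- sweep(cost, 2, diag(cost))
normalized

# With a fitted classifier m, the option would be passed through '...':
# evaluate_Weka_classifier(m, cost = cost, normalize = TRUE)
```

Under this assumption the off-diagonal entries become the extra cost of an error relative to a correct decision in the same column.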

Note that if the class variable is numeric, only a subset of the statistics is available. The arguments complexity and class are then not applicable and are ignored.

Value

An object of class Weka_classifier_evaluation, a list with the following components:

string

character, concatenation of the string representations of the performance statistics.

details

vector, base statistics, e.g., the percentage of instances correctly classified, etc.

detailsComplexity

vector, entropy-based statistics (if selected).

detailsClass

matrix, class statistics, e.g., the true positive rate, etc., for each level of the response variable (if selected).

confusionMatrix

table, cross-classification of true and predicted classes.
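
The per-class statistics in detailsClass are functions of confusionMatrix. The base R sketch below recomputes three of them from the confusion matrix printed in the example output on this page (rows are true classes, columns are predicted classes). The column names used here are illustrative labels and may differ from the names RWeka returns.

```r
# Confusion matrix from the example output: rows = true class, columns = predicted.
cm <- matrix(c(4, 5,
               4, 1), nrow = 2, byrow = TRUE,
             dimnames = list(true = c("yes", "no"),
                             predicted = c("yes", "no")))

# One-vs-rest counts for each class.
tp <- diag(cm)                 # true positives
fn <- rowSums(cm) - tp         # false negatives
fp <- colSums(cm) - tp         # false positives
tn <- sum(cm) - tp - fn - fp   # true negatives

class_stats <- cbind(
  truePositiveRate  = tp / (tp + fn),
  falsePositiveRate = fp / (fp + tn),
  precision         = tp / (tp + fp)
)
round(class_stats, 3)
```

Rounded to three digits, these values agree with the TP Rate, FP Rate, and Precision columns of the "Detailed Accuracy By Class" table in the example output.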

References

I. H. Witten and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd Edition, Morgan Kaufmann, San Francisco.

Examples

## Use some example data.
w <- read.arff(system.file("arff", "weather.nominal.arff",
                           package = "RWeka"))

## Fit a J48 decision tree.
m <- J48(play ~ ., data = w)
m

## Evaluate with 10-fold cross-validation and a cost matrix, and request
## the entropy-based and per-class statistics.
e <- evaluate_Weka_classifier(m,
                              cost = matrix(c(0, 2, 1, 0), ncol = 2),
                              numFolds = 10, complexity = TRUE,
                              seed = 123, class = TRUE)
e
summary(e)
e$details

Example output

J48 pruned tree
------------------

outlook = sunny
|   humidity = high: no (3.0)
|   humidity = normal: yes (2.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = FALSE: yes (3.0)
|   windy = TRUE: no (2.0)

Number of Leaves  : 	5

Size of the tree : 	8

=== 10 Fold Cross Validation ===

=== Summary ===

Correctly Classified Instances           5               35.7143 %
Incorrectly Classified Instances         9               64.2857 %
Kappa statistic                         -0.3404
Total Cost                              13     
Average Cost                             0.9286
K&B Relative Info Score               -310.2048 %
K&B Information Score                   -2.8486 bits     -0.2035 bits/instance
Class complexity | order 0              13.7612 bits      0.9829 bits/instance
Class complexity | scheme             4305.1699 bits    307.5121 bits/instance
Complexity improvement     (Sf)      -4291.4087 bits   -306.5292 bits/instance
Mean absolute error                      0.5417
Root mean squared error                  0.6854
Relative absolute error                113.75   %
Root relative squared error            138.9193 %
Total Number of Instances               14     

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.444    0.800    0.500      0.444    0.471      -0.344   0.478     0.703     yes
                 0.200    0.556    0.167      0.200    0.182      -0.344   0.478     0.360     no
Weighted Avg.    0.357    0.713    0.381      0.357    0.367      -0.344   0.478     0.581     

=== Cost Matrix ===

 0 1
 2 0

=== Confusion Matrix ===

 a b   <-- classified as
 4 5 | a = yes
 4 1 | b = no
                  Length Class  Mode     
string             1     -none- character
details            8     -none- numeric  
detailsCost        1     -none- numeric  
detailsComplexity  4     -none- numeric  
detailsClass      12     -none- numeric  
confusionMatrix    4     -none- numeric  
              pctCorrect             pctIncorrect          pctUnclassified 
              35.7142857               64.2857143                0.0000000 
                   kappa        meanAbsoluteError     rootMeanSquaredError 
              -0.3404255                0.5416667                0.6853773 
   relativeAbsoluteError rootRelativeSquaredError 
             113.7500000              138.9192548 
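
Several of the summary figures above can be re-derived from the confusion matrix and the cost matrix alone, which is a useful check on how to read them. This base R sketch assumes that the rows of both matrices index the true class and the columns the predicted class, which is consistent with the reported total cost of 13. The variable names are illustrative.

```r
# Confusion matrix from the output above (rows = true, columns = predicted).
cm <- matrix(c(4, 5,
               4, 1), nrow = 2, byrow = TRUE)

# Same cost matrix as in the example call; printed by Weka as rows (0, 1) and (2, 0).
cost <- matrix(c(0, 2, 1, 0), ncol = 2)

n <- sum(cm)

# Percentage of correctly classified instances.
pct_correct <- 100 * sum(diag(cm)) / n

# Cohen's kappa: observed agreement corrected for chance agreement.
p_observed <- sum(diag(cm)) / n
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (p_observed - p_expected) / (1 - p_expected)

# Total and average misclassification cost.
total_cost <- sum(cm * cost)
avg_cost   <- total_cost / n

c(pctCorrect = pct_correct, kappa = kappa_stat,
  totalCost = total_cost, avgCost = avg_cost)
```

The results match the reported values: 35.7143 % correct, kappa of -0.3404, total cost 13, and average cost 0.9286.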

RWeka documentation built on Aug. 23, 2020, 5:07 p.m.