breastcancer: Breast Cancer Wisconsin Original Data Set
In OneR: One Rule Machine Learning Classification Algorithm with Enhancements


Dataset containing the original Wisconsin breast cancer data.

data(breastcancer)

A data frame with 699 instances and 10 attributes. The variables are as follows:

Clump Thickness: 1 - 10
Uniformity of Cell Size: 1 - 10
Uniformity of Cell Shape: 1 - 10
Marginal Adhesion: 1 - 10
Single Epithelial Cell Size: 1 - 10
Bare Nuclei: 1 - 10
Bland Chromatin: 1 - 10
Normal Nucleoli: 1 - 10
Mitoses: 1 - 10
Class: benign, malignant

The data were obtained from the UCI Machine Learning Repository; see https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)

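library(OneR)
data(breastcancer)
# Discretize the numeric attributes; rows with missing values are dropped with a warning
data <- optbin(breastcancer, method = "infogain")
# Fit the One Rule model and print the accuracy of every attribute
model <- OneR(data, verbose = TRUE)
summary(model)
plot(model)
# Predict on the training data and compare with the actual classes
prediction <- predict(model, data)
eval_model(prediction, data)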

Warning message:
In optbin.data.frame(breastcancer, method = "infogain") :
  16 instance(s) removed due to missing values

    Attribute                   Accuracy
1 * Uniformity of Cell Size     92.68%  
2   Uniformity of Cell Shape    91.51%  
3   Bare Nuclei                 91.22%  
4   Bland Chromatin             90.78%  
5   Single Epithelial Cell Size 90.04%  
6   Normal Nucleoli             89.75%  
7   Marginal Adhesion           86.68%  
8   Clump Thickness             85.51%  
9   Mitoses                     78.77%  
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'


Call:
OneR.data.frame(x = data, verbose = TRUE)

Rules:
If Uniformity of Cell Size = (0.991,2] then Class = benign
If Uniformity of Cell Size = (2,10]    then Class = malignant
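
The two rules amount to a single threshold on Uniformity of Cell Size. As a rough base R sketch, the intervals can be reproduced with `cut()`. The `cell_size` values are hypothetical and chosen only for illustration, and this is not necessarily how the package applies the rule internally.

```r
# Hypothetical attribute values, for illustration only
cell_size <- c(1, 2, 3, 10)

# Same interval boundaries as in the rules above; intervals are right-closed
bin <- cut(cell_size, breaks = c(0.991, 2, 10))

# First interval -> benign, second interval -> malignant
predicted <- c("benign", "malignant")[as.integer(bin)]

print(data.frame(cell_size, bin, predicted))
```

Because the intervals are closed on the right, a value of exactly 2 still falls into the benign interval.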

Accuracy:
633 of 683 instances classified correctly (92.68%)

Contingency table:
           Uniformity of Cell Size
Class       (0.991,2] (2,10] Sum
  benign        * 406     38 444
  malignant        12  * 227 239
  Sum             418    265 683
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 485.03, df = 1, p-value < 2.2e-16
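
The test statistic can be checked in base R from the contingency table alone. This sketch retypes the four cell counts by hand (the names `tab` and `res` are illustrative). `chisq.test()` applies Yates' continuity correction to 2x2 tables by default, which appears to be what yields the reported value of 485.03.

```r
# Cell counts copied from the contingency table above
# (rows: Class, columns: Uniformity of Cell Size interval)
tab <- matrix(c(406, 12, 38, 227), nrow = 2,
              dimnames = list(Class = c("benign", "malignant"),
                              Size  = c("(0.991,2]", "(2,10]")))

# Pearson's chi-squared test; continuity correction is on by default for 2x2 tables
res <- chisq.test(tab)
print(res)
```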


Confusion matrix (absolute):
           Actual
Prediction  benign malignant Sum
  benign       406        12 418
  malignant     38       227 265
  Sum          444       239 683

Confusion matrix (relative):
           Actual
Prediction  benign malignant  Sum
  benign      0.59      0.02 0.61
  malignant   0.06      0.33 0.39
  Sum         0.65      0.35 1.00

Accuracy:
0.9268 (633/683)

Error rate:
0.0732 (50/683)

Error rate reduction (vs. base rate):
0.7908 (p-value < 2.2e-16)
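
The three summary figures follow directly from the absolute confusion matrix. The base R sketch below recomputes them by hand, assuming that the error rate reduction is measured relative to always predicting the majority class (benign); under that assumption the arithmetic reproduces the reported values. The variable names are illustrative and not part of the package.

```r
# Absolute confusion matrix copied from the output above
# (rows: prediction, columns: actual)
cm <- matrix(c(406, 38, 12, 227), nrow = 2,
             dimnames = list(Prediction = c("benign", "malignant"),
                             Actual     = c("benign", "malignant")))

n          <- sum(cm)                  # 683 instances
accuracy   <- sum(diag(cm)) / n        # 633 / 683
error_rate <- 1 - accuracy             # 50 / 683

# Base rate: error made by always predicting the most frequent actual class
base_error <- 1 - max(colSums(cm)) / n # 239 / 683

# Relative reduction of the error compared with the base rate
err_reduction <- (base_error - error_rate) / base_error

print(round(c(accuracy = accuracy,
              error_rate = error_rate,
              err_reduction = err_reduction), 4))
```

Rounded to four decimals this gives 0.9268, 0.0732 and 0.7908.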