OneR: One Rule function
In OneR: One Rule Machine Learning Classification Algorithm with Enhancements

Description Usage Arguments Details Value Methods (by class) Author(s) References See Also Examples

Builds a model according to the One Rule (OneR) machine learning classification algorithm.

OneR(x, ...)

## S3 method for class 'formula'
OneR(formula, data, ties.method = c("first", "chisq"),
  verbose = FALSE, ...)

## S3 method for class 'data.frame'
OneR(x, ties.method = c("first", "chisq"),
  verbose = FALSE, ...)

`x`	data frame with the last column containing the target variable.
`...`	arguments passed to or from other methods.
`formula`	formula, additionally the argument `data` is needed.
`data`	data frame which contains the data, only needed when using the formula interface.
`ties.method`	character string specifying how ties are treated, see 'Details'; can be abbreviated.
`verbose`	if `TRUE` prints rank, names and predictive accuracy of the attributes in decreasing order (with `ties.method = "first"`).

All numerical data is automatically converted into five categorical bins of equal length. Instances with missing values are removed. This is done by internally calling the default version of bin before starting the OneR algorithm. To finetune this behaviour data preprocessing with the bin or optbin functions should be performed. If data contains unused factor levels (e.g. due to subsetting) these are ignored and a warning is given.

When there is more than one attribute with best performance either the first (from left to right) is being chosen (method "first") or the one with the lowest p-value of a chi-squared test (method "chisq").

Returns an object of class "OneR". Internally this is a list consisting of the function call with the specified arguments, the names of the target and feature variables, a list of the rules, the number of correctly classified and total instances and the contingency table of the best predictor vs. the target variable.

formula: method for formulas.
data.frame: method for data frames.

Holger von Jouanne-Diedrich

https://github.com/vonjd/OneR

bin, optbin, eval_model, maxlevels

data <- optbin(iris)
model <- OneR(data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)

## The same with the formula interface:
data <- optbin(iris)
model <- OneR(Species ~., data = data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)

    Attribute    Accuracy
1 * Petal.Width  96%     
2   Petal.Length 95.33%  
3   Sepal.Length 74.67%  
4   Sepal.Width  55.33%  
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'


Call:
OneR.data.frame(x = data, verbose = TRUE)

Rules:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63]   then Species = versicolor
If Petal.Width = (1.63,2.5]     then Species = virginica

Accuracy:
144 of 150 instances classified correctly (96%)

Contingency table:
            Petal.Width
Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
  setosa               * 50            0          0  50
  versicolor              0         * 48          2  50
  virginica               0            4       * 46  50
  Sum                    50           52         48 150
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 266.35, df = 4, p-value < 2.2e-16


Confusion matrix (absolute):
            Actual
Prediction   setosa versicolor virginica Sum
  setosa         50          0         0  50
  versicolor      0         48         4  52
  virginica       0          2        46  48
  Sum            50         50        50 150

Confusion matrix (relative):
            Actual
Prediction   setosa versicolor virginica  Sum
  setosa       0.33       0.00      0.00 0.33
  versicolor   0.00       0.32      0.03 0.35
  virginica    0.00       0.01      0.31 0.32
  Sum          0.33       0.33      0.33 1.00

Accuracy:
0.96 (144/150)

Error rate:
0.04 (6/150)

Error rate reduction (vs. base rate):
0.94 (p-value < 2.2e-16)


    Attribute    Accuracy
1 * Petal.Width  96%     
2   Petal.Length 95.33%  
3   Sepal.Length 74.67%  
4   Sepal.Width  55.33%  
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'


Call:
OneR.formula(formula = Species ~ ., data = data, verbose = TRUE)

Rules:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63]   then Species = versicolor
If Petal.Width = (1.63,2.5]     then Species = virginica

Accuracy:
144 of 150 instances classified correctly (96%)

Contingency table:
            Petal.Width
Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
  setosa               * 50            0          0  50
  versicolor              0         * 48          2  50
  virginica               0            4       * 46  50
  Sum                    50           52         48 150
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 266.35, df = 4, p-value < 2.2e-16


Confusion matrix (absolute):
            Actual
Prediction   setosa versicolor virginica Sum
  setosa         50          0         0  50
  versicolor      0         48         4  52
  virginica       0          2        46  48
  Sum            50         50        50 150

Confusion matrix (relative):
            Actual
Prediction   setosa versicolor virginica  Sum
  setosa       0.33       0.00      0.00 0.33
  versicolor   0.00       0.32      0.03 0.35
  virginica    0.00       0.01      0.31 0.32
  Sum          0.33       0.33      0.33 1.00

Accuracy:
0.96 (144/150)

Error rate:
0.04 (6/150)

Error rate reduction (vs. base rate):
0.94 (p-value < 2.2e-16)