OneR: One Rule function

View source: R/OneR_main.R

Description

Builds a model according to the One Rule (OneR) machine learning classification algorithm.

Usage

OneR(x, ...)

## S3 method for class 'formula'
OneR(formula, data, ties.method = c("first", "chisq"),
  verbose = FALSE, ...)

## S3 method for class 'data.frame'
OneR(x, ties.method = c("first", "chisq"),
  verbose = FALSE, ...)

Arguments

x

data frame with the last column containing the target variable.

...

arguments passed to or from other methods.

formula

a model formula; the data argument must also be supplied.

data

data frame containing the data; only needed when using the formula interface.

ties.method

character string specifying how ties are treated, see 'Details'; can be abbreviated.

verbose

if TRUE, prints the rank, name and predictive accuracy of each attribute in decreasing order of accuracy (with ties.method = "first").

Details

Before the OneR algorithm starts, bin is called internally with its default settings: all numerical data is converted into five categorical bins of equal length, and instances with missing values are removed. To fine-tune this behaviour, preprocess the data with the bin or optbin functions before calling OneR. If data contains unused factor levels (e.g. due to subsetting), these are ignored and a warning is given.
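The default preprocessing can be approximated in base R. The sketch below is not the package's own code: it assumes that "equal length" corresponds to what cut() does with breaks = 5, and the data frame df and its columns are made up for illustration.

```r
# Toy data: one numeric attribute, one factor target, one missing value.
df <- data.frame(
  height = c(150, 160, 170, 180, 190, NA),
  group  = factor(c("a", "a", "b", "b", "b", "a"))
)

# Remove instances (rows) that contain missing values.
complete <- df[complete.cases(df), ]

# Convert every numeric column into a factor with five equal-length intervals.
is_num <- vapply(complete, is.numeric, logical(1))
complete[is_num] <- lapply(complete[is_num], cut, breaks = 5)

str(complete)
```

For anything beyond these defaults, the text above points to bin and optbin, which should be applied to the data before it is passed to OneR.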

When more than one attribute achieves the best performance, either the first one (from left to right) is chosen (method "first") or the one with the lowest p-value in a chi-squared test (method "chisq").
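The attribute selection and both tie-breaking methods can be illustrated with a small base-R sketch. This is an illustration under assumptions, not the package implementation: rule_accuracy, chisq_p and pick_attribute are hypothetical helpers, the toy data is invented, and the chi-squared test uses the defaults of chisq.test(), which may differ from what OneR does internally.

```r
# Accuracy of the rule "predict the majority class for each attribute level".
rule_accuracy <- function(feature, target) {
  tab <- table(feature, target)
  sum(apply(tab, 1, max)) / sum(tab)
}

# p-value of a chi-squared test between one attribute and the target.
chisq_p <- function(feature, target) {
  suppressWarnings(chisq.test(table(feature, target))$p.value)
}

# Hypothetical helper: name of the best attribute; the last column is the target.
pick_attribute <- function(data, ties.method = c("first", "chisq")) {
  ties.method <- match.arg(ties.method)
  target <- data[[ncol(data)]]
  features <- data[-ncol(data)]
  acc <- vapply(features, rule_accuracy, numeric(1), target = target)
  best <- names(acc)[acc == max(acc)]
  if (length(best) > 1 && ties.method == "chisq") {
    # Among tied attributes, keep the one with the lowest p-value.
    p <- vapply(features[best], chisq_p, numeric(1), target = target)
    best <- best[which.min(p)]
  }
  best[1]  # with "first", the leftmost of the tied attributes wins
}

# Both attributes classify the toy data perfectly, so they are tied.
toy <- data.frame(
  b = factor(c("u", "u", "u", "v", "v", "v")),
  a = factor(c("x", "x", "y", "z", "z", "z")),
  class = factor(c("no", "no", "no", "yes", "yes", "yes"))
)
pick_attribute(toy, "first")
pick_attribute(toy, "chisq")
```

On this toy data the two methods disagree: "first" keeps the leftmost tied attribute, while "chisq" prefers the attribute whose contingency table yields the smaller p-value.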

Value

Returns an object of class "OneR". Internally this is a list consisting of the function call with the specified arguments, the names of the target and feature variables, a list of the rules, the number of correctly classified instances, the total number of instances, and the contingency table of the best predictor vs. the target variable.

Methods (by class)

formula: method for formulas.

data.frame: method for data frames.

Author(s)

Holger von Jouanne-Diedrich

References

https://github.com/vonjd/OneR

See Also

bin, optbin, eval_model, maxlevels

Examples

library(OneR)

data <- optbin(iris)
model <- OneR(data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)

## The same with the formula interface:
data <- optbin(iris)
model <- OneR(Species ~ ., data = data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)

Example output

    Attribute    Accuracy
1 * Petal.Width  96%     
2   Petal.Length 95.33%  
3   Sepal.Length 74.67%  
4   Sepal.Width  55.33%  
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'


Call:
OneR.data.frame(x = data, verbose = TRUE)

Rules:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63]   then Species = versicolor
If Petal.Width = (1.63,2.5]     then Species = virginica

Accuracy:
144 of 150 instances classified correctly (96%)

Contingency table:
            Petal.Width
Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
  setosa               * 50            0          0  50
  versicolor              0         * 48          2  50
  virginica               0            4       * 46  50
  Sum                    50           52         48 150
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 266.35, df = 4, p-value < 2.2e-16


Confusion matrix (absolute):
            Actual
Prediction   setosa versicolor virginica Sum
  setosa         50          0         0  50
  versicolor      0         48         4  52
  virginica       0          2        46  48
  Sum            50         50        50 150

Confusion matrix (relative):
            Actual
Prediction   setosa versicolor virginica  Sum
  setosa       0.33       0.00      0.00 0.33
  versicolor   0.00       0.32      0.03 0.35
  virginica    0.00       0.01      0.31 0.32
  Sum          0.33       0.33      0.33 1.00

Accuracy:
0.96 (144/150)

Error rate:
0.04 (6/150)

Error rate reduction (vs. base rate):
0.94 (p-value < 2.2e-16)


The second run, which uses the formula interface, prints the same output; only the reported call differs:

Call:
OneR.formula(formula = Species ~ ., data = data, verbose = TRUE)

OneR documentation built on May 30, 2017, 12:35 a.m.