discretizeDF.supervised: Supervised Methods to Convert Continuous Variables into...


View source: R/discretizeDF.supervised.R

Description

This function implements several supervised methods to convert continuous variables into categorical variables (factors) suitable for association rule mining and building associative classifiers. A whole data.frame is discretized at once: every numeric predictor selected by the formula is discretized.
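Supervised methods choose cut points using the class variable. As a minimal base-R sketch of the idea behind the default method "mdlp", the code below picks a single cut point for one numeric column by minimizing the weighted class entropy of the two resulting intervals. It covers only one split: the recursive splitting and the MDL stopping rule used by MDLP are omitted, and the helper names entropy() and best_split() are hypothetical, not part of arulesCBA or discretization.

```r
# Class entropy (in bits) of a vector of class labels
entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]              # drop empty classes (0 * log2(0) is undefined)
  -sum(p * log2(p))
}

# Hypothetical helper: best single cut point for numeric x given class y.
# Candidates are midpoints between adjacent distinct values; the candidate
# with the lowest size-weighted entropy of the two partitions wins.
best_split <- function(x, y) {
  v <- sort(unique(x))
  cands <- (head(v, -1) + tail(v, -1)) / 2
  score <- vapply(cands, function(cp) {
    left <- x < cp
    (sum(left) * entropy(y[left]) + sum(!left) * entropy(y[!left])) / length(y)
  }, numeric(1))
  cands[which.min(score)]
}

best_split(iris$Petal.Length, iris$Species)
```

On iris this returns 2.45 for Petal.Length, which agrees with the first Petal.Length break in the example output below.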

Usage

discretizeDF.supervised(formula, data, method = "mdlp", dig.lab = 3, ...)

Arguments

formula

a formula object to specify the class variable for supervised discretization and the predictors to be discretized in the form class ~ . or class ~ predictor1 + predictor2.

data

a data.frame containing the class variable and the continuous variables to be discretized.

method

discretization method. Available methods are "mdlp", "caim", "cacc", "ameva", "chi2", "chimerge", "extendedchi2", and "modchi2".

dig.lab

integer; number of digits used to create labels.

...

Additional parameters are passed on to the implementation of the chosen discretization method.
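The formula argument separates the class variable (left-hand side) from the predictors (right-hand side), with . standing for all remaining columns. A minimal base-R sketch of how such a formula can be resolved against a data.frame is shown below; parse_class_formula() is a hypothetical helper, and this is not necessarily how arulesCBA parses the formula internally.

```r
# Hypothetical helper: resolve class ~ . or class ~ p1 + p2 against data
parse_class_formula <- function(formula, data) {
  class_var <- all.vars(formula[[2]])   # left-hand side: the class variable
  # terms() with a data argument expands "." to all non-response columns
  predictors <- attr(terms(formula, data = data), "term.labels")
  list(class = class_var, predictors = predictors)
}

parse_class_formula(Species ~ ., iris)
parse_class_formula(Species ~ Sepal.Length + Petal.Length, iris)
```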

Details

discretizeDF.supervised only implements supervised discretization. See discretizeDF in package arules for unsupervised discretization.
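To make the distinction concrete, the base-R sketch below contrasts an unsupervised equal-width discretization of Petal.Length, whose breaks depend only on the range of the values, with the supervised breaks for the same column. The supervised breaks are not computed here; they are copied from the "mdlp" example output on this page.

```r
x <- iris$Petal.Length

# Unsupervised: three equal-width intervals, computed without the class
ew_breaks <- seq(min(x), max(x), length.out = 4)
ew <- cut(x, breaks = ew_breaks, include.lowest = TRUE)

# Supervised: breaks copied from the example output (method "mdlp")
sup <- cut(x, breaks = c(-Inf, 2.45, 4.75, Inf), right = FALSE,
           include.lowest = TRUE)

# Cross-tabulate each discretization against the class
table(ew, iris$Species)
table(sup, iris$Species)
```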

Value

discretizeDF.supervised returns a discretized data.frame. Discretized columns have an attribute "discretized:breaks" giving the breaks used and an attribute "discretized:method" giving the method used.
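In the examples, discretizeDF(head(iris), methods = iris.disc) reuses these attributes to discretize new data. The base-R sketch below only illustrates what the attributes hold: it builds a stand-in factor carrying them (values copied from the Sepal.Length example output, not produced by the package) and applies the stored breaks to new measurements with cut().

```r
# Stand-in for one discretized column, carrying the documented attributes
sl <- factor("[-Inf,5.55)",
             levels = c("[-Inf,5.55)", "[5.55,6.15)", "[6.15, Inf]"))
attr(sl, "discretized:breaks") <- c(-Inf, 5.55, 6.15, Inf)
attr(sl, "discretized:method") <- "mdlp"

# Read the stored breaks and apply them to new values; right = FALSE gives
# left-closed intervals [a, b) matching the labels above
breaks <- attr(sl, "discretized:breaks")
new_disc <- cut(c(5.1, 5.9, 6.5), breaks = breaks, right = FALSE,
                include.lowest = TRUE)

attr(sl, "discretized:method")
new_disc
```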

Author(s)

Michael Hahsler

See Also

Unsupervised discretization from arules: discretize, discretizeDF.

Details about the available supervised discretization methods in package discretization: mdlp, caim, cacc, ameva, chi2, chiM, extendChi2, modChi2.

Examples

library(arulesCBA)
data("iris")
summary(iris)

# supervised discretization using Species
iris.disc <- discretizeDF.supervised(Species ~ ., iris)
summary(iris.disc)

attributes(iris.disc$Sepal.Length)

# discretize the first few instances of iris using the same breaks as iris.disc
discretizeDF(head(iris), methods = iris.disc)

# only discretize predictors Sepal.Length and Petal.Length
iris.disc2 <- discretizeDF.supervised(Species ~ Sepal.Length + Petal.Length, iris)
head(iris.disc2)

Example output

Loading required package: Matrix
Loading required package: arules

Attaching package: 'arules'

The following objects are masked from 'package:base':

    abbreviate, write

Loading required package: discretization
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
      Sepal.Length      Sepal.Width      Petal.Length      Petal.Width
 [-Inf,5.55):59    [-Inf,2.95):57   [-Inf,2.45):50    [-Inf,0.8) :50  
 [5.55,6.15):36    [2.95,3.35):56   [2.45,4.75):45    [0.8,1.75) :54  
 [6.15, Inf]:55    [3.35, Inf]:37   [4.75, Inf]:55    [1.75, Inf]:46  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
$levels
[1] "[-Inf,5.55)" "[5.55,6.15)" "[6.15, Inf]"

$class
[1] "factor"

$`discretized:breaks`
[1] -Inf 5.55 6.15  Inf

$`discretized:method`
[1] "mdlp"

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa
2  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
3  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
4  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
5  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa
6  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1  [-Inf,5.55)         3.5  [-Inf,2.45)         0.2  setosa
2  [-Inf,5.55)         3.0  [-Inf,2.45)         0.2  setosa
3  [-Inf,5.55)         3.2  [-Inf,2.45)         0.2  setosa
4  [-Inf,5.55)         3.1  [-Inf,2.45)         0.2  setosa
5  [-Inf,5.55)         3.6  [-Inf,2.45)         0.2  setosa
6  [-Inf,5.55)         3.9  [-Inf,2.45)         0.4  setosa

arulesCBA documentation built on April 20, 2020, 5:06 p.m.