Supervised discretization based on the maximum discernibility heuristic

Description

Computes globally semi-optimal cuts for numeric attributes using the maximum discernibility heuristic.

Usage

D.global.discernibility.heuristic.RST(decision.table, maxNOfCuts = 2 *
  ncol(decision.table), attrSampleSize = ncol(decision.table) - 1,
  cutCandidatesList = NULL, discFunction = global.discernibility, ...)

Arguments

decision.table

an object inheriting from the "DecisionTable" class, which represents a decision system. See SF.asDecisionTable. Note that this method requires all conditional attributes to be numeric.

maxNOfCuts

a positive integer indicating the maximum number of allowed cuts.

attrSampleSize

an integer between 1 and the number of conditional attributes (the default). It sets the attribute sample size for the Monte Carlo selection of candidate cuts.

cutCandidatesList

an optional list containing candidates for optimal cut values. By default, the candidate cuts are determined automatically.

discFunction

a function used to compute the cuts. Currently only one implementation of the maximum discernibility heuristic is available (the default). However, this parameter can be used to integrate custom implementations of discretization functions with the RoughSets package.

...

additional parameters to the discFunction (currently unsupported).
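As an illustration of the arguments above, the following sketch calls the function with non-default values of maxNOfCuts and attrSampleSize. It assumes the RoughSets package is installed and uses the wine data shipped with it. The values 5 and 4 and the seed are arbitrary choices made for this illustration, and the cut.values component inspected at the end is assumed to be how the "Discretization" object stores its cuts.

```r
library(RoughSets)

data(RoughSetData)
wine.data <- RoughSetData$wine.dt

## the selection of candidate cuts samples attributes at random,
## so fix the seed to make the result repeatable (arbitrary seed)
set.seed(123)

## illustrative values only: allow at most 5 cuts and use an
## attribute sample size of 4 for the Monte Carlo selection
cuts.small <- D.global.discernibility.heuristic.RST(wine.data,
                                                    maxNOfCuts = 5,
                                                    attrSampleSize = 4)

## the result holds cut values only, not a discretized table
class(cuts.small)

## assumption: the cuts are stored in a component named cut.values
str(cuts.small$cut.values)
```

A smaller maxNOfCuts generally yields a coarser discretization, so some attributes may end up with no cuts at all.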

Details

A complete description of the implemented algorithm can be found in (Nguyen, 2001).
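The core idea of the heuristic can be sketched in base R for a single attribute: among the candidate cuts, prefer the one that separates the largest number of object pairs belonging to different decision classes. The sketch below is not the package's implementation. The helper names count.discerned.pairs and best.cut are hypothetical, the candidates are assumed to be midpoints between consecutive distinct values, and the global, multi-attribute search with attribute sampling is left out.

```r
## hypothetical helper: number of object pairs with different decisions
## that fall on opposite sides of a given cut
count.discerned.pairs <- function(values, decision, cut) {
  decision <- factor(decision)
  left  <- table(decision[values <  cut])   # class counts below the cut
  right <- table(decision[values >= cut])   # class counts above the cut
  ## all left/right pairs minus the pairs that share a decision class
  as.numeric(sum(left) * sum(right) - sum(left * right))
}

## hypothetical helper: pick the candidate cut that discerns the most pairs
best.cut <- function(values, decision) {
  v <- sort(unique(values))
  ## assumption: candidates are midpoints between consecutive distinct values
  candidates <- (head(v, -1) + tail(v, -1)) / 2
  scores <- vapply(candidates,
                   function(cut) count.discerned.pairs(values, decision, cut),
                   numeric(1))
  candidates[which.max(scores)]
}

## toy data: two well-separated decision classes
values   <- c(1, 2, 3, 10, 11, 12)
decision <- c("a", "a", "a", "b", "b", "b")

best.cut(values, decision)                     # 6.5
count.discerned.pairs(values, decision, 6.5)   # 9
```

On the toy data the cut at 6.5 separates all 3 x 3 = 9 pairs of objects from different classes, which is why it is selected.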

Note that this function returns an object of class "Discretization", which contains only the cut values. To generate the new (discretized) decision table, pass this object to SF.applyDecTable.

Value

An object of class "Discretization" which stores the cuts for each conditional attribute. See D.discretization.RST.

Author(s)

Andrzej Janusz

References

S. H. Nguyen, "On Efficient Handling of Continuous Attributes in Large Data Bases", Fundamenta Informaticae, vol. 48, p. 61-81 (2001).

Jan G. Bazan, Hung Son Nguyen, Sinh Hoa Nguyen, Piotr Synak, and Jakub Wroblewski, "Rough Set Algorithms in Classification Problem", Chapter 2 in: L. Polkowski, S. Tsumoto and T.Y. Lin (eds.), Rough Set Methods and Applications, Physica-Verlag, Heidelberg, New York, p. 49-88 (2000).

See Also

D.discretize.quantiles.RST, D.discretize.equal.intervals.RST, D.local.discernibility.heuristic.RST and SF.applyDecTable. For a wrapper function covering all available discretization methods, see D.discretization.RST.

Examples

#################################################################
## Example: Determine cut values and generate new decision table
#################################################################
data(RoughSetData)
wine.data <- RoughSetData$wine.dt
cut.values <- D.global.discernibility.heuristic.RST(wine.data)

## generate a new decision table:
wine.discretized <- SF.applyDecTable(wine.data, cut.values)
dim(wine.discretized)
lapply(wine.discretized, unique)

## remove attributes with only one possible value:
to.rm.idx <- which(sapply(lapply(wine.discretized, unique), function(x) length(x) == 1))
to.rm.idx
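## setdiff keeps every column when to.rm.idx is empty
## (negative indexing with an empty vector would drop them all):
wine.discretized.reduced <- wine.discretized[setdiff(seq_along(wine.discretized), to.rm.idx)]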
dim(wine.discretized.reduced)

## check whether the attributes in the reduced data are a super-reduct of the original data:
colnames(wine.discretized.reduced)
class.idx <- which(colnames(wine.discretized.reduced) == "class")
sum(duplicated(wine.discretized.reduced)) == sum(duplicated(wine.discretized.reduced[-class.idx]))
## yes it is
