wracog: Wrapper for rapidly converging Gibbs algorithm.


View source: R/racog.R

Description

Generates synthetic minority examples by approximating their probability distribution until the sensitivity of wrapper on validation can no longer be improved. Works only on discrete numeric datasets.

Usage

wracog(train, validation, wrapper, slideWin = 10, threshold = 0.02,
  classAttr = "Class", ...)

Arguments

train

data.frame. An initial dataset used to generate the first model. All columns, except the classAttr one, have to be numeric or coercible to numeric.

validation

data.frame. A dataset used to compare the results of consecutive classifiers. Must have the same structure as train.

wrapper

An S3 object. A trainWrapper method must be implemented for the class of the object, and a predict method must be implemented for the class of the model returned by trainWrapper. Alternatively, it can be the name of one of the wrappers distributed with the package, "KNN" or "C5.0".

slideWin

Number of most recent sensitivities to take into account for the stopping criterion. By default, 10.

threshold

Threshold that the mean of the last slideWin sensitivities should reach. By default, 0.02.

classAttr

character. Indicates the class attribute from train and validation. Must exist in them.

...

further arguments for wrapper.
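
The contract described for the wrapper argument can be illustrated with a minimal sketch. It assumes the trainWrapper signature used in the Examples section (wrapper, train, trainClass); the MajorityWrapper and MajorityModel classes are hypothetical, and the trainWrapper generic is redefined locally only so that the sketch runs without the package.

```r
# Stand-in generic, defined here only so the sketch runs on its own
# (assumption: the package supplies its own trainWrapper generic).
trainWrapper <- function(wrapper, train, trainClass, ...) {
  UseMethod("trainWrapper")
}

# Hypothetical wrapper whose model always predicts the most frequent class.
majorityWrapper <- structure(list(), class = "MajorityWrapper")

# trainWrapper method for the class of the wrapper object.
trainWrapper.MajorityWrapper <- function(wrapper, train, trainClass, ...) {
  counts <- table(trainClass)
  structure(
    list(label = names(counts)[which.max(counts)],
         levels = levels(factor(trainClass))),
    class = "MajorityModel"
  )
}

# predict method for the class of the model returned by trainWrapper.
predict.MajorityModel <- function(object, newdata, ...) {
  factor(rep(object$label, nrow(newdata)), levels = object$levels)
}

features <- data.frame(x = c(1, 2, 3, 4))
labels <- factor(c("negative", "negative", "negative", "positive"))

model <- trainWrapper(majorityWrapper, features, labels)
predictions <- predict(model, data.frame(x = c(5, 6)))
```

A majority-class model is useless as a real wrapper, since its sensitivity on the minority class is zero; it is used here only because it keeps the two required methods short.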

Details

The algorithm keeps generating samples with a Gibbs sampler and adds to the train dataset those samples that are misclassified by the model built on the previous train dataset. It stops when the last slideWin executions of wrapper over the validation dataset reach a mean sensitivity lower than threshold. The initial model is built on the initial train.
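
The stopping rule can be sketched as follows. This is only an illustration of the criterion as it is worded on this page, not the package's internal code, which may compute a different statistic over the window. The helper names sensitivity and shouldStop are hypothetical, and sensitivity is assumed to be the usual true positive rate of the minority class.

```r
# Hypothetical helper: true positive rate of the minority (positive) class.
sensitivity <- function(actual, predicted, positive) {
  truePositives <- sum(actual == positive & predicted == positive)
  falseNegatives <- sum(actual == positive & predicted != positive)
  truePositives / (truePositives + falseNegatives)
}

# Hypothetical helper: stop once at least slideWin sensitivities have been
# recorded and the mean of the last slideWin of them is below threshold
# (assumption: this follows the wording of the Details paragraph).
shouldStop <- function(sensitivities, slideWin = 10, threshold = 0.02) {
  if (length(sensitivities) < slideWin) {
    return(FALSE)
  }
  mean(tail(sensitivities, slideWin)) < threshold
}

actual <- c("positive", "positive", "negative", "negative", "positive")
predicted <- c("positive", "negative", "negative", "positive", "positive")

# Two of the three positive examples are recovered.
sens <- sensitivity(actual, predicted, positive = "positive")
```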

Value

A data.frame with the same structure as train, containing the generated synthetic examples.
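
Because the returned data.frame has the same structure as train, the synthetic examples can be appended to the original training data with rbind. The sketch below uses small made-up data frames (toyTrain and toySynthetic are illustrative stand-ins, not actual output of wracog) so that it runs without the package.

```r
# Made-up stand-in for a discrete numeric training set.
toyTrain <- data.frame(
  Age = c(30, 34, 38, 42),
  Nodes = c(1, 0, 3, 2),
  Class = factor(c("negative", "negative", "negative", "positive"),
                 levels = c("negative", "positive"))
)

# Made-up stand-in for the synthetic minority examples: same columns,
# same column types, same factor levels as the training set.
toySynthetic <- data.frame(
  Age = c(42, 38),
  Nodes = c(3, 2),
  Class = factor(c("positive", "positive"),
                 levels = c("negative", "positive"))
)

# Append the synthetic rows to obtain a less imbalanced training set.
balanced <- rbind(toyTrain, toySynthetic)
classCounts <- table(balanced$Class)
```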

References

Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. RACOG and wRACOG: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27(2015), Nr. 1, p. 222–234.

Examples

library(imbalance)
data(haberman)

# Create train and validation partitions of haberman
trainFold <- sample(1:nrow(haberman), nrow(haberman)/2, FALSE)
trainSet <- haberman[trainFold, ]
validationSet <- haberman[-trainFold, ]

# Define our own wrapper with a C5.0 tree
myWrapper <- structure(list(), class="TestWrapper")
trainWrapper.TestWrapper <- function(wrapper, train, trainClass){
  C50::C5.0(train, trainClass)
}

# Execute wRACOG with our own wrapper
newSamples <- wracog(trainSet, validationSet, myWrapper,
                     classAttr = "Class")


# Execute wRACOG with the predefined wrappers "KNN" and "C5.0"
KNNSamples <- wracog(trainSet, validationSet, "KNN")
C50Samples <- wracog(trainSet, validationSet, "C5.0")

ncordon/imbalance documentation built on Feb. 19, 2018, 7:08 a.m.