ImpSampClassif: Importance Sampling algorithm for imbalanced classification...


View source: R/ImpSampClassif.R

Description

This function handles imbalanced classification problems by re-sampling the data set according to the importance (relevance) provided for each class. Examples of the most important classes are replicated, and examples of the least important classes are removed. To do this, the function combines random over-sampling with random under-sampling, applying each to the problem classes according to their relevance.
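The combination of random over- and under-sampling described above can be sketched in base R. This is only an illustration under stated assumptions, not the UBL implementation: `resample_by_class` is a hypothetical helper, and the rounding rule and the choice to keep all original rows when over-sampling are assumptions.

```r
# Hypothetical helper, not part of UBL: re-sample each class by a given factor.
resample_by_class <- function(dat, target, perc) {
  parts <- lapply(names(perc), function(cl) {
    rows <- which(dat[[target]] == cl)
    n_new <- round(length(rows) * perc[[cl]])  # assumed rounding rule
    if (perc[[cl]] < 1) {
      # under-sampling: keep a random subset, drawn without replacement
      keep <- rows[sample.int(length(rows), n_new)]
    } else {
      # over-sampling: keep every original row and add random replicas
      extra <- rows[sample.int(length(rows), n_new - length(rows),
                               replace = TRUE)]
      keep <- c(rows, extra)
    }
    dat[keep, , drop = FALSE]
  })
  do.call(rbind, parts)
}

set.seed(1)
# same artificially imbalanced subset as in the Examples section
ir <- iris[-c(51:70, 111:150), ]
res <- resample_by_class(ir, "Species",
                         list(setosa = 0.2, versicolor = 2, virginica = 6))
counts <- table(res$Species)
print(counts)
```

With these factors the sketch yields 10 setosa, 60 versicolor and 60 virginica rows, the same class sizes reported for `myIS` in the example output.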

Usage

ImpSampClassif(form, dat, C.perc = "balance")

Arguments

form

A formula describing the prediction problem

dat

A data frame containing the original (unbalanced) data set

C.perc

A list containing the percentage of random under- or over-sampling to apply to each class. A value above 1 over-samples the class, a value below 1 under-samples it, and a value of 1 leaves the class unchanged. Alternatively, C.perc may be "balance" (the default) or "extreme", in which case the sampling percentages are estimated automatically.
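As a rough illustration of how per-class percentages relate to class sizes, the sketch below derives a C.perc-style list that would equalise the classes of the data set used in the Examples section. The rule shown (equal class sizes, total size roughly preserved) is an assumption and is not confirmed by this page as the rule behind "balance", although it agrees with the 30/30/30 counts in the example output.

```r
# Class counts of the imbalanced iris subset used in the Examples section
n <- c(setosa = 50, versicolor = 30, virginica = 10)

# Assumed rule: aim for equal class sizes while keeping the
# total number of examples roughly unchanged.
target <- round(mean(n))
perc <- as.list(target / n)

# Percentages below 1 mean under-sampling, above 1 over-sampling
print(unlist(perc))

# Resulting class sizes if these percentages were applied
print(round(n * unlist(perc)))
```

Here setosa gets a percentage below 1 (under-sampling), versicolor gets exactly 1 (unchanged), and virginica gets a percentage above 1 (over-sampling).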

Value

The function returns a data frame with the new data set resulting from the application of the importance sampling strategy.

Author(s)

Paula Branco, Rita Ribeiro and Luis Torgo

See Also

RandUnderClassif, RandOverClassif

Examples

  library(UBL)
  data(iris)
  # generating an artificially imbalanced data set
  ir <- iris[-c(51:70, 111:150), ]
  # automatically estimated sampling percentages
  IS.ext <- ImpSampClassif(Species ~ ., ir, C.perc = "extreme")
  IS.bal <- ImpSampClassif(Species ~ ., ir, C.perc = "balance")
  # user-defined percentages: under-sample setosa, over-sample the rest
  myIS <- ImpSampClassif(Species ~ ., ir, C.perc = list(setosa = 0.2,
                                                       versicolor = 2,
                                                       virginica = 6))
  # check the results
  table(ir$Species)
  table(IS.ext$Species)
  table(IS.bal$Species)
  table(myIS$Species)

Example output

Loading required package: MBA
Loading required package: gstat
Loading required package: automap
Loading required package: sp
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.

    setosa versicolor  virginica 
        50         30         10 

    setosa versicolor  virginica 
        12         20         59 

    setosa versicolor  virginica 
        30         30         30 

    setosa versicolor  virginica 
        10         60         60 
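To compare the four tables above at a glance, the reported counts can be collected into a matrix and summarised by an imbalance ratio (largest class size divided by smallest). The ratio is an illustrative summary chosen here, not something computed by UBL, and the row labels are informal names for the four tables.

```r
# Class counts copied from the example output above
counts <- rbind(
  original = c(setosa = 50, versicolor = 30, virginica = 10),
  extreme  = c(12, 20, 59),
  balance  = c(30, 30, 30),
  custom   = c(10, 60, 60)
)

# Imbalance ratio per data set: largest class over smallest class
imbalance <- apply(counts, 1, function(x) max(x) / min(x))
print(round(imbalance, 2))

# Which class is the largest in each data set
print(colnames(counts)[apply(counts, 1, which.max)])
```

The "balance" result has a ratio of 1, while "extreme" keeps a ratio close to the original but with the class order reversed, so virginica becomes the largest class.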

UBL documentation built on July 13, 2017, 5:02 p.m.