# ImpSampClassif: Importance Sampling algorithm for imbalanced classification

## Description

This function handles imbalanced classification problems by using the provided importance (relevance) to re-sample the data set. The relevance is used to introduce replicas of the most important examples and to remove the least important ones. The function combines random over-sampling with random under-sampling, applied to each class of the problem according to its relevance.
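The per-class re-sampling step can be sketched in R as follows. This is a minimal illustration, not the UBL implementation: the helper `resample_by_class()` is hypothetical, it takes already computed per-class percentages as a named numeric vector, and it assumes that over-sampling keeps every original example and adds random replicas, while under-sampling keeps a random subset drawn without replacement.

```r
# Hypothetical helper (not part of UBL): per-class random re-sampling.
# perc is a named numeric vector, e.g. c(setosa = 0.2, virginica = 6);
# classes missing from perc are dropped in this sketch.
resample_by_class <- function(dat, tgt, perc) {
  parts <- lapply(names(perc), function(cl) {
    rows <- which(dat[[tgt]] == cl)
    n <- length(rows)
    target <- round(n * perc[[cl]])
    if (perc[[cl]] < 1) {
      # under-sampling: random subset drawn without replacement
      keep <- rows[sample.int(n, target)]
    } else if (perc[[cl]] > 1) {
      # over-sampling (assumption): all originals plus random replicas
      extra <- rows[sample.int(n, target - n, replace = TRUE)]
      keep <- c(rows, extra)
    } else {
      # percentage of exactly 1: class left unchanged
      keep <- rows
    }
    dat[keep, , drop = FALSE]
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Usage on the imbalanced iris subset used in the Examples
set.seed(1)
ir <- iris[-c(51:70, 111:150), ]
res <- resample_by_class(ir, "Species",
                         c(setosa = 0.2, versicolor = 2, virginica = 6))
table(res$Species)  # setosa 10, versicolor 60, virginica 60
```

The class sizes do not depend on the seed; only which rows are removed or replicated does.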

## Usage

```r
ImpSampClassif(form, dat, C.perc = "balance")
```

## Arguments

- `form`: A formula describing the prediction problem.
- `dat`: A data frame containing the original (imbalanced) data set.
- `C.perc`: A list, named by class, containing the percentage of random under- or over-sampling to apply to each class. An over-sampling percentage is a number above 1, while an under-sampling percentage is a number below 1. If 1 is given for a class, that class remains unchanged. Alternatively, `C.perc` may be `"balance"` (the default) or `"extreme"`, in which case the sampling percentages are estimated automatically.
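The two automatic modes can be illustrated with the sketch below. The helper `estimate_perc()` is hypothetical, and its rules are assumptions reconstructed from the example output rather than taken from the UBL source: `"balance"` moves every class to the average class size, and `"extreme"` inverts the class frequencies (average size squared, divided by class size) and rescales them so the total number of examples stays roughly the same.

```r
# Hypothetical helper (not part of UBL): estimate per-class percentages
# for the automatic C.perc modes. Both rules are assumptions.
estimate_perc <- function(y, mode = c("balance", "extreme")) {
  mode <- match.arg(mode)
  counts <- table(y)
  counts <- counts[counts > 0]                # ignore empty factor levels
  avg <- round(sum(counts) / length(counts))  # average class size
  if (mode == "balance") {
    # every class is moved to the average size
    target <- rep(avg, length(counts))
  } else {
    # invert the frequencies, then rescale to keep the total size
    inv <- avg^2 / as.numeric(counts)
    target <- inv * sum(counts) / sum(inv)
  }
  perc <- target / as.numeric(counts)
  names(perc) <- names(counts)
  perc
}

ir <- iris[-c(51:70, 111:150), ]  # class sizes 50, 30, 10
sizes <- as.numeric(table(ir$Species))
estimate_perc(ir$Species, "balance")  # setosa 0.6, versicolor 1, virginica 3
round(sizes * estimate_perc(ir$Species, "extreme"))  # setosa 12, versicolor 20, virginica 59
```

For the `ir` data from the Examples, these rules reproduce the class sizes shown in the example output: 30, 30 and 30 for `"balance"`, and 12, 20 and 59 for `"extreme"`.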

## Value

The function returns a data frame with the new data set resulting from the application of the importance sampling strategy.

## Author(s)

Paula Branco, Rita Ribeiro and Luis Torgo

## Examples

```r
library(UBL)
data(iris)
# generating an artificially imbalanced data set
ir <- iris[-c(51:70, 111:150), ]
IS.ext <- ImpSampClassif(Species ~ ., ir, C.perc = "extreme")
IS.bal <- ImpSampClassif(Species ~ ., ir, C.perc = "balance")
myIS <- ImpSampClassif(Species ~ ., ir,
                       C.perc = list(setosa = 0.2, versicolor = 2, virginica = 6))
# check the results
table(ir$Species)
table(IS.ext$Species)
table(IS.bal$Species)
table(myIS$Species)
```

### Example output

```
    setosa versicolor  virginica
        50         30         10

    setosa versicolor  virginica
        12         20         59

    setosa versicolor  virginica
        30         30         30

    setosa versicolor  virginica
        10         60         60
```

