Synthetic imbalanced data set for a multi-class task. The data set has a numeric feature ("X1"), a nominal feature ("X2") and a target class named "Class". The three classes of the problem ("normal", "rare1" and "rare2") are assigned according to the rules described below. These rules depend on the two features ("X1" and "X2").
The data set has one continuous feature (X1) and one nominal feature (X2). The target class (denoted as Class) has three possible values ("normal", "rare1" and "rare2"). Classes "rare1" and "rare2" are the minority classes. Examples of class "rare1" occur in 1% of the data while those of class "rare2" occur in 13.1% of the data. The remaining class, "normal", is the majority class and occurs in 85.9% of the data. Data set ImbC has 1000 examples distributed in classes "rare1", "rare2" and "normal" with 10, 131 and 859 examples respectively.
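As a quick check of the figures above, the class percentages can be recomputed from the stated counts. This is a minimal sketch that uses only the counts given in the text. The variable names (counts, percent, ratio) are illustrative and not part of the data set or of any package.

```r
# Class counts as stated in the description of ImbC
counts <- c(rare1 = 10, rare2 = 131, normal = 859)

# Relative frequency of each class, in percent
percent <- 100 * counts / sum(counts)
print(percent)

# Ratio of majority-class examples to each minority class
ratio <- unname(counts["normal"]) / counts[c("rare1", "rare2")]
print(round(ratio, 1))
```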
ImbC data has been simulated as follows:
X1 \sim \mathbf{N} \left(0, 4\right)
X2: the labels "cat", "fish" and "dog" were randomly distributed with the restriction of having a frequency of 30%, 30% and 40% respectively.
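The two features could be simulated along the following lines. This is a sketch under stated assumptions, not the authors' original script. It assumes that the 4 in N(0, 4) is the standard deviation (the observed range of X1, roughly -13.6 to 12.8, is consistent with that reading), and the seed and the object names (sim, labels) are arbitrary choices made for illustration.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible
n <- 1000

# Assumption: the second parameter of N(0, 4) is the standard deviation
x1 <- rnorm(n, mean = 0, sd = 4)

# Fix the label counts exactly (30%, 30%, 40% of 1000), then shuffle them
labels <- rep(c("cat", "fish", "dog"), times = c(300, 300, 400))
x2 <- factor(sample(labels))

sim <- data.frame(X1 = x1, X2 = x2)
print(table(sim$X2))
```

Shuffling a vector with fixed counts, rather than drawing each label independently with probabilities 0.3, 0.3 and 0.4, guarantees the exact frequencies required by the restriction.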
To obtain the target variable Class, we have defined the following sets:
S_1=\{(X1, X2) : X1 > 9 \wedge (X2 \in \{"cat", "dog"\})\}
S_2=\{(X1, X2) : X1 > 7 \wedge X2 = "fish" \}
S_3=\{(X1, X2) :-1 < X1 < 0.5\}
S_4=\{(X1, X2) : X1 < -7 \wedge X2 = "fish"\}
The following conditions define the target variable distribution of the ImbC synthetic data set:
Assign class label "rare1" to: a random sample of 90% of set S_1 and a random sample of 40% of set S_2
Assign class label "rare2" to: a random sample of 80% of set S_3 and a random sample of 70% of set S_4
Assign class label "normal" to the remaining examples.
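The sets and assignment rules above can be sketched in R as follows. This is an illustration under assumptions, not the code used to build ImbC: the features are re-simulated with an arbitrary seed and with 4 taken as the standard deviation, pick() is a hypothetical helper, and rounding the sample sizes to the nearest integer is a choice the description does not specify. The resulting class counts will therefore differ from those of ImbC.

```r
set.seed(1)  # arbitrary seed
n <- 1000
X1 <- rnorm(n, mean = 0, sd = 4)  # assumes 4 is the standard deviation
X2 <- sample(rep(c("cat", "fish", "dog"), times = c(300, 300, 400)))

# Logical membership vectors for the four sets
S1 <- X1 > 9 & X2 %in% c("cat", "dog")
S2 <- X1 > 7 & X2 == "fish"
S3 <- X1 > -1 & X1 < 0.5
S4 <- X1 < -7 & X2 == "fish"

# Hypothetical helper: row indices of a random share `frac` of a set
pick <- function(in_set, frac) {
  idx <- which(in_set)
  idx[sample.int(length(idx), size = round(frac * length(idx)))]
}

# Everything starts as "normal"; sampled members of the sets are relabelled
Class <- rep("normal", n)
Class[c(pick(S1, 0.9), pick(S2, 0.4))] <- "rare1"
Class[c(pick(S3, 0.8), pick(S4, 0.7))] <- "rare2"
Class <- factor(Class, levels = c("normal", "rare1", "rare2"))

print(table(Class))
```

The four sets are pairwise disjoint (S_3 and S_4 cover X1 ranges that do not intersect the others, and S_1 and S_2 differ in X2), so the order of the two relabelling steps does not matter.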
Paula Branco, Rita Ribeiro and Luis Torgo
       X1               X2          Class
 Min.   :-13.5843   cat :300   normal:859
 1st Qu.: -2.6930   dog :400   rare1 : 10
 Median : -0.1592   fish:300   rare2 :131
 Mean   : -0.1064
 3rd Qu.:  2.4633
 Max.   : 12.7836