# ImbR: Synthetic Regression Data Set In UBL: An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks

## Description

Simulated data set for an imbalanced regression domain. The rare cases correspond to the highest extreme values of the target variable and lie on a circle perturbed by white noise. The normal cases follow a normal distribution with elliptical contours, centred at the centre of that circle.

## Usage

    data(ImbR)

## Format

The data set has 2 continuous features (X1 and X2) and a continuous target variable (denoted as Tgt). The rare examples, i.e., cases with higher values of the target variable, make up 5% of the data. The data set has 1000 examples.

ImbR data has been simulated as follows:

- lower Tgt values: $(X1, X2) \sim \mathbf{N}_{2}\left(\mathbf{10}_{2}, \mathbf{2.5}_{2}\right)$ and $Tgt \sim \mathbf{\Gamma}\left(0.5, 1\right) + 10$

- higher Tgt values: $(X1, X2) \sim \left(\rho \cdot \cos(\theta) + 10, \rho \cdot \sin(\theta) + 10\right)$, where $\rho \sim \mathbf{9}_{2} + \mathbf{N}_{2}\left(\mathbf{0}_{2}, \mathbf{I}_{2}\right)$ and $\theta \sim \mathbf{U}_{2}\left(\mathbf{0}_{2}, 2\pi \mathbf{I}_{2}\right)$, and $Tgt \sim \mathbf{\Gamma}\left(1, 1\right) + 20$
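The recipe above can be approximated in base R. The sketch below is only an illustration and is not the code used to build ImbR. The function name `simulate_imbr_like` and its arguments are hypothetical. Several choices are assumptions, because the notation above does not settle them: the two features of the normal cases are drawn independently with variance 2.5 each (the Description mentions elliptical contours, so the actual covariance may differ), the Gamma parameters are read as shape and rate, and the radius and angle are drawn as one scalar per example.

```r
# Hypothetical re-creation of the simulation recipe; not the original ImbR code.
simulate_imbr_like <- function(n = 1000, rare_frac = 0.05, seed = 1) {
  set.seed(seed)
  n_rare <- round(n * rare_frac)
  n_norm <- n - n_rare

  # Normal cases: bivariate normal centred at (10, 10).
  # Assumption: independent coordinates, each with variance 2.5.
  x1_norm <- rnorm(n_norm, mean = 10, sd = sqrt(2.5))
  x2_norm <- rnorm(n_norm, mean = 10, sd = sqrt(2.5))
  # Assumption: Gamma(shape = 0.5, rate = 1), shifted by 10.
  tgt_norm <- rgamma(n_norm, shape = 0.5, rate = 1) + 10

  # Rare cases: noisy circle of radius about 9 centred at (10, 10).
  rho <- 9 + rnorm(n_rare)
  theta <- runif(n_rare, min = 0, max = 2 * pi)
  x1_rare <- rho * cos(theta) + 10
  x2_rare <- rho * sin(theta) + 10
  # Assumption: Gamma(shape = 1, rate = 1), shifted by 20.
  tgt_rare <- rgamma(n_rare, shape = 1, rate = 1) + 20

  data.frame(
    X1 = c(x1_norm, x1_rare),
    X2 = c(x2_norm, x2_rare),
    Tgt = c(tgt_norm, tgt_rare)
  )
}

sim <- simulate_imbr_like()
```

Under these assumptions every rare case has a target of at least 20 and every normal case a target of at least 10, so `mean(sim$Tgt >= 20)` gives a rough check of the 5% imbalance.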

## Author(s)

Paula Branco, Rita Ribeiro and Luis Torgo

## Examples

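    library(UBL)       # the data set ships with the UBL package
    data(ImbR)         # load the data set
    summary(ImbR)      # summary statistics for X1, X2 and Tgt
    boxplot(ImbR$Tgt)  # the rare high target values show up as extremes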

UBL documentation built on July 13, 2017, 5:02 p.m.