The package provides a diversity of pre-processing functions to deal with both classification (binary and multi-class) and regression problems that encompass non-uniform costs and/or benefits. These functions can be used to obtain a better predictive performance on this type of tasks. The package also includes two synthetic data sets for regression and classification.
|License:||GPL 2 GPL 3|
The package in focused on utility-based learning, i.e., classification and regression problems with non-uniform benefits and/or costs. The main goal of the implemented functions is to improve the predictive performance of the models obtained. The package provides pre-processing approaches that change the original data set biasing it towards the user preferences.
All the methods avaliable are suitable for classification (binary and multiclass) and regression tasks. Moreover, several distance functions are also implemented which allows the use of the methods in data sets with categorical and/or numeric features.
We also provide two synthetic data sets for classification and regression.
Maintainer: Paula Branco
Branco, P., Ribeiro, R.P. and Torgo, L. (2016) UBL: an R package for Utility-based Learning. arXiv preprint arXiv:1604.08079.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## Not run: library(UBL) # an example with the synthetic classification data set provided with the package data(ImbC) plot(ImbC$X1, ImbC$X2, col = ImbC$Class, xlab = "X1", ylab = "X2") summary(ImbC) # randomly generate a 30-70% test and train partition i.train <- sample(1:nrow(ImbC), as.integer(0.7*nrow(ImbC))) trainD <- ImbC[i.train,] testD <- ImbC[-i.train,] model <- rpart(Class~., trainD) preds <- predict(model, testD, type = "class") table(preds, testD$Class) # apply random over-sampling approach to balance the data set: newTrain <- RandOverClassif(Class~., trainD) newModel <- rpart(Class~., newTrain) newPreds <- predict(newModel, testD, type = "class") table(newPreds, testD$Class) # an example with the synthetic regression data set provided with the package data(ImbR) library(ggplot2) ggplot(ImbR, aes(x = X1, y = X2)) + geom_point(data = ImbR, aes(colour=Tgt)) + scale_color_gradient(low = "red", high="blue") boxplot(ImbR$Tgt) #relevance function automatically obtained phiF.args <- phi.control(ImbR$Tgt, method = "extremes", extr.type = "high") y.phi <- phi(sort(ImbR$Tgt),control.parms = phiF.args) plot(sort(ImbR$Tgt), y.phi, type = "l", xlab = "Tgt variable", ylab = "relevance value") # set the train and test data i.train <- sample(1:nrow(ImbR), as.integer(0.7*nrow(ImbR))) trainD <- ImbR[i.train,] testD <- ImbR[-i.train,] # train a model on the original train data library(DMwR) model <- rpartXse(Tgt~., trainD, se = 0) preds <- predict(model, testD) plot(testD$Tgt, preds, xlim = c(0,55), ylim = c(0,55)) abline(a = 0, b = 1) # obtain a new train using random under-sampling strategy newTrain <- RandUnderRegress(Tgt~., trainD) newModel <- rpartXse(Tgt~., newTrain, se = 0) newPreds <- predict(newModel, testD) # plot the predictions for the model obtained with # the original and the modified train data plot(testD$Tgt, preds, xlim = c(0,55), ylim = c(0,55)) #black for original train abline(a = 0, b = 1, lty=2, col="grey") points(testD$Tgt, newPreds, col="blue", pch=2) #blue for changed train abline(h=30, lty=2, col="grey") abline(v=30, lty=2, col="grey") ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.