hutch_sampling | R Documentation |
hutch_sampling
Returns a balanced document-term matrix (DTM) using Random Undersampling or Random Oversampling.
hutch_sampling(X, Y, type = "ROS", perc = 50, k_pos = 0, w = NULL, verbose = TRUE)
X |
a document-term matrix (DTM) of class c('DocumentTermMatrix', 'simple_triplet_matrix') |
Y |
the response variable of the unbalanced dataset, should be a factor with two levels (binary) |
type |
balancing technique, one of "ROS" (Random Oversampling), "RUS_under" (Random Undersampling with the percentage applied relative to the majority class, "percUnder") or "RUS_Pos" (Random Undersampling with the percentage applied relative to the minority class, "percPos") |
perc |
used only for the types "RUS_under" and "RUS_Pos"; the sampling percentage for the majority class, interpreted according to the RUS type chosen |
k_pos |
used only for the type "ROS"; the number of times the positive (minority) instances are to be generated |
w |
used only for the types "RUS_under" and "RUS_Pos"; weights for undersampling the majority class. If NULL, sampling is done with equal weights |
verbose |
used only for the type "ROS". If TRUE, prints extra information |
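As a rough illustration of how perc could translate into a majority-class sample size under the two RUS types, the sketch below uses base R only. The helper n_majority_kept is hypothetical and not part of this package, and the two formulas are assumptions about the "percPos" and "percUnder" interpretations (perc as the target share of positives after sampling, and perc as the share of the majority class that is kept). Check the ubUnder documentation before relying on them.

```r
# Hypothetical helper: how many majority-class documents would remain
# after undersampling, under the ASSUMED meaning of `perc` for each type.
n_majority_kept <- function(n_min, n_maj, perc,
                            type = c("RUS_Pos", "RUS_under")) {
  type <- match.arg(type)
  kept <- if (type == "RUS_Pos") {
    # assumed: positives should make up `perc` percent of the result
    floor(n_min * (100 - perc) / perc)
  } else {
    # assumed: keep `perc` percent of the original majority class
    floor(n_maj * perc / 100)
  }
  min(kept, n_maj)  # cannot keep more majority documents than exist
}

n_pos_type   <- n_majority_kept(100, 900, perc = 50, type = "RUS_Pos")
n_under_type <- n_majority_kept(100, 900, perc = 50, type = "RUS_under")
```

With 100 minority and 900 majority documents and perc = 50, the first call gives 100 (a 50/50 split) and the second gives 450 (half of the majority class).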
This function applies the balancing techniques Random Undersampling and Random Oversampling to the document-term matrix using the functions ubUnder and ubOver.
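The two techniques can be sketched on row indices in base R. This is a minimal illustration, not the implementation used by this function or by ubUnder and ubOver: a plain matrix stands in for the DTM, and all variable names (x_rus, y_ros, id_rm and so on) are made up for the example.

```r
set.seed(1)
y <- factor(c(rep(0, 8), rep(1, 2)))     # 0 = majority, 1 = minority
x <- matrix(seq_len(10 * 3), nrow = 10)  # stand-in for a 10-document DTM

maj_idx <- which(y == 0)
min_idx <- which(y == 1)

# Random undersampling: keep as many majority rows as there are minority rows
keep_maj <- maj_idx[sample.int(length(maj_idx), length(min_idx))]
id_rm    <- setdiff(maj_idx, keep_maj)   # indices of the removed documents
rus_rows <- sort(c(keep_maj, min_idx))
x_rus    <- x[rus_rows, , drop = FALSE]
y_rus    <- y[rus_rows]

# Random oversampling: redraw minority rows with replacement until balanced
n_extra  <- length(maj_idx) - length(min_idx)
extra    <- min_idx[sample.int(length(min_idx), n_extra, replace = TRUE)]
ros_rows <- c(seq_along(y), extra)
x_ros    <- x[ros_rows, , drop = FALSE]
y_ros    <- y[ros_rows]

table(y_rus)  # both classes have 2 documents
table(y_ros)  # both classes have 8 documents
```

Undersampling discards information from the majority class, while oversampling duplicates minority documents; either way the matrix rows and the response must be subset with the same indices so that they stay aligned.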
If type = "RUS_Pos" or "RUS_under", the value is a list of three elements. The first element, X, is the balanced DTM, of the same class as the input DTM, i.e. c('DocumentTermMatrix', 'simple_triplet_matrix'). The second element, Y, contains the response variable of the balanced data as a factor. The third element, id.rm, is a vector identifying the removed documents.
If type = "ROS", the value is a list of two elements. The first element, X, is the balanced DTM, of the same class as the input DTM, i.e. c('DocumentTermMatrix', 'simple_triplet_matrix'). The second element, Y, contains the response variable of the balanced data as a factor.
library(tm)
library(unbalanced)
y <- factor(meta(liu_corpus)$real_label)
x <- liu_dtm
exp <- hutch_sampling(x, y, type = "RUS_Pos", perc = 50, k_pos = 0, w = NULL, verbose = FALSE)
exp$X
exp$Y
exp$id.rm
test <- hutch_sampling(x, y, type = "ROS", perc = 50, k_pos = 2, w = NULL, verbose = FALSE)
test$X
test$Y