| hutch_sampling | R Documentation | 
hutch_sampling returns a balanced document term matrix, using either Random Undersampling or Random Oversampling.
hutch_sampling(X, Y, type = "ROS", perc = 50, k_pos = 0, w = NULL, verbose = TRUE)
| X | a document term matrix (DTM) of class c('DocumentTermMatrix', 'simple_triplet_matrix') | 
| Y | the response variable of the unbalanced dataset; should be a factor with two levels (binary) | 
| type | the balancing technique: "ROS" for Random Oversampling, or one of two ways of applying Random Undersampling (RUS): "RUS_under", which applies the undersampling percentage relative to the majority class ("percUnder"), or "RUS_Pos", which applies it relative to the minority class ("percPos") | 
| perc | used only for types "RUS_under" and "RUS_Pos": the sampling percentage, interpreted according to the type of RUS chosen | 
| k_pos | used only for type "ROS": the number of times the positive (minority) instances are to be generated | 
| w | used only for types "RUS_under" and "RUS_Pos": weights for undersampling the majority class; if NULL, sampling is done with equal weights | 
| verbose | used only for type "ROS". If TRUE, prints extra information | 
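The difference between the two undersampling types is easiest to see with numbers. The sketch below is base R only and does not call hutch_sampling or ubUnder. The helper n_majority_kept is hypothetical, and its two formulas are an assumption about how "percPos" and "percUnder" are usually interpreted: "percPos" treats perc as the share of minority instances wanted after undersampling, while "percUnder" treats perc as the share of majority instances to keep.

```r
# Hypothetical helper (not part of this package): number of majority-class
# rows kept under each assumed interpretation of perc.
n_majority_kept <- function(n_pos, n_neg, perc,
                            method = c("percPos", "percUnder")) {
  method <- match.arg(method)
  keep <- switch(method,
    # perc = wanted percentage of minority rows in the balanced data
    percPos = floor(n_pos * (100 - perc) / perc),
    # perc = percentage of majority rows to keep
    percUnder = floor(n_neg * perc / 100)
  )
  min(keep, n_neg)  # never keep more rows than exist
}

# Toy response: 90 majority ("0") and 10 minority ("1") documents
y <- factor(c(rep(0, 90), rep(1, 10)))
n_pos <- sum(y == "1")
n_neg <- sum(y == "0")

keep_pos   <- n_majority_kept(n_pos, n_neg, perc = 50, method = "percPos")
keep_under <- n_majority_kept(n_pos, n_neg, perc = 50, method = "percUnder")

# Randomly choose which majority rows survive under "percPos"
set.seed(42)
neg_idx <- which(y == "0")
kept    <- sample(neg_idx, keep_pos)
removed <- setdiff(neg_idx, kept)   # plays the role of the removed-documents vector
y_bal   <- y[-removed]
table(y_bal)
```

Under these assumptions, perc = 50 keeps as many majority rows as there are minority rows for "percPos", but keeps half of the majority class for "percUnder", which leaves that data still unbalanced.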
This function applies one of two balancing techniques, Random Undersampling or Random Oversampling, to the document term matrix,
using the functions ubUnder and ubOver from the unbalanced package.
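As a rough illustration of what random oversampling does, here is a minimal base R sketch that duplicates randomly chosen minority rows until both classes have the same size. It is not the implementation used by ubOver or hutch_sampling; the helper oversample_idx and the toy matrix are hypothetical, and a plain matrix stands in for the DTM.

```r
# Hypothetical helper (not part of this package): row indices for a
# randomly oversampled dataset with equal class sizes.
oversample_idx <- function(y, minority = "1") {
  stopifnot(is.factor(y), nlevels(y) == 2)
  pos <- which(y == minority)
  neg <- which(y != minority)
  n_extra <- length(neg) - length(pos)
  # Draw minority rows with replacement; sample.int avoids the
  # surprising behaviour of sample() on a length-one vector.
  extra <- if (n_extra > 0) {
    pos[sample.int(length(pos), n_extra, replace = TRUE)]
  } else {
    integer(0)
  }
  c(seq_along(y), extra)  # all original rows plus the duplicates
}

set.seed(1)
y <- factor(c(0, 0, 0, 0, 0, 0, 1, 1))   # 6 majority, 2 minority
x <- matrix(seq_len(8 * 3), nrow = 8)    # stand-in for a document term matrix

idx   <- oversample_idx(y)
x_bal <- x[idx, , drop = FALSE]
y_bal <- y[idx]
table(y_bal)
```

Working with row indices rather than copying data directly keeps the matrix and the response aligned, since the same index vector subsets both.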
If type = "RUS_Pos" or "RUS_under", the value is a list of three elements. The first element, X, is the balanced DTM, of the same class as the input DTM, i.e. c('DocumentTermMatrix', 'simple_triplet_matrix'). The second element, Y, contains the response variable of the balanced data as a factor. The third element, id.rm, is a vector identifying the removed documents.
If type = "ROS", the value is a list of two elements. The first element, X, is the balanced DTM, of the same class as the input DTM, i.e. c('DocumentTermMatrix', 'simple_triplet_matrix'). The second element, Y, contains the response variable of the balanced data as a factor.
library(tm)
library(unbalanced)

y <- factor(meta(liu_corpus)$real_label)
x <- liu_dtm

# Random undersampling
exp <- hutch_sampling(x, y, type = "RUS_Pos", perc = 50, k_pos = 0, w = NULL, verbose = FALSE)
exp$X
exp$Y
exp$id.rm

# Random oversampling
test <- hutch_sampling(x, y, type = "ROS", perc = 50, k_pos = 2, w = NULL, verbose = FALSE)
test$X
test$Y