Description
A hyperSMURF model is trained on a given data set. The training data are partitioned, and a separate RF is trained on each partition after SMOTE oversampling of the positives (minority class examples) and undersampling of the negatives (majority class examples). The RFs are trained sequentially.
Usage

hyperSMURF.train(data, y, n.part = 10, fp = 1, ratio = 1, k = 5, ntree = 10,
                 mtry = 5, cutoff = c(0.5, 0.5), seed = 0, file = "")
Arguments

data: a data frame or matrix with the training data. Rows are examples, columns are features.

y: a factor with the labels. 0: majority class, 1: minority class.

n.part: number of partitions (def. 10).

fp: multiplicative factor for the SMOTE oversampling of the minority class. If fp < 1 no oversampling is performed.

ratio: ratio of the number of majority to minority class examples (def. 1).

k: number of nearest neighbours used for SMOTE oversampling (def. 5).

ntree: number of trees of each base random forest (def. 10).

mtry: number of features randomly selected at each split of the trees of each base random forest (def. 5).

cutoff: a numeric vector of length 2 with the cutoffs for, respectively, the majority and the minority class. This parameter is meaningful when used with the thresholded version of hyperSMURF.

seed: initialization seed for the random number generator. If set to 0 (def.) no initialization is performed.

file: name of the file where the trained hyperSMURF models will be saved. If file == "" (def.) no model is saved.
Details

A different random forest is trained on each partition of the training set. If npos and nneg are, respectively, the number of positive and negative examples, then for each partition of the training data fp*npos new synthetic positives constructed by the SMOTE algorithm are added to the training set. The number of negatives is set to ratio*(fp*npos + npos). If not enough negatives are available in the partition, then all the negatives in the partition are used to train the base RF associated with that partition.
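The per-partition counts above can be worked through on the values used in the example below (a sketch in base R; the even split of examples across partitions is an assumption of this illustration, not a guarantee of the implementation):

```r
# Hypothetical per-partition sample sizes for n.part = 5, fp = 1, ratio = 2,
# with 20 positives and 1000 negatives overall (as in the example below).
n.pos.total <- 20
n.neg.total <- 1000
n.part <- 5
fp <- 1
ratio <- 2

npos <- n.pos.total / n.part                 # positives per partition: 4
n.synthetic <- fp * npos                     # SMOTE synthetic positives added: 4
n.neg.needed <- ratio * (fp * npos + npos)   # negatives requested: 16
# If a partition holds fewer negatives than requested,
# all of its negatives are used instead:
n.neg.used <- min(n.neg.needed, n.neg.total / n.part)
```

With these settings each base RF would therefore see 8 positives (4 real, 4 synthetic) and 16 negatives.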
Value

A list of trained RF models. Each element of the list is a randomForest object of the homonymous package.
References

M. Schubach, M. Re, P.N. Robinson and G. Valentini, Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants, Scientific Reports, 7:2959, 2017.
Examples

train <- imbalanced.data.generator(n.pos=20, n.neg=1000,
                                   n.features=10, n.inf.features=2, sd=1, seed=1)
HSmodel <- hyperSMURF.train(train$data, train$label, n.part = 5, fp = 1, ratio = 2)
Training of ensemble 1 done.
Training of ensemble 2 done.
Training of ensemble 3 done.
Training of ensemble 4 done.
Training of ensemble 5 done.
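The returned list can be used for prediction by averaging the minority-class probabilities of the base random forests. A minimal sketch, assuming the randomForest package is loaded and continuing from the example above (the package itself provides a dedicated test function, so this only illustrates the structure of the returned Value; the generated test set and the 0.5 threshold are assumptions of this sketch):

```r
library(randomForest)

# Generate a held-out set with the same feature layout as the training data.
test <- imbalanced.data.generator(n.pos=10, n.neg=500,
                                  n.features=10, n.inf.features=2, sd=1, seed=2)

# One positive-class probability column per base RF in the ensemble.
prob.per.rf <- sapply(HSmodel,
                      function(rf) predict(rf, test$data, type = "prob")[, "1"])
avg.prob <- rowMeans(prob.per.rf)      # ensemble score for each test example
pred <- ifelse(avg.prob >= 0.5, 1, 0)  # hard labels at an assumed 0.5 cutoff
```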