hyperSMURF.cv

Description

Automated cross-validation of hyperSMURF (hyper-ensemble of SMOTE Undersampled Random Forests).
Usage

hyperSMURF.cv(data, y, kk, n.part, fp, ratio, k, ntree, mtry, cutoff,
              thresh, seed, fold.partition, file)
Arguments

data           a data frame or matrix with the data.

y              a factor with the labels: 0 = majority class, 1 = minority class.

kk             number of folds (def. 5).

n.part         number of partitions (def. 10).

fp             multiplicative factor for the SMOTE oversampling of the minority
               class. If fp < 1 no oversampling is performed.

ratio          ratio of the number of majority to minority class examples
               obtained through undersampling of the majority class.

k              number of nearest neighbours used for SMOTE oversampling (def. 5).

ntree          number of trees of each base random forest (def. 10).

mtry           number of features randomly selected at each node of the decision
               trees of each base random forest (def. 5).

cutoff         a numeric vector of length 2: the cutoffs for, respectively, the
               majority and the minority class. This parameter is meaningful only
               when the thresholded version of hyperSMURF is used (see thresh).

thresh         logical. If TRUE the thresholded version of hyperSMURF is executed
               (def. FALSE).

seed           initialization seed for the random number generator. If set to 0
               (def.) no initialization is performed.

fold.partition vector of size nrow(data) with values in the interval [0, kk).
               The values indicate the fold of the cross-validation each example
               belongs to. If NULL (default) the folds are randomly generated.

file           name of the file where the cross-validated hyperSMURF models will
               be saved. If file == "" (def.) no model is saved.
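As an illustration of the fold.partition semantics, a fixed fold assignment can be built by hand and passed to the function. The helper random.folds below is hypothetical (not part of the package); it only mirrors the documented convention that fold values lie in [0, kk):

```r
# Hypothetical helper (not part of hyperSMURF): assign each of n examples to one
# of kk folds, producing values in [0, kk) as expected by fold.partition.
random.folds <- function(n, kk = 5, seed = 0) {
  if (seed != 0) set.seed(seed)            # same convention as hyperSMURF.cv: 0 = no init
  sample(rep(0:(kk - 1), length.out = n))  # nearly balanced folds in random order
}

folds <- random.folds(310, kk = 2, seed = 1)
table(folds)  # 155 examples per fold
```

A vector built this way can be passed as fold.partition to keep the folds identical across repeated runs.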
Details

The cross-validation folds are either randomly constructed (fold.partition = NULL) or taken from the predefined fold assignment given in fold.partition. The base random forests are trained and tested in sequence: for each training set constructed at each step of the cross-validation, a separate random forest is trained on each of the n.part partitions of the data, oversampling the minority class (parameter fp) and undersampling the majority class (parameter ratio). The random forest parameters ntree and mtry are the same for all the random forests of the hyper-ensemble.
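The scheme just described can be sketched in a few lines of plain R. This is a simplified illustration under loud assumptions, not the package implementation: a toy class-mean scorer on one-dimensional data stands in for the random forests, and plain replication stands in for SMOTE; only the partition / oversample / average structure mirrors hyperSMURF:

```r
# Simplified sketch of the hyper-ensemble scheme (toy stand-in, not package code):
# split the majority class into n.part partitions, pair each with an oversampled
# copy of the minority class, fit one base learner per partition, average scores.
hyper.ensemble.sketch <- function(x, y, n.part = 3, fp = 1, seed = 1) {
  set.seed(seed)
  maj  <- which(y == 0)
  mino <- which(y == 1)
  parts <- split(sample(maj), rep(seq_len(n.part), length.out = length(maj)))
  models <- lapply(parts, function(p) {
    mino.over <- rep(mino, max(1, round(fp)))         # naive stand-in for SMOTE
    list(mu0 = mean(x[p]), mu1 = mean(x[mino.over]))  # toy "model": two class means
  })
  # score of each example for each base learner: relative closeness to the
  # minority mean; the hyper-ensemble score is the average over the learners
  s <- sapply(models, function(m) {
    d0 <- abs(x - m$mu0); d1 <- abs(x - m$mu1)
    d0 / (d0 + d1 + 1e-12)
  })
  rowMeans(s)
}

set.seed(7)
x <- c(rnorm(30, mean = 0, sd = 0.5), rnorm(5, mean = 5, sd = 0.5))
y <- c(rep(0, 30), rep(1, 5))
scores <- hyper.ensemble.sketch(x, y, n.part = 3, fp = 2)
```

In the real hyperSMURF.cv each base learner is a random forest with the given ntree and mtry, and SMOTE generates synthetic minority examples rather than plain copies.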
Value

A vector with the cross-validated hyperSMURF probabilities (the hyperSMURF scores).
References

M. Schubach, M. Re, P.N. Robinson and G. Valentini, "Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants", Scientific Reports, 7:2959, 2017.
See Also

hyperSMURF.train, hyperSMURF.test
Examples

d <- imbalanced.data.generator(n.pos=10, n.neg=300, sd=0.3)
res <- hyperSMURF.cv(d$data, d$labels, kk=2, n.part=3, fp=1, ratio=1, k=3,
                     ntree=7, mtry=2, seed=1, fold.partition=NULL)
Creating new folds
Starting training on Fold 1 ...
Training of ensemble 1 done.
Training of ensemble 2 done.
Training of ensemble 3 done.
Starting test on Fold 1 ...
End test on Fold 1 .
Fold 1 done -----
Starting training on Fold 2 ...
Training of ensemble 1 done.
Training of ensemble 2 done.
Training of ensemble 3 done.
Starting test on Fold 2 ...
End test on Fold 2 .
Fold 2 done -----