Description
A hyperSMURF model is trained on a given data set. The training data are partitioned, and a separate RF is trained on each partition after SMOTE oversampling of the positives (minority class examples) and undersampling of the negatives (majority class examples). The RFs are trained sequentially.
Usage

hyperSMURF.train(data, y, n.part = 10, fp = 1, ratio = 1, k = 5, ntree = 10,
                 mtry = 5, cutoff = c(0.5, 0.5), seed = 0, file = "")
Arguments

data: a data frame or matrix with the training data. Rows are examples, columns are features.

y: a factor with the labels. 0: majority class, 1: minority class.

n.part: number of partitions (def. 10).

fp: multiplicative factor for the SMOTE oversampling of the minority class. If fp < 1 no oversampling is performed.

ratio: ratio of the number of majority to minority class examples (def. 1).

k: number of nearest neighbours used for SMOTE oversampling (def. 5).

ntree: number of trees of each base random forest (def. 10).

mtry: number of features randomly selected at each split of the trees of each base random forest (def. 5).

cutoff: a numeric vector of length 2 with the cutoffs for, respectively, the majority and the minority class. This parameter is meaningful when used with the thresholded version of hyperSMURF.

seed: initialization seed for the random number generator. If set to 0 (def.) no initialization is performed.

file: name of the file where the trained hyperSMURF models will be saved. If file == "" (def.) no model is saved.
Details

A different random forest is trained on each partition of the training set. If npos and nneg are, respectively, the number of positive and negative examples, then for each partition of the training data fp*npos new synthetic positives constructed by the SMOTE algorithm are added to the training set. The number of negatives is set to ratio*(fp*npos + npos). If not enough negatives are available in the partition, then all the negatives in the partition are used to train the base RF associated with that partition.
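The per-partition counts above can be worked through on the values used in the example below (a sketch in base R; the even split of examples across partitions is an assumption of this illustration, not a guarantee of the implementation):

```r
# Hypothetical per-partition sample sizes for n.part = 5, fp = 1, ratio = 2,
# with 20 positives and 1000 negatives overall (as in the example below).
n.pos.total <- 20
n.neg.total <- 1000
n.part <- 5
fp <- 1
ratio <- 2

npos <- n.pos.total / n.part                 # positives per partition: 4
n.synthetic <- fp * npos                     # SMOTE synthetic positives added: 4
n.neg.needed <- ratio * (fp * npos + npos)   # negatives requested: 16
# If a partition holds fewer negatives than requested,
# all of its negatives are used instead:
n.neg.used <- min(n.neg.needed, n.neg.total / n.part)
```

With these settings each base RF would therefore see 8 positives (4 real, 4 synthetic) and 16 negatives.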
Value

A list of trained RF models. Each element of the list is a randomForest object of the homonymous package.
References

M. Schubach, M. Re, P.N. Robinson and G. Valentini, Imbalance-Aware Machine Learning for Predicting Rare and Common Disease-Associated Non-Coding Variants, Scientific Reports, 7:2959, 2017.
Examples

train <- imbalanced.data.generator(n.pos=20, n.neg=1000,
                                   n.features=10, n.inf.features=2, sd=1, seed=1)
HSmodel <- hyperSMURF.train(train$data, train$label, n.part = 5, fp = 1, ratio = 2)
Training of ensemble 1 done.
Training of ensemble 2 done.
Training of ensemble 3 done.
Training of ensemble 4 done.
Training of ensemble 5 done.
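The returned list can be used for prediction by averaging the minority-class probabilities of the base random forests. A minimal sketch, assuming the randomForest package is loaded and continuing from the example above (the package itself provides a dedicated test function, so this only illustrates the structure of the returned Value; the generated test set and the 0.5 threshold are assumptions of this sketch):

```r
library(randomForest)

# Generate a held-out set with the same feature layout as the training data.
test <- imbalanced.data.generator(n.pos=10, n.neg=500,
                                  n.features=10, n.inf.features=2, sd=1, seed=2)

# One positive-class probability column per base RF in the ensemble.
prob.per.rf <- sapply(HSmodel,
                      function(rf) predict(rf, test$data, type = "prob")[, "1"])
avg.prob <- rowMeans(prob.per.rf)      # ensemble score for each test example
pred <- ifelse(avg.prob >= 0.5, 1, 0)  # hard labels at an assumed 0.5 cutoff
```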