hybridEnsemble: Binary classification with Hybrid Ensemble

hybridEnsemble

R Documentation

Binary classification with Hybrid Ensemble

Description

hybridEnsemble builds an ensemble of up to nine different sub-ensembles: Bagged Logistic Regressions, Random Forest, Stochastic AdaBoost, Kernel Factory, Bagged Neural Networks, Bagged Support Vector Machines, Rotation Forest, Bagged K-Nearest Neighbors, and Bagged Naive Bayes.

Usage

hybridEnsemble(
  x = NULL,
  y = NULL,
  algorithms = c("LR", "RF", "AB", "KF", "NN", "SV", "RoF", "KN", "NB"),
  combine = NULL,
  eval.measure = "auc",
  verbose = FALSE,
  oversample = TRUE,
  calibrate = FALSE,
  filter = 0.01,
  LR.size = 10,
  RF.ntree = 500,
  AB.iter = 500,
  AB.maxdepth = 3,
  KF.cp = 1,
  KF.rp = round(log(nrow(x), 10)),
  KF.ntree = 500,
  NN.rang = 0.1,
  NN.maxit = 10000,
  NN.size = c(5, 10, 20),
  NN.decay = c(0, 0.001, 0.01, 0.1),
  NN.skip = c(TRUE, FALSE),
  NN.ens.size = 10,
  SV.gamma = 2^(-15:3),
  SV.cost = 2^(-5:13),
  SV.degree = c(2, 3),
  SV.kernel = c("radial", "sigmoid", "linear", "polynomial"),
  SV.size = 10,
  RoF.L = 10,
  KN.K = c(1:150),
  KN.size = 10,
  NB.size = 10,
  rbga.popSize = length(algorithms) * 14,
  rbga.iters = 500,
  rbga.mutationChance = 1/rbga.popSize,
  rbga.elitism = max(1, round(rbga.popSize * 0.05)),
  DEopt.nP = 20,
  DEopt.nG = 300,
  DEopt.F = 0.9314,
  DEopt.CR = 0.6938,
  GenSA.maxit = 300,
  GenSA.temperature = 0.5,
  GenSA.visiting.param = 2.7,
  GenSA.acceptance.param = -5,
  GenSA.max.call = 1e+07,
  malschains.popsize = 60,
  malschains.ls = "cmaes",
  malschains.istep = 300,
  malschains.effort = 0.5,
  malschains.alpha = 0.5,
  malschains.threshold = 1e-08,
  malschains.maxEvals = 300,
  psoptim.maxit = 300,
  psoptim.maxf = Inf,
  psoptim.abstol = -Inf,
  psoptim.reltol = 0,
  psoptim.s = 40,
  psoptim.k = 3,
  psoptim.p = 1 - (1 - 1/psoptim.s)^psoptim.k,
  psoptim.w = 1/(2 * log(2)),
  psoptim.c.p = 0.5 + log(2),
  psoptim.c.g = 0.5 + log(2),
  soma.pathLength = 3,
  soma.stepLength = 0.11,
  soma.perturbationChance = 0.1,
  soma.minAbsoluteSep = 0,
  soma.minRelativeSep = 0.001,
  soma.nMigrations = 300,
  soma.populationSize = 10,
  tabu.iters = 300,
  tabu.listSize = c(5:12)
)
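
Several defaults in the signature above are expressions that depend on other arguments. The following base R sketch evaluates a few of them outside the function to show what they resolve to. It assumes a hypothetical training set of 1000 rows and the full default algorithm list; it does not call hybridEnsemble itself.

```r
# Hypothetical training set with 1000 rows; only nrow(x) matters here.
x <- data.frame(v1 = seq_len(1000))
algorithms <- c("LR", "RF", "AB", "KF", "NN", "SV", "RoF", "KN", "NB")

# Kernel Factory: number of row partitions grows with log10 of the sample size.
KF.rp <- round(log(nrow(x), 10))

# Genetic algorithm: population, mutation chance, and elitism follow from
# the number of algorithms.
rbga.popSize <- length(algorithms) * 14
rbga.mutationChance <- 1 / rbga.popSize
rbga.elitism <- max(1, round(rbga.popSize * 0.05))

# Particle swarm: share of informants derived from swarm size and exponent.
psoptim.s <- 40
psoptim.k <- 3
psoptim.p <- 1 - (1 - 1 / psoptim.s)^psoptim.k

# Number of candidate values in some of the default tuning vectors.
grid_sizes <- c(
  SV.gamma = length(2^(-15:3)),
  SV.cost  = length(2^(-5:13)),
  KN.K     = length(1:150)
)

print(c(KF.rp = KF.rp, rbga.popSize = rbga.popSize, rbga.elitism = rbga.elitism))
print(grid_sizes)
```

Because these defaults are derived, changing `algorithms` or the number of rows in `x` changes them too unless they are set explicitly.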

Arguments

`x`	A data frame of predictors. Categorical variables need to be transformed to binary (dummy) factors.
`y`	A factor of observed class labels (responses) with the only allowed values {0,1}.
`algorithms`	Which algorithms to use {"LR","RF","AB","KF","NN","SV","RoF","KN","NB"}. LR = Bagged Logistic Regression, RF = Random Forest, AB = AdaBoost, KF = Kernel Factory, NN = Bagged Neural Network, SV = Bagged Support Vector Machines, RoF = Rotation Forest, KN = Bagged K-Nearest Neighbors, NB = Bagged Naive Bayes.
`combine`	Additional methods for combining the sub-ensembles. The simple mean, authority-based weighting and the single best are automatically provided since they are very efficient. Possible additional methods: Genetic Algorithm: "rbga", Differential Evolutionary Algorithm: "DEopt", Generalized Simulated Annealing: "GenSA", Memetic Algorithm with Local Search Chains: "malschains", Particle Swarm Optimization: "psoptim", Self-Organising Migrating Algorithm: "soma", Tabu Search Algorithm: "tabu", Non-negative binomial likelihood: "NNloglik", Goldfarb-Idnani Non-negative least squares: "GINNLS", Lawson-Hanson Non-negative least squares: "LHNNLS".
`eval.measure`	Evaluation measure for the following combination methods: authority-based method, single best, "rbga", "DEopt", "GenSA", "malschains", "psoptim", "soma", "tabu". Default is the area under the receiver operating characteristic curve ('auc'). The area under the sensitivity curve ('sens') and the area under the specificity curve ('spec') are also supported.
`verbose`	TRUE or FALSE. Should information be printed to the screen while estimating the Hybrid Ensemble?
`oversample`	TRUE or FALSE. Should oversampling be used? Setting oversample to TRUE helps avoid computational problems related to the subsetting process.
`calibrate`	TRUE or FALSE. If FALSE, percentile ranks of the prediction vectors will be used.
`filter`	Either NULL (deactivate) or a percentage denoting the minimum class size of dummy predictors. This parameter is used to remove near constants. For example, if nrow(xTRAIN)=100 and filter=0.01, then all dummy predictors with any class size equal to 1 will be removed. Set this higher (e.g., 0.05 or 0.10) in case of errors.
`LR.size`	Logistic Regression parameter. Ensemble size of the bagged logistic regression sub-ensemble.
`RF.ntree`	Random Forest parameter. Number of trees to grow.
`AB.iter`	Stochastic AdaBoost parameter. Number of boosting iterations to perform.
`AB.maxdepth`	Stochastic AdaBoost parameter. The maximum depth of any node of the final tree, with the root node counted as depth 0.
`KF.cp`	Kernel Factory parameter. The number of column partitions.
`KF.rp`	Kernel Factory parameter. The number of row partitions.
`KF.ntree`	Kernel Factory parameter. Number of trees to grow.
`NN.rang`	Neural Network parameter. Initial random weights on [-rang, rang].
`NN.maxit`	Neural Network parameter. Maximum number of iterations.
`NN.size`	Neural Network parameter. Number of units in the single hidden layer. Can be multiple values that need to be optimized.
`NN.decay`	Neural Network parameter. Weight decay. Can be multiple values that need to be optimized.
`NN.skip`	Neural Network parameter. Switch to add skip-layer connections from input to output. Can be a logical vector (TRUE and FALSE) for optimization.
`NN.ens.size`	Neural Network parameter. Ensemble size of the neural network sub-ensemble.
`SV.gamma`	Support Vector Machines parameter. Width of the Gaussian for the radial basis and sigmoid kernels. Can be multiple values that need to be optimized.
`SV.cost`	Support Vector Machines parameter. Penalty (soft margin constant). Can be multiple values that need to be optimized.
`SV.degree`	Support Vector Machines parameter. Degree of the polynomial kernel. Can be multiple values that need to be optimized.
`SV.kernel`	Support Vector Machines parameter. Kernels to try. Can be one or more of: 'radial','sigmoid','linear','polynomial'. Can be multiple values that need to be optimized.
`SV.size`	Support Vector Machines parameter. Ensemble size of the SVM sub-ensemble.
`RoF.L`	Rotation Forest parameter. Number of trees to grow.
`KN.K`	K-Nearest Neighbors parameter. Number of nearest neighbors to try. For example c(10,20,30). The optimal K will be selected. If larger than nrow(xTRAIN), the maximum K will be reset to 50% of nrow(xTRAIN). Can be multiple values that need to be optimized.
`KN.size`	K-Nearest Neighbors parameter. Ensemble size of the K-nearest neighbor sub-ensemble.
`NB.size`	Naive Bayes parameter. Ensemble size of the bagged Naive Bayes sub-ensemble.
`rbga.popSize`	Genetic Algorithm parameter. Population size. Default is 14 times the number of algorithms (the variables being optimized).
`rbga.iters`	Genetic Algorithm parameter. Number of iterations.
`rbga.mutationChance`	Genetic Algorithm parameter. The chance that a gene in the chromosome mutates.
`rbga.elitism`	Genetic Algorithm parameter. Number of chromosomes that are kept into the next generation.
`DEopt.nP`	Differential Evolutionary Algorithm parameter. Population size.
`DEopt.nG`	Differential Evolutionary Algorithm parameter. Number of generations.
`DEopt.F`	Differential Evolutionary Algorithm parameter. Step size.
`DEopt.CR`	Differential Evolutionary Algorithm parameter. Probability of crossover.
`GenSA.maxit`	Generalized Simulated Annealing parameter. Maximum number of iterations.
`GenSA.temperature`	Generalized Simulated Annealing parameter. Initial value for temperature.
`GenSA.visiting.param`	Generalized Simulated Annealing parameter. Parameter for visiting distribution.
`GenSA.acceptance.param`	Generalized Simulated Annealing parameter. Parameter for acceptance distribution.
`GenSA.max.call`	Generalized Simulated Annealing parameter. Maximum number of calls of the objective function.
`malschains.popsize`	Memetic Algorithm with Local Search Chains parameter. Population size.
`malschains.ls`	Memetic Algorithm with Local Search Chains parameter. Local search method.
`malschains.istep`	Memetic Algorithm with Local Search Chains parameter. Number of iterations of the local search.
`malschains.effort`	Memetic Algorithm with Local Search Chains parameter. Value between 0 and 1. The ratio between the number of evaluations for the local search and for the evolutionary algorithm. A higher effort means more evaluations for the evolutionary algorithm.
`malschains.alpha`	Memetic Algorithm with Local Search Chains parameter. Crossover BLX-alpha. Lower values (<0.3) reduce diversity and a higher value increases diversity.
`malschains.threshold`	Memetic Algorithm with Local Search Chains parameter. Threshold that defines how much improvement in the local search is considered to be no improvement.
`malschains.maxEvals`	Memetic Algorithm with Local Search Chains parameter. Maximum number of evaluations.
`psoptim.maxit`	Particle Swarm Optimization parameter. Maximum number of iterations.
`psoptim.maxf`	Particle Swarm Optimization parameter. Maximum number of function evaluations.
`psoptim.abstol`	Particle Swarm Optimization parameter. Absolute convergence tolerance.
`psoptim.reltol`	Particle Swarm Optimization parameter. Tolerance for restarting.
`psoptim.s`	Particle Swarm Optimization parameter. Swarm size.
`psoptim.k`	Particle Swarm Optimization parameter. Exponent for calculating number of informants.
`psoptim.p`	Particle Swarm Optimization parameter. Average percentage of informants for each particle.
`psoptim.w`	Particle Swarm Optimization parameter. Exploitation constant.
`psoptim.c.p`	Particle Swarm Optimization parameter. Local exploration constant.
`psoptim.c.g`	Particle Swarm Optimization parameter. Global exploration constant.
`soma.pathLength`	Self-Organising Migrating Algorithm parameter. Distance (towards the leader) that individuals may migrate.
`soma.stepLength`	Self-Organising Migrating Algorithm parameter. Granularity at which potential steps are evaluated.
`soma.perturbationChance`	Self-Organising Migrating Algorithm parameter. Probability that individual parameters are changed on any given step.
`soma.minAbsoluteSep`	Self-Organising Migrating Algorithm parameter. Smallest absolute difference between maximum and minimum cost function values. Below this minimum the algorithm will terminate.
`soma.minRelativeSep`	Self-Organising Migrating Algorithm parameter. Smallest relative difference between maximum and minimum cost function values. Below this minimum the algorithm will terminate.
`soma.nMigrations`	Self-Organising Migrating Algorithm parameter. Maximum number of migrations to complete.
`soma.populationSize`	Self-Organising Migrating Algorithm parameter. Population size.
`tabu.iters`	Tabu Search Algorithm parameter. Number of iterations in the preliminary search of the algorithm.
`tabu.listSize`	Tabu Search Algorithm parameter. Tabu list size.
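
The requirements on `x`, `y`, and `filter` described above can be illustrated with a small base R sketch. The data and column names are hypothetical, and the filter rule shown (flag a dummy when its smallest class holds at most filter * nrow(x) observations) is an assumption inferred from the example in the `filter` description, not the package's actual code.

```r
# Toy data: one numeric and one categorical predictor (hypothetical names).
set.seed(1)
raw <- data.frame(
  income = round(runif(100, 20, 80)),
  region = factor(c(rep("north", 60), rep("south", 39), "west"))
)

# One binary (0/1) factor per category level, as the `x` argument expects.
dummy_list <- lapply(levels(raw$region), function(lv) {
  factor(as.integer(raw$region == lv), levels = c(0, 1))
})
names(dummy_list) <- paste0("region_", levels(raw$region))
x <- cbind(raw["income"], as.data.frame(dummy_list))

# Response as a factor restricted to the values 0 and 1.
y <- factor(rbinom(100, 1, 0.3), levels = c(0, 1))

# Assumed filter rule: a dummy is a near constant when its smallest class
# holds at most filter * nrow(x) observations.
filter <- 0.01
is_dummy <- vapply(x, is.factor, logical(1))
min_class <- vapply(x[is_dummy], function(col) min(table(col)), numeric(1))
near_constant <- names(min_class)[min_class <= filter * nrow(x)]

# Drop the flagged dummies before modeling.
x_filtered <- x[, !(names(x) %in% near_constant), drop = FALSE]
print(near_constant)
```

In this toy data only one observation falls in the "west" category, so its dummy is the one flagged. Raising `filter` to 0.05 or 0.10 would flag dummies with larger minority classes as well.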

Value

A list of class hybridEnsemble containing the following elements:

`LR`	Bagged Logistic Regression model
`LR.lambda`	Shrinkage parameter
`RF`	Random Forest model
`AB`	Stochastic AdaBoost model
`KF`	Kernel Factory model
`NN`	Bagged Neural Network model
`SV`	Bagged Support Vector Machines model
`RoF`	Rotation Forest model
`NB`	Bagged Naive Bayes model
`SB`	A label denoting which sub-ensemble was the single best
`KN.K`	Optimal number of nearest neighbors
`x_KN`	The full data set for finding the nearest neighbors in the deployment phase
`y_KN`	The full response vector to compute the response of the nearest neighbors
`KN.size`	Size of the nearest neighbor sub-ensemble
`weightsAUTHORITY`	The weights for the authority-based weighting method
`combine`	Combination methods used
`constants`	A vector denoting which predictors are constants
`minima`	Minimum values of the predictors required for preprocessing the data for the Neural Network
`maxima`	Maximum values of the predictors required for preprocessing the data for the Neural Network
`minimaKN`	Minimum values of the predictors required for preprocessing the data for the Nearest Neighbors and Naive Bayes
`maximaKN`	Maximum values of the predictors required for preprocessing the data for the Nearest Neighbors and Naive Bayes
`calibratorLR`	The calibrator for the Bagged Logistic Regression model
`calibratorRF`	The calibrator for the Random Forest model
`calibratorAB`	The calibrator for the Stochastic AdaBoost model
`calibratorKF`	The calibrator for the Kernel Factory model
`calibratorNN`	The calibrator for the Neural Network model
`calibratorSV`	The calibrator for the Bagged Support Vector Machines model
`calibratorRoF`	The calibrator for the Rotation Forest model
`calibratorKN`	The calibrator for the Bagged Nearest Neighbors model
`calibratorNB`	The calibrator for the Bagged Naive Bayes model
`xVALIDATE`	Predictors of the validation sample
`predictions`	The separate predictions by the sub-ensembles
`yVALIDATE`	Response variable of the validation sample
`eval.measure`	The evaluation measure that was used
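
To make the roles of `predictions`, `yVALIDATE`, and `SB` more concrete, here is a minimal base R sketch of two of the automatically provided combination ideas: the single best sub-ensemble and the simple mean. The scores, the helper names `auc` and `percentile_rank`, and the percentile-rank formula are all assumptions for illustration; the package's internal implementation may differ.

```r
# Rank-based AUC (Mann-Whitney statistic); y must contain 0s and 1s.
auc <- function(score, y) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Assumed percentile rank: puts each score vector on a common (0, 1] scale.
percentile_rank <- function(p) rank(p) / length(p)

# Hypothetical validation labels and scores from three sub-ensembles.
yVALIDATE <- c(0, 0, 1, 0, 1, 1, 0, 1)
predictions <- data.frame(
  LR = c(0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.30, 0.90),
  RF = c(0.20, 0.10, 0.60, 0.30, 0.70, 0.90, 0.40, 0.80),
  NB = c(0.50, 0.30, 0.40, 0.60, 0.70, 0.20, 0.10, 0.80)
)

# Percentile ranks are used when calibrate = FALSE.
ranked <- as.data.frame(lapply(predictions, percentile_rank))

# Single best: the sub-ensemble with the highest validation AUC.
auc_each <- vapply(ranked, auc, numeric(1), y = yVALIDATE)
single_best <- names(which.max(auc_each))

# Simple mean: average the ranked scores across sub-ensembles.
simple_mean <- rowMeans(ranked)
auc_mean <- auc(simple_mean, yVALIDATE)

print(round(auc_each, 4))
print(single_best)
```

Ranking before averaging keeps a sub-ensemble with a wider score range from dominating the mean, which is one plausible reason for using percentile ranks when the predictions are not calibrated.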

Author(s)

Michel Ballings, Dauwe Vercamer, Matthias Bogaert, and Dirk Van den Poel. Maintainer: Michel.Ballings@GMail.com

References

Ballings, M., Vercamer, D., Bogaert, M., Van den Poel, D.

Examples


data(Credit)

## Not run: 
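# Small settings (100 rows, few trees, one value per tuning parameter)
# keep the run time of this example short.
hE <- hybridEnsemble(x = Credit[1:100, names(Credit) != 'Response'],
                     y = Credit$Response[1:100],
                     RF.ntree = 50,
                     AB.iter = 50,
                     NN.size = 5,
                     NN.decay = 0,
                     SV.gamma = 2^-15,
                     SV.cost = 2^-5,
                     SV.degree = 2,
                     SV.kernel = 'radial')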

## End(Not run)

hybridEnsemble documentation built on April 1, 2023, 12:13 a.m.