generateDefaultClassifierParams: Generates default parameters for classification algorithms

View source: R/SOptim_ClassificationFunctions.R

generateDefaultClassifierParams    R Documentation

Generates default parameters for classification algorithms

Description

This auxiliary function generates a list of default parameters for the available classification algorithms: Random Forests (RF), K-nearest neighbour (KNN), Flexible Discriminant Analysis (FDA), Support Vector Machines (SVM) and Generalized Boosted Models (GBM).

Usage

generateDefaultClassifierParams(x)

Arguments

x

The dataset used for classification (this is required to calculate some classifier hyperparameters based on the number of columns/variables in the data).

Value

A nested list object of class classificationParamList with parameters for the available algorithms, namely:

  • RF - Random Forest parameters:

    • mtry is equal to floor(sqrt(ncol(x)-2)) and defines the number of variables randomly sampled as candidates at each split

    • ntree equals 250 (by default) and is the number of trees to grow

  • KNN - K-nearest neighbour parameters:

    • k is equal to 5 and is the number of neighbours considered

  • FDA - Flexible Discriminant Analysis with MDA-MARS parameters:

    • degree equals 1 and is an optional integer specifying the maximum interaction degree

    • penalty equals 2 and is an optional value specifying the cost charged per degree of freedom

    • prune is set to TRUE and is an optional logical value specifying whether the model should be pruned in a backward stepwise fashion

  • SVM - Support Vector Machine (with radial-basis kernel) parameters:

    • gamma equals 1/(ncol(x)-2) and is the kernel parameter needed for all kernels except linear

    • cost equals 1 and is the cost of constraint violations, i.e. the 'C' constant of the regularization term in the Lagrange formulation

    • probability is set to TRUE and defines the output type (whether the model allows probability predictions)

  • GBM - Generalized Boosted Modeling parameters:

    • n.trees equals 250 and is the total number of trees to fit

    • interaction.depth equals 1 and is the maximum depth of variable interactions: 1 implies an additive model, 2 implies a model with up to 2-way interactions, etc.

    • shrinkage equals 0.01 and is the shrinkage parameter applied to each tree in the expansion, also known as the learning rate or step-size reduction

    • bag.fraction equals 0.5 and is the fraction of the training set observations randomly selected to propose the next tree in the expansion

    • distribution is set to bernoulli (if single-class) or multinomial (if multi-class) and defines the distribution used for classification
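
The defaults listed above can be reproduced as a plain nested list, which may help when checking the values that depend on ncol(x). The following is only a minimal sketch in base R: the helper name sketchDefaultParams and its multiClass argument are hypothetical, the result is not given the classificationParamList class, and the reading of "ncol(x)-2" as "two columns are not predictors" is an assumption rather than something stated here.

```r
# Hypothetical helper (not part of SegOptim): rebuilds the documented defaults
# as a plain nested list using base R only.
sketchDefaultParams <- function(x, multiClass = FALSE) {
  # Assumption: two columns of x (e.g. an ID and the class label) are not
  # predictors, which would explain the "ncol(x) - 2" in the documented formulas.
  nVars <- ncol(x) - 2

  list(
    RF  = list(mtry = floor(sqrt(nVars)), ntree = 250),
    KNN = list(k = 5),
    FDA = list(degree = 1, penalty = 2, prune = TRUE),
    SVM = list(gamma = 1 / nVars, cost = 1, probability = TRUE),
    GBM = list(n.trees = 250, interaction.depth = 1, shrinkage = 0.01,
               bag.fraction = 0.5,
               distribution = if (multiClass) "multinomial" else "bernoulli")
  )
}

# Example data with two non-predictor columns and two predictors
DF <- data.frame(SID = 1:5, train = c(0, 1, 0, 1, 1),
                 Var_1 = rnorm(5), Var_2 = rnorm(5))

p <- sketchDefaultParams(DF)
p$RF$mtry    # floor(sqrt(4 - 2)) = 1
p$SVM$gamma  # 1 / (4 - 2) = 0.5
```

With four columns in DF, only mtry and gamma change with the data; every other value is a fixed constant.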

See Also

replaceDefaultClassificationParams

Examples


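library(SegOptim)
# Toy dataset: an ID column (SID), training labels (train) and two predictor variables
DF <- data.frame(SID=1:5, train=sample(0:1,5,replace=TRUE), Var_1=rnorm(5), Var_2=rnorm(5))
generateDefaultClassifierParams(DF)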



joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.