autoRLearn: Run smartML function for automatic Supervised Machine...


View source: R/autoRLearn.R

Description

Run the main SmartML function for automatic classifier algorithm selection and hyper-parameter tuning.

Usage

autoRLearn(
  maxTime,
  directory,
  testDirectory,
  classCol = "class",
  metric = "acc",
  vRatio = 0.3,
  preProcessF = c("standardize", "zv"),
  featuresToPreProcess = c(),
  nComp = NA,
  nModels = 5,
  option = 2,
  featureTypes = c(),
  interp = FALSE,
  missingOpr = FALSE,
  balance = FALSE
)

Arguments

maxTime

Numeric (float) giving the maximum time budget, in minutes, for reading the dataset, preprocessing, calculating meta-features, algorithm selection and hyper-parameter tuning (model interpretability is excluded). Applies only when option = 2.

directory

Character string giving the path to the training dataset file (SmartML accepts arff files and csv files with column headers).

testDirectory

Character string giving the path to the testing dataset file (SmartML accepts arff files and csv files with column headers).

classCol

Character string naming the class label column in the dataset (default = 'class').

metric

Character string naming the evaluation metric (default = "acc"):

  • "acc" - Accuracy,

  • "avg-fscore" - Average of F-Score of each label,

  • "avg-recall" - Average of Recall of each label,

  • "avg-precision" - Average of Precision of each label,

  • "fscore" - Micro-Average of F-Score of each label,

  • "recall" - Micro-Average of Recall of each label,

  • "precision" - Micro-Average of Precision of each label.
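
The difference between the "avg-" metrics and their micro-averaged counterparts can be illustrated with a small base R sketch. It uses the standard textbook definitions on made-up labels; the variable names are illustrative, and how SmartML computes these values internally is not shown on this page.

```r
# Illustrative only: toy labels, not SmartML's internal code.
actual    <- factor(c("a", "a", "a", "b", "b", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))

cm <- table(predicted, actual)   # rows: predicted label, columns: actual label
tp <- diag(cm)                   # true positives per label
fp <- rowSums(cm) - tp           # predicted as the label, but wrong
fn <- colSums(cm) - tp           # instances of the label that were missed

# "avg-" style: score each label separately, then take the unweighted mean.
macro_precision <- mean(tp / (tp + fp))
macro_recall    <- mean(tp / (tp + fn))

# Micro-averaging: pool the counts over all labels, then compute one score.
micro_precision <- sum(tp) / (sum(tp) + sum(fp))
micro_recall    <- sum(tp) / (sum(tp) + sum(fn))

accuracy <- sum(tp) / sum(cm)
```

In a single-label multi-class setting such as this one, the micro-averaged precision and recall both coincide with accuracy, whereas the per-label averages give rare labels the same weight as frequent ones.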

vRatio

Numeric (float) giving the fraction of the training set that is split off as a validation set for the evaluation process (default = 0.3, i.e. 30%, as in the usage signature above).
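
As a rough picture of what a validation ratio means, the following base R sketch holds out a random share of a training set, using the 0.3 value from the usage signature. It is a simple random split on the built-in iris data and only an assumption about the general idea; whether SmartML shuffles or stratifies in this way is not stated here.

```r
# Illustrative only: a plain random hold-out split, not SmartML's internal code.
set.seed(42)                       # make the random split reproducible
train  <- iris                     # stand-in for a training dataset
vRatio <- 0.3                      # fraction of rows reserved for validation

n_val   <- round(vRatio * nrow(train))
val_idx <- sample(nrow(train), n_val)

validation_set <- train[val_idx, ]    # used to score candidate models
training_set   <- train[-val_idx, ]   # used to fit candidate models
```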

preProcessF

Character vector containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv'), i.e. standardization and zero-variance removal):

  • "boxcox" - apply a Box-Cox transform; values must be strictly positive in all features,

  • "yeo-Johnson" - apply a Yeo-Johnson transform, which is similar to Box-Cox but allows zero and negative values,

  • "zv" - remove attributes with a zero variance (all the same value),

  • "center" - subtract mean from values,

  • "scale" - divide values by standard deviation,

  • "standardize" - perform both centering and scaling,

  • "normalize" - normalize values,

  • "pca" - transform data to the principal components,

  • "ica" - transform data to the independent components.
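
The two default preprocessors can be sketched in base R as follows. This only illustrates what "zv" and "standardize" do to a toy data frame under their usual definitions; the data and variable names are made up, and the code is not taken from SmartML.

```r
# Illustrative only: toy data, base R equivalents of "zv" and "standardize".
df <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(5, 5, 5, 5),      # constant column, i.e. zero variance
  x3 = c(10, 20, 30, 40)
)

# "zv": keep only the columns that take more than one distinct value.
keep  <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
df_zv <- df[, keep, drop = FALSE]

# "standardize": subtract the column mean, then divide by the standard deviation.
df_std <- as.data.frame(scale(df_zv, center = TRUE, scale = TRUE))
```

Removing constant columns before scaling matters because a zero standard deviation would otherwise lead to a division by zero.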

featuresToPreProcess

Vector of feature numbers (column positions) on which to perform the feature preprocessing. An empty vector means that all features in the dataset file are included (default = c()). This vector should be a subset of selectedFeats.

nComp

Integer giving the number of components to keep when the "pca" or "ica" feature preprocessor is used (default = NA).

nModels

Integer giving the number of classifier algorithms to select through meta-learning and then tune using Bayesian optimization (default = 5).

option

Integer selecting the mode of operation: 1 = classifier algorithm selection only; 2 = algorithm selection followed by hyper-parameter tuning (default = 2).

featureTypes

Vector of 'numerical' or 'categorical' values giving the type of each feature in the dataset (default = c(), in which case factor and character features are treated as categorical and all others as numerical).
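
The default rule described for an empty featureTypes vector can be mimicked in base R as below. The data frame and the inferred variable are hypothetical, and the snippet merely restates the rule above rather than reproducing SmartML's code.

```r
# Illustrative only: restates the default type-inference rule on toy data.
df <- data.frame(
  buying = c("low", "high"),            # character column
  doors  = c(2, 4),                     # numeric column
  safety = factor(c("med", "high")),    # factor column
  stringsAsFactors = FALSE
)

# Factor or character columns count as categorical, everything else as numerical.
inferred <- vapply(
  df,
  function(col) if (is.factor(col) || is.character(col)) "categorical" else "numerical",
  character(1)
)
```

A vector shaped like inferred, with one entry per feature, is the kind of value this argument describes when the automatic rule would misclassify a column, for example an integer-coded category.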

interp

Boolean indicating whether model interpretability (feature importance and interaction) is needed (default = FALSE). Setting this option to TRUE requires additional time.

missingOpr

Boolean selecting how instances with missing values are handled: FALSE (default) uses median/mode imputation; TRUE applies imputation using the "MICE" library, which fills in missing values with plausible data values drawn from a distribution specifically designed for each missing data point.
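
Median/mode imputation, the simpler of the two strategies, can be sketched in base R as follows. The helper impute_simple and the toy data are hypothetical and only show the general technique; they are not SmartML's implementation, and the MICE alternative is not covered here.

```r
# Illustrative only: a hypothetical helper for median/mode imputation.
impute_simple <- function(col) {
  if (is.numeric(col)) {
    # Numeric feature: replace missing values with the median of the observed ones.
    col[is.na(col)] <- median(col, na.rm = TRUE)
  } else {
    # Categorical feature: replace missing values with the most frequent value.
    counts <- table(col)                       # table() ignores NA by default
    col[is.na(col)] <- names(counts)[which.max(counts)]
  }
  col
}

df <- data.frame(
  age    = c(20, NA, 40, 30),
  colour = c("red", "blue", NA, "red"),
  stringsAsFactors = FALSE
)

imputed <- as.data.frame(lapply(df, impute_simple), stringsAsFactors = FALSE)
```

When several categories are tied for most frequent, which.max picks the first one in table order, so this sketch breaks ties alphabetically.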

balance

Boolean indicating whether SMOTE class balancing is applied (default = FALSE).

Value

List of Results

Examples

## Not run: 
# 1-minute budget: algorithm selection plus tuning, with 'normalize' preprocessing
autoRLearn(1, 'sampleDatasets/car/train.arff',
           'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize')

# 10-minute budget with all other arguments left at their defaults
result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff')

## End(Not run)

DataSystemsGroupUT/SmartML documentation built on Nov. 24, 2020, 1:31 p.m.