autoRLearn: Run smartML function for automatic Supervised Machine...
In DataSystemsGroupUT/SmartML: Machine Learning Automation


View source: R/autoRLearn.R

Runs the SmartML main function for automatic classifier algorithm selection and hyper-parameter tuning.

autoRLearn(
  maxTime,
  directory,
  testDirectory,
  classCol = "class",
  metric = "acc",
  vRatio = 0.3,
  preProcessF = c("standardize", "zv"),
  featuresToPreProcess = c(),
  nComp = NA,
  nModels = 5,
  option = 2,
  featureTypes = c(),
  interp = FALSE,
  missingOpr = FALSE,
  balance = FALSE
)

`maxTime`	Numeric (float) maximum time budget, in minutes, for reading the dataset, preprocessing, calculating meta-features, algorithm selection, and hyper-parameter tuning (model interpretability is excluded). Applies only when `option = 2`.
`directory`	Character string giving the path to the training dataset file (SmartML accepts arff files and csv files with column headers).
`testDirectory`	Character string giving the path to the testing dataset file (SmartML accepts arff files and csv files with column headers).
`classCol`	Character string giving the name of the class label column in the dataset (default = 'class').
`metric`	Character string naming the evaluation metric: "acc" - Accuracy, "avg-fscore" - Average of the F-Score of each label, "avg-recall" - Average of the Recall of each label, "avg-precision" - Average of the Precision of each label, "fscore" - Micro-Average of the F-Score of each label, "recall" - Micro-Average of the Recall of each label, "precision" - Micro-Average of the Precision of each label.
`vRatio`	Numeric (float) fraction of the training set that is split off as a validation set for the evaluation process (default = 0.3, i.e. 30%).
`preProcessF`	Character vector containing the names of the preprocessing algorithms to apply (default = c('standardize', 'zv')): "boxcox" - apply a Box-Cox transform; values must be non-zero and positive in all features, "yeo-Johnson" - apply a Yeo-Johnson transform, like Box-Cox, but values can be negative, "zv" - remove attributes with zero variance (all the same value), "center" - subtract the mean from values, "scale" - divide values by the standard deviation, "standardize" - perform both centering and scaling, "normalize" - normalize values, "pca" - transform data to the principal components, "ica" - transform data to the independent components.
`featuresToPreProcess`	Vector of the numbers of the features to apply feature preprocessing to. An empty vector means that all features in the dataset file are included (default = c()). This vector should be a subset of `selectedFeats`.
`nComp`	Integer number of components to keep when either the "pca" or "ica" feature preprocessor is used (default = NA).
`nModels`	Integer number of classifier algorithms to select based on meta-learning and then tune using Bayesian optimization (default = 5).
`option`	Integer selecting the mode: 1 = classifier algorithm selection only; 2 = algorithm selection with hyper-parameter tuning (default).
`featureTypes`	Vector of either 'numerical' or 'categorical' values representing the types of the features in the dataset (default = c(), in which case any factor or character features are treated as categorical and all others as numerical).
`interp`	Boolean indicating whether model interpretability (feature importance and interaction) is needed (default = FALSE). Setting this to TRUE takes additional time.
`missingOpr`	Boolean selecting how instances with missing values are imputed (default = FALSE): FALSE uses median/mode imputation; TRUE uses the "MICE" library, which imputes missing values with plausible data values drawn from a distribution specifically designed for each missing data point.
`balance`	Boolean indicating whether SMOTE class balancing is applied (default = FALSE).
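
The `metric` options distinguish a per-label average ("avg-fscore", "avg-recall", "avg-precision") from a micro-average ("fscore", "recall", "precision"). The base-R sketch below illustrates that distinction on a toy confusion matrix. It is not SmartML's own implementation: the labels are made up, and the textbook macro/micro definitions used here are an assumption about what the package computes.

```r
# Toy labels (made up for illustration): 3 classes, 10 instances.
lvls      <- c("a", "b", "c")
actual    <- factor(c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"), levels = lvls)
predicted <- factor(c("a", "a", "a", "b", "b", "b", "c", "c", "c", "a"), levels = lvls)

cm <- table(predicted, actual)   # confusion matrix: rows = predicted, columns = actual
tp <- diag(cm)                   # true positives per class
fp <- rowSums(cm) - tp           # predicted as the class, but actually another class
fn <- colSums(cm) - tp           # actually the class, but predicted as another class

# Per-label ("avg-") variant: compute the metric for each class, then take the plain mean.
# Note: a class that is never predicted would give 0/0 here; this sketch ignores that case.
precisionPerClass <- tp / (tp + fp)
recallPerClass    <- tp / (tp + fn)
f1PerClass        <- 2 * precisionPerClass * recallPerClass /
                     (precisionPerClass + recallPerClass)
macroF1 <- mean(f1PerClass)

# Micro-average variant: pool the counts over all classes, then compute the metric once.
microPrecision <- sum(tp) / sum(tp + fp)
microRecall    <- sum(tp) / sum(tp + fn)
microF1        <- 2 * microPrecision * microRecall / (microPrecision + microRecall)

# Plain accuracy for comparison ("acc").
accuracy <- sum(tp) / sum(cm)

print(c(macroF1 = macroF1, microF1 = microF1, accuracy = accuracy))
```

For single-label multi-class data the micro-averaged scores coincide with accuracy, so the per-label averages are the ones that react to poor performance on small classes.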

List of Results

"option=1" - Chosen classifier algorithm names clfs with their parameter configurations params, training data frame TRData, and test data frame TEData.
"option=2" - Name of the best classifier algorithm found clfs with its parameter configuration params, training data frame TRData, test data frame TEData, fitted model model, predicted values on the test set pred, performance on the test set perf, and, when interp = TRUE and the chosen model is not knn, feature importance interpret$featImp and interaction interpret$Interact plots.

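The returned value is a list, so its elements can be read with `$`. The sketch below shows one way to inspect it, using a mock list in place of a real `autoRLearn()` result so that it runs without SmartML installed. Only the element names (`clfs`, `params`, `pred`, `perf`, `interpret`) come from the description above; the values, the format of `params`, and the helper `summarizeResult` are hypothetical.

```r
# Mock result standing in for the list returned by autoRLearn(..., option = 2).
# All values are made up; only the element names come from the documentation.
result <- list(
  clfs   = "randomForest",
  params = "mtry = 2, ntree = 100",
  pred   = c("unacc", "acc", "unacc"),
  perf   = 0.93
)

# Hypothetical helper (not part of SmartML): one-line summary of a result list.
summarizeResult <- function(res) {
  # perf is only documented for option = 2, so fall back to "n/a" when it is absent.
  perfText <- if (is.null(res$perf)) "n/a" else format(res$perf)
  paste0("classifier: ", paste(res$clfs, collapse = ", "),
         "; performance: ", perfText)
}

summaryLine <- summarizeResult(result)
print(summaryLine)

# interpret is only documented for interp = TRUE with a non-knn model,
# so check that it exists before trying to use the plots.
hasInterpretation <- !is.null(result$interpret)
```
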
## Not run: 
autoRLearn(1, 'sampleDatasets/car/train.arff',
'sampleDatasets/car/test.arff', option = 2, preProcessF = 'normalize')

result <- autoRLearn(10, 'sampleDatasets/shuttle/train.arff', 'sampleDatasets/shuttle/test.arff')

## End(Not run)