Automation of preprocessing often requires computationally costly preprocessing combinations. This package helps to find near-best combinations faster. Metaheuristics supported are taboo search, simulated annealing, reheating and late acceptance. Start conditions include randon and grid starts. End conditions include all iteration rounds completed, objective threshold reached and convergence. Metaheuristics, start and end conditions can be hybridized and hyperparameters optimized. Parallel computations are supported. The package is intended to be used with package 'preprocomb' and takes its 'GridClass' object as input.
Let's start by adding contaminations to Iris-data to simulate the need for preprocessing:
set.seed(1) testdata <- iris testdata[sample(1:150,40),3] <- NA # add missing values to the third variable testdata[,4] <- rnorm(150, testdata[,4], 2) # add noise to the fourth variable testdata$Irrelevant <- runif(150, 0, 1) # add an irrelevant feature
A grid of 540 preprocessing combinations and corresponding preprocessed data sets to be searched from is created with package preprocomb and its setgrid() function.
library(preprocomb) examplegrid <- preprocomb::setgrid(phases=c("imputation", "smoothing", "scaling", "outliers", "selection"), data=testdata)
The metaheuristic search is controlled by the metaheur() function. An example below does 54 iterations (10% of the search space) with 20 times validated holdout classification accuracy. Classifier is support vector machine svmRadial from kernlab package (assumed to be loaded.)
examplesearch <- metaheur(examplegrid, model="svmRadial", iterations = 54, nholdout = 20)
Execution wall-clock time in minutes can be extracted with:
The resulting near-optimal solution can be extracted by getbestheur() function.
The logs can be extracted:
Metaheuristic search hyperparameters can be optimized with either grid or random search.
examplehyperparam <- metaheurhyper(examplegrid, cores=2, trials=10, iterations=54, nholdout=10)
The resulting hyperparameter search result can be extracted by plotting the results.
The plot title shows the combination of hyperparameters that has the highest mean of the best of runs in trials. The panels show best, worst and median scenarios for the combination. In the example best hyperparameters were surprisingly explorative.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.