knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
The idea of the analysis of the smashy_lcbench and smashy_super datasets is to understand the dependencies between the hyperparameters and the target variable yval, using the plots implemented in the VisHyp package and, most importantly, without the help of any automatic optimization methods. We want to understand which parameters are important, i.e., whether they have a large impact on the results or whether they can be neglected. Furthermore, we want to find configuration spaces that produce good performances and evaluate them with quality criteria. Finally, we want to compare the results of the two datasets.
For each dataset, we examine the entire dataset as well as the 20% of configurations with the best yval values to get a more detailed insight into the configurations of the best performances. To obtain a subset of configurations with good yval values, we partition the data by bounding the range of each parameter. We also look at the constrained parameter ranges using PCPs.
We use importance plots, partial dependence plots (PDP), heatmaps, and parallel coordinate plots (PCP) to analyze the data. Importance plots identify the most important parameters. For a quick overview, we use heatmaps. For a deeper insight into the structure and the dependencies between two parameters and the performance measure, we use the PDP. Only once the dataset has been reduced in size can we also use PCPs to get an impression of good parameter configurations.
The analysis is structured as follows. First, the considered dataset is prepared. Then, the analyses are performed and the results are used to suggest good configuration ranges for each parameter. The analysis of each parameter can be selected in the table of contents (TOC) on the left. Prior to this chapter, an overview of the dataset is provided. Finally, the results of the two datasets are compared.
We need to load the packages and subdivide the data to compare the whole dataset with the dataset containing the best 20% of the configurations.
library(VisHyp)
library(mlr3)
library(plotly)

lcbenchSmashy <- readRDS("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Data/smashy_lcbench.rds")
lcbenchSmashy <- as.data.frame(lcbenchSmashy)

# Convert logical and character columns to factors so that the plots
# treat them as categorical parameters
for (i in seq_along(lcbenchSmashy)) {
  if (is.logical(lcbenchSmashy[, i]) || is.character(lcbenchSmashy[, i])) {
    lcbenchSmashy[, i] <- as.factor(lcbenchSmashy[, i])
  }
}
All VisHyp plots require an mlr3 task object as input. For the smashy_lcbench dataset, the target is yval. Values close to 0 indicate good performances.
lcbenchTask <- TaskRegr$new(id = "task_lcbench", backend = lcbenchSmashy, target = "yval")

# Top 20% of the configurations with respect to yval
lcbenchBest <- lcbenchSmashy[lcbenchSmashy$yval >= quantile(lcbenchSmashy$yval, 0.8), ]
bestTask <- TaskRegr$new(id = "bestTask", backend = lcbenchBest, target = "yval")
The target parameter yval can reach values between -0.9647 and -0.4690. Our goal was to obtain good results, i.e., to find configurations that produce values close to -0.4690.
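As a quick sanity check, these bounds can be read directly off the data frame loaded above:

range(lcbenchSmashy$yval)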
The most important parameter is sample. Its factor "bohb" should always be chosen over "random": 2130 of the best 2143 configurations were created with "bohb", and its average effect on yval is also much larger.
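This count can be verified directly on the subset of the best configurations (the exact numbers may shift slightly depending on how ties at the 80% quantile are handled):

table(lcbenchBest$sample)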
The next very important parameter is survival_fraction. The analysis showed that a low value is better on average, but high values can also lead to the best performances. Without any further restriction, a value between 0.15 and 0.5 should be chosen for a high average performance. If a surrogate_learner is selected, the constraints on survival_fraction should be chosen according to that parameter.
Even though the surrogate_learner parameter is not that important itself, it influences most of the other parameters. This means that the values of other parameters should be set depending on the selected surrogate_learner whenever they have different effects on the performance measure. An indication that surrogate_learner has a large impact on the other parameters is given by the importance plots for the partial datasets split by surrogate_learner: depending on the selected subset, different importances are assigned to the individual parameters. This is especially noticeable for the factor "bohb" of the sample parameter. Parameters that should be selected according to the chosen factor of surrogate_learner are listed below. However, there are also findings on which surrogate_learner gives the best results: in the full dataset, the factors "knn1" and "knn7" show the best performances, while "ranger" shows the worst. For the top cases, we see that a disproportionate number of "bohblrn" and "ranger" configurations are filtered out. Surprisingly, "bohblrn" turns out to be the factor with the greatest importance.
knn1 (a filter encoding these suggested ranges is sketched after the lists below)

- survival_fraction: The value should be set above 0.5 if we are interested in the best cases. For the whole dataset, the best cases were below 0.5 on average.
- random_interleave_fraction: The value should be low and lie between 0.05 and 0.5 according to the complete dataset.
- budget_log_step: The value should be chosen between -0.5 and 0.5.
- filter_factor_first: The value should be below 4.
- filter_select_per_tournament: The value should be over 0.9.
knn7

- filter_factor_first: The value should be below 4.
- survival_fraction: The value should lie between 0.1 and 1 according to both the entire dataset and the subset.
- budget_log_step: The parameter produces good performances for values between -0.5 and 1 but does not have a big impact in general.
- random_interleave_fraction: The value should lie between 0.25 and 0.75 according to the entire dataset. It has no importance in the subset.
- random_interleave_random: The parameter should be set to "FALSE".
- filter_select_per_tournament: The value should be over 0.5.
bohblrn

- random_interleave_fraction: Lower values are preferred. The value should lie between 0.05 and 0.65.
- survival_fraction: A lower value is better according to the entire dataset. It has no importance in the subset.
- budget_log_step: The good range is unclear due to the fluctuation, but the value should be at least above -1.5.
- filter_algorithm: The factor should be "progressive".
- filter_factor_last: The value should be over 5.
- filter_factor_first: The parameter should not be restricted.
ranger

- random_interleave_fraction: The value should be over 0.25.
- survival_fraction: The value should be below 0.75.
- budget_log_step: The value should be over -1.5.
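To make these recommendations concrete, the following sketch encodes the suggested "knn1" ranges from the list above as a filter on the "bohb" subset. The bounds are values read off the plots, not verified optima, and knn1Suggested is a name introduced here only for illustration.

# Sketch: keep only configurations inside the suggested "knn1" ranges
knn1Suggested <- lcbenchSmashy[lcbenchSmashy$sample == "bohb" &
  lcbenchSmashy$surrogate_learner == "knn1" &
  lcbenchSmashy$survival_fraction > 0.5 &             # best-case recommendation
  lcbenchSmashy$random_interleave_fraction > 0.05 &
  lcbenchSmashy$random_interleave_fraction < 0.5 &
  lcbenchSmashy$budget_log_step > -0.5 &
  lcbenchSmashy$budget_log_step < 0.5 &
  lcbenchSmashy$filter_factor_first < 4 &
  lcbenchSmashy$filter_select_per_tournament > 0.9, ]
summary(knn1Suggested$yval)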
Another important parameter for the general case is random_interleave_fraction. In general, low values below 0.3 should be preferred in combination with the factor "random" of the sample parameter, and values between 0.1 and 0.75 should be preferred for the factor "bohb". However, this is caused by the dependency on surrogate_learner: the dataset contains many observations for the levels "knn1" and "knn7", for which a low value must be chosen to get a good result. While values in the middle achieve the best yval values for the factor "bohb" of the parameter sample, the surrogate_learner factor "ranger" needs high values. For the top cases, the parameter loses importance. A possible reason is that the counter case with the "random" factor is almost completely filtered out. The behavior per level did not change for the top cases, but the middle range is no longer that important for "bohblrn".
The second most important parameter for the filtered dataset containing only "bohb" factors is budget_log_step. For the entire dataset, this parameter should lie between -0.5 and 1, but when a surrogate_learner is chosen, budget_log_step should be set accordingly.
In general, the parameter filter_with_max_budget is not important. Nevertheless, it should always be set to "TRUE" and is most important for "bohb" factors. In any case, the effect is important for the surrogate_learner factor "bohblrn" regarding the top cases.
The parameter filter_factor_first is the most important one for the top 20% of the best configurations. It also has a higher importance in the filtered dataset if the "random" factor is chosen instead of "bohb". In general, filter_factor_first should be low (below 4) for the factor "bohb" of the parameter sample and high (near 6) for the factor "random". The hyperparameter filter_factor_first should not be restricted if the surrogate_learner is "bohblrn".
The hyperparameter filter_factor_last has a low effect on the performance and should not be used to subdivide the dataset in general.
The parameter filter_select_per_tournament should not be too high in general but does not really matter for good results.
The parameters filter_algorithm and random_interleave_random have hardly any effect and can be left out of deeper investigations. They only need to be considered for the surrogate_learner factor "bohblrn".
To verify the proposed parameter configurations, the dataset is constrained according to the results and the obtained performances are compared with the ranks of the performances of the entire dataset.
# Constrain the dataset according to the suggested configuration ranges
lcbenchEvaluation <- lcbenchSmashy[lcbenchSmashy$sample == "bohb", ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$surrogate_learner == "bohblrn", ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$random_interleave_fraction > 0.05 &
                                         lcbenchEvaluation$random_interleave_fraction < 0.65, ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$budget_log_step > -1.5, ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_with_max_budget == "TRUE", ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_algorithm == "progressive", ]
lcbenchEvaluation <- lcbenchEvaluation[lcbenchEvaluation$filter_factor_last > 5, ]

# Compare the ranks of the constrained performances with the entire dataset
lcbenchYval <- sort(lcbenchEvaluation$yval, decreasing = TRUE)
lcbenchYvalOriginal <- sort(lcbenchSmashy$yval, decreasing = TRUE)
sort(match(lcbenchYval, lcbenchYvalOriginal), decreasing = FALSE)
We can see that many good results were obtained, but not nearly all of the best configurations were found. This can be explained by the fact that constraints are often imposed to reduce the size of the dataset. For example, for some categorical parameters, a single factor with many good performances is always chosen even though we know that other categories could also achieve good values.
Finally, quality criteria are used to verify the results. The explanation as well as the interpretation can be found in the bachelor's thesis.
summary(lcbenchEvaluation$yval)
# proportion of configurations kept
length(lcbenchEvaluation$yval) / length(lcbenchSmashy$yval)
# top configuration: share above the 95% quantile of the full dataset
sum(lcbenchYval >= quantile(lcbenchSmashy$yval, 0.95)) / length(lcbenchYval)
# quantile of the best remaining configuration
sum(lcbenchSmashy$yval <= max(lcbenchYval)) / length(lcbenchSmashy$yval)
Our results can be visually checked with the implemented PCP.
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Best_PCP.png")
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/lcbench_Bad_PCP.png")
For visual analyses it is important to know the configuration spaces and the parameters' classes.
head(lcbenchSmashy)
str(lcbenchSmashy)
We want to look at the importance for the entire dataset (general case) and for the best configurations (top 20%).
plotImportance(lcbenchTask)
plotImportance(bestTask)
For the general case, sample is the most important hyperparameter, while random_interleave_random is of little importance. For the best configurations, filter_factor_first and filter_factor_last are the most important parameters and the sample parameter is no longer of importance. The ranking of the parameters has changed a lot, but the value of the importance measure has hardly changed for any parameter except sample.
plotParallelCoordinate(lcbenchTask)
Looking at the PCP, there are too many observations to identify any structures; the plot should therefore only be used with fewer observations.
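One way to still get a readable PCP is to plot only a random subsample of the configurations; a minimal sketch (the sample size of 500 and the seed are arbitrary choices):

set.seed(1)  # arbitrary seed, only for a reproducible subsample
lcbenchSample <- lcbenchSmashy[sample(nrow(lcbenchSmashy), 500), ]
sampleTask <- TaskRegr$new(id = "task_lcbench_sample", backend = lcbenchSample, target = "yval")
plotParallelCoordinate(sampleTask)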
summary(lcbenchSmashy)
summary(lcbenchBest)
surrogate_learner: Disproportionately many "bohblrn" and "ranger" configurations were kicked out. This could indicate that these learners perform worse on average.

filter_with_max_budget: Proportionally more "FALSE" configurations were filtered out. This could mean that "TRUE" values perform better on average.

sample: It can be seen that only 13 rows of the best 20% of configurations have the factor "random"; the other (over 2100) instances used the factor "bohb". This is also the reason for the sample parameter's lack of importance in the subdivided data frame, since barely any configurations with the factor "random" are left.
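These proportion arguments can be checked by comparing relative frequencies before and after filtering; a minimal sketch:

# Relative factor frequencies in the full data vs. the top 20%
prop.table(table(lcbenchSmashy$surrogate_learner))
prop.table(table(lcbenchBest$surrogate_learner))
prop.table(table(lcbenchSmashy$filter_with_max_budget))
prop.table(table(lcbenchBest$filter_with_max_budget))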
The hyperparameters will be examined more precisely in the following sections.
As noticed, sample is the most important parameter in the entire dataset. The parameter should have the right factor to perform well, so let's look at its effect using a PDP. It can also be checked whether the effect applies to all parameters. A heatmap can be used to get a quick overview of the interactions. Values close to 1 have hardly any effect on the results.
plotPartialDependence(lcbenchTask, features = c("sample"), rug = FALSE, plotICE = FALSE)
subplot(
  plotHeatmap(lcbenchTask, features = c("sample", "budget_log_step"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "survival_fraction"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "surrogate_learner"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_first"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "random_interleave_random"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "filter_factor_last"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "filter_algorithm"), rug = FALSE),
  plotHeatmap(lcbenchTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
  nrows = 5, shareX = TRUE
)
PDP: It can be seen that the factor "bohb" always leads to better results on average than the factor "random".
Heatmaps: It should be noted that the parameters survival_fraction and random_interleave_fraction may give better results if a lower value is chosen. Also, the surrogate_learner factors "knn1" and "knn7" seem to give better results than "bohblrn" or "ranger". On average, the factor "bohb" of the parameter sample is better.
When looking at the best results and combinations, only the best configurations are of interest, and it can be seen that mostly "bohb" occurs. Therefore, the dataset is split into "bohb" and "random".
random <- lcbenchSmashy[lcbenchSmashy$sample == "random", ]
bohb <- lcbenchSmashy[lcbenchSmashy$sample == "bohb", ]
randomTask <- TaskRegr$new(id = "task_random", backend = random, target = "yval")
bohbTask <- TaskRegr$new(id = "task_bohb", backend = bohb, target = "yval")
The entire dataset is split because differences are assumed between "random" and "bohb": many "random" configurations were filtered out and the parameter sample lost a lot of importance. For these reasons, the analysis primarily focuses on the dataset containing only "bohb" factors. For the best 20% of configurations we focus on "bohb" only.
It should be checked if there are importance differences for the parameters in the "random" subset and the "bohb" subset.
plotImportance(bohbTask)
plotImportance(randomTask)
The hyperparameter survival_fraction is the most important parameter. Also random_interleave_fraction has high importance for both subsets. The parameters filter_algorithm and random_interleave_random do not seem to be important at all.
"Bohb": The parameter budget_log_step is more important now. Since the parameter was not ranked that high in the first plot, it can be assumed that it is very important for this subset. The importance of the other parameters has not changed that much compared to the full data, but the hyperparameter surrogate_learner and filter_with_max_budget are more important than the factor "random" of the parameter sample.
"Random": It looks like the right parameter configuration is more important for "bohb", because The parameter importance values are higher in general than for "bohb". The parameters filter_factor_last and filter_factor_first have a higher importance for the factor "random" than for "bohb".
In the beginning it could be seen that most of the good results were gained for "bohb". Therefore, the focus lays on this factor.
bohbBest <- bohb[bohb$yval >= quantile(bohb$yval, 0.8), ]
bohbBestTask <- TaskRegr$new(id = "bohbBestTask", backend = bohbBest, target = "yval")
The survival_fraction parameter is the most important parameter for "bohb" as well as for "random" in the entire dataset. With a PDP, a better insight can be gained into the configuration structures of the parameters.
plotPartialDependence(bohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(randomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In general, lower values achieve better performances than higher values. For the "bohb" subset, the best range seems to be between 0.15 and 0.6. This means that too low values are not very good in this case. For the "random" subset it is almost monotonically decreasing, which means that lower values are always better.
One possibility to explain the structure is to filter the dataset again. For this, the data can be split according to the best 20% of yval values of the factor "bohb".
plotPartialDependence(bohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE, gridsize = 20)
In this case, higher values seem to be somewhat better. This is surprising, since lower values were more important in the entire dataset. It means that with good configurations of the other parameters, the survival_fraction parameter achieves even better results with high values. This could also explain the increase in the range between 0.5 and 0.75. Looking at the rug, it can be seen that most configurations lie below 0.5 and the fewest lie above 0.75. Because of the few configurations with high values, the effect of good performances in this range is less strong. In the range between 0.5 and 0.75 there are more configurations, which therefore have a greater impact on the average curve. However, the difference on the y-axis is only small, and therefore it cannot be said that high values are better.
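The uneven coverage visible in the rug can be quantified by binning survival_fraction; a small sketch (the bin boundaries are an arbitrary choice):

# Number of top-20% "bohb" configurations per survival_fraction bin
table(cut(bohbBest$survival_fraction, breaks = c(0, 0.25, 0.5, 0.75, 1)))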
Another important parameter for the "bohb" subset is the surrogate_learner.
plotPartialDependence(bohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
In this graphic, "knn1" and "knn7" seem to be the best choices based on the results so far. For a more detailed analysis, the data should be divided into the factors of the surrogate_learner and checked if there are differences in the importance of the remaining parameters.
knn1 <- bohb[bohb$surrogate_learner == "knn1", ]
knn7 <- bohb[bohb$surrogate_learner == "knn7", ]
bohblrn <- bohb[bohb$surrogate_learner == "bohblrn", ]
ranger <- bohb[bohb$surrogate_learner == "ranger", ]
knn1Task <- TaskRegr$new(id = "knn1Task", backend = knn1, target = "yval")
knn7Task <- TaskRegr$new(id = "knn7Task", backend = knn7, target = "yval")
bohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = bohblrn, target = "yval")
rangerTask <- TaskRegr$new(id = "rangerTask", backend = ranger, target = "yval")
plotImportance(knn1Task)
plotImportance(knn7Task)
plotImportance(bohblrnTask)
plotImportance(rangerTask)
The parameter survival_fraction is very important for the "bohblrn" and the "knn1" subset, which could already be seen in the PDP for survival_fraction. The hyperparameter random_interleave_fraction has a high importance for all surrogate_learners. For the factor "knn7" the parameter budget_log_step seems to be more important than for the other factors of the surrogate_learner. To check why the importance differs and whether the parameters have different good ranges, a closer look should be taken at three very important parameters according to the importance plot. Later each factor will be checked separately for the top 20% of the configurations to find differences.
plotPartialDependence(knn1Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(knn7Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(bohblrnTask, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(rangerTask, "random_interleave_fraction", plotICE = FALSE)
For "knn1", lower values for the parameter random_interleave_fraction seem to be better. For "knn7" and "bohblrn", the values should be neither too high nor too low, and for "ranger", higher values lead to better yval results. A good range for "bohblrn" seems to be between 0.05 and 0.65. For knn1 a value between 0.05 and 0.5 seems to be good. A good range for "knn7" seems to be between 0.25 and 0.75.
plotPartialDependence(knn1Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn7Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(bohblrnTask, "survival_fraction", plotICE = FALSE)
plotPartialDependence(rangerTask, "survival_fraction", plotICE = FALSE)
In general, values below 0.5 are preferred for the parameter survival_fraction. For the factor "knn7" of the surrogate_learner, values around 0.5 seem to produce the best performances; for the factor "knn1" a good choice is between 0.1 and 0.6. For all other factors, values below 0.5 are better.
plotPartialDependence(knn1Task, "budget_log_step", gridsize = 40, plotICE = FALSE)
plotPartialDependence(knn7Task, "budget_log_step", gridsize = 40, plotICE = FALSE)
plotPartialDependence(bohblrnTask, "budget_log_step", plotICE = FALSE)
plotPartialDependence(rangerTask, "budget_log_step", plotICE = FALSE)
It is very interesting that the line for the parameter budget_log_step shows repeated dips for the factors "knn7" and "knn1". The exact range is hard to identify, since it also depends on the gridsize of the plot. It can be said that a value over -0.5 is a good choice for "knn7" and "ranger". For "bohblrn", values should also be over -0.5. For "knn1" and "knn7", values between -0.5 and 1 seem to achieve good results.
It should also be investigated which cases are the best. For this purpose, the subdivided dataset is checked by searching for and analyzing the most important parameters with the help of the importance plot. In addition, abnormalities are examined in more detail in the PCP.
plotPartialDependence(bestTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
The factor "bohblrn" of the surrogate_learner parameter is now most important, and the factor "ranger" is cleary more important than before.
In the following, the surprising outcome of the surrogate_learner factor "bohblrn" is investigated.
bohblrnBest <- bohbBest[bohbBest$surrogate_learner == "bohblrn", ]
bohblrnTaskBest <- TaskRegr$new(id = "bohblrnBestTask", backend = bohblrnBest, target = "yval")
plotParallelCoordinate(bohblrnTaskBest, labelangle = 10)
plotImportance(bohblrnTaskBest)
PCP: A high value is probably better for the filter_factor_last parameter. The filter_with_max_budget parameter should be set to "TRUE" and the parameter filter_algorithm should be set to "progressive". It looks like high values for budget_log_step achieve the best results. The parameter filter_factor_first should not be restricted.
Importance plot: In the general case, the survival_fraction parameter was by far the most important one for the factor "bohblrn"; now the parameters budget_log_step and filter_with_max_budget are the most important.
Further, it should be investigated why the survival_fraction parameter lost importance.
plotPartialDependence(bohblrnTask, "survival_fraction")
plotPartialDependence(bohblrnTaskBest, "survival_fraction")
In the entire dataset a high value of survival_fraction leads to a drop, but it does not affect the very good results! This case shows that ICE curves can be a useful addition to the PDP.
In the following, the other important parameters are examined with the PCP and the importance plot for the factor "bohblrn" of the surrogate_learner parameter.
plotPartialDependence(bohblrnTaskBest, "budget_log_step", gridsize = 30, plotICE = FALSE)
plotPartialDependence(bohblrnTaskBest, "filter_with_max_budget")
plotPartialDependence(bohblrnTaskBest, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(bohblrnTaskBest, "filter_algorithm") summary(bohblrnBest$filter_algorithm) summary(bohblrn$filter_algorithm)
In general, the parameter budget_log_step performs better with higher values. There are also small dips between -0.3 and 0.5.
The parameter filter_with_max_budget should be set to "TRUE". There are more observations for "TRUE" than in the subset with the factor "FALSE". Proportionally, more "FALSE" configurations have already been thrown out, which is another indication that "TRUE" is the choice for better yval values.
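A quick check of this proportion argument, comparing the full "bohblrn" subset with its top cases:

prop.table(table(bohblrn$filter_with_max_budget))
prop.table(table(bohblrnBest$filter_with_max_budget))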
For the parameter filter_factor_last, high values could perform best: even though the differences are small, there are more observations than in other ranges. A good choice for a configuration is a value over 5.
It can be confirmed that the parameter filter_algorithm should be "progressive". Even though it cannot be seen in the partial dependence, a lot of "tournament" configurations got filtered out.
Next, the factor "knn1" of the surrogate_learner parameter is investigated.
knn1Best <- bohbBest[bohbBest$surrogate_learner == "knn1", ]
knn1BestTask <- TaskRegr$new(id = "knn1BestTask", backend = knn1Best, target = "yval")
plotParallelCoordinate(knn1BestTask, labelangle = 10)
plotImportance(knn1BestTask)
PCP: The parameter filter_with_max_budget should be set to "TRUE", and it looks like there are specific ranges of budget_log_step that achieve better results. Further, the hyperparameter survival_fraction should be set high, while the parameter random_interleave_fraction should be set low for good results. For filter_factor_last, high values might be better, as they seem to lead to high yval values. Lastly, the parameter filter_select_per_tournament should be set to 1.
Importance plot: The parameters filter_factor_first, survival_fraction and filter_factor_last are the most important.
Therefore, the most interesting parameters, according to the PCP and the importance plots, are examined.
plotPartialDependence(knn1BestTask, "filter_factor_first", plotICE = FALSE )
plotPartialDependence(knn1BestTask, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_with_max_budget")
plotPartialDependence(knn1BestTask, "budget_log_step", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "filter_select_per_tournament", plotICE = FALSE)
plotPartialDependence(knn1BestTask, "random_interleave_fraction", plotICE = FALSE)
In general, the parameter filter_factor_first seems to produce better results for low values, preferably below 4. The value of the parameter survival_fraction should be over 0.5 (this is interesting because in the general case lower values are better!). The hyperparameters filter_factor_last and random_interleave_fraction do not really tell us where the best configurations are.
knn7Best <- bohbBest[bohbBest$surrogate_learner == "knn7", ]
knn7BestTaskBest <- TaskRegr$new(id = "knn7BestTask", backend = knn7Best, target = "yval")
plotParallelCoordinate(knn7BestTaskBest, labelangle = 10)
plotImportance(knn7BestTaskBest)
PCP: The factor of the parameter filter_algorithm should be "tournament", the value of filter_factor_first should be around 4, the factor of random_interleave_random should be "FALSE", survival_fraction seems to need a low value, filter_with_max_budget should be set to "TRUE", random_interleave_fraction should have a low value, and filter_select_per_tournament should get a value around 1.
Importance plot: The most important parameters are filter_factor_first, filter_factor_last and budget_log_step.
plotPartialDependence(knn7BestTaskBest, "filter_factor_first", plotICE = FALSE )
plotPartialDependence(knn7BestTaskBest, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "budget_log_step", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_algorithm", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "random_interleave_random")
plotPartialDependence(knn7BestTaskBest, "survival_fraction", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_select_per_tournament", plotICE = FALSE)
plotPartialDependence(knn7BestTaskBest, "filter_with_max_budget")
The parameter filter_factor_first should be below 4, while the parameter budget_log_step produces the best values above 0.5 but does not have a big impact in general. Again, the perfect ranges for filter_factor_last and random_interleave_fraction cannot be seen. Also, it cannot be confirmed with certainty that "tournament" is always the better choice. Further, the parameter random_interleave_random should be "FALSE", the parameter filter_select_per_tournament should be over 0.5, and the parameter filter_with_max_budget should be "TRUE".
Finally, the "ranger" should be investigated since the average performance for good configurations increased a lot.
rangerBest <- bohbBest[bohbBest$surrogate_learner == "ranger", ]
rangerBestTaskBest <- TaskRegr$new(id = "rangerBestTask", backend = rangerBest, target = "yval")
plotParallelCoordinate(rangerBestTaskBest, labelangle = 10)
plotImportance(rangerBestTaskBest)
PCP: It can be seen that the value of the parameter budget_log_step should be set high and the parameter filter_with_max_budget should be set to "TRUE".
Importance plot: The most important parameters are filter_factor_first, filter_with_max_budget and budget_log_step.
plotPartialDependence(rangerBestTaskBest, "filter_factor_first", plotICE = FALSE)
plotPartialDependence(rangerBestTaskBest, "budget_log_step", plotICE = FALSE)
plotPartialDependence(rangerBestTaskBest, "filter_with_max_budget", plotICE = FALSE)
While a high value seems to produce the best performance for budget_log_step, a low value seems to produce the best performance for the parameter filter_factor_first. For budget_log_step a value over -0.5 seems to be good, and for filter_factor_first a value below 2.5 performs best. It should be noted that only around 45 observations are left, so the interpretation is not that clear. The parameter filter_with_max_budget should be set to "TRUE".
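The small sample size can be confirmed directly:

nrow(rangerBest)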
Another important parameter for the factor "bohb" of the parameter sample is the budget_log_step parameter.
plotPartialDependence(bohbTask,"budget_log_step", plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("budget_log_step"), plotICE = FALSE)
In general, the value for the parameter budget_log_step should be over -0.5. A high value seems to be a good choice for the subdivided dataset. However, it can also be seen that the parameter varies greatly for the factors "knn1" and "knn7" of the parameter surrogate_learner.
The parameter random_interleave_fraction can vary between 0 and 1. This parameter had a high importance for the factors "bohb" and "random", even though it was slightly more important for "random".
plotPartialDependence(bohbTask, features = c("random_interleave_fraction"), plotICE = FALSE)
plotPartialDependence(randomTask, features = c("random_interleave_fraction"), plotICE = FALSE)
For the parameter random_interleave_fraction, a good choice is a value in the middle, since the marginal values achieve the worst performances; more precisely, good values are between 0.1 and 0.7. For the "random" factor, low values achieve better performances.
plotPartialDependence(bohbBestTask, features = c("random_interleave_fraction"), plotICE = FALSE)
In the filtered dataset, there is no bad range at the edges.
The parameter filter_factor_last was less important.
plotPartialDependence(bohbTask, "filter_factor_last", plotICE = FALSE, rug = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_factor_last"), plotICE = FALSE)
The effect is small, and the value should only be chosen according to the surrogate_learner.
plotPartialDependence(bohbTask, features = c("filter_with_max_budget"), rug = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_with_max_budget"), rug = FALSE)
The parameter filter_with_max_budget has a weak effect but should be set to "TRUE".
The parameter filter_select_per_tournament barely had an effect in the general case but became a little more important in the top case. The partial dependence and the dependencies with the most important parameters are checked to get more insight.
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament"), plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_first"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_select_per_tournament", "filter_factor_last"), rug = FALSE, gridsize = 10)
The effect is weak and may be caused by the peaks around the value 1. The parameter value should probably be 1 or slightly higher, but the effect is hardly worth mentioning.
The parameter filter_factor_first is ranked quite high in the parameter importance plot for the top configurations.
plotPartialDependence(bohbBestTask, features = c("filter_factor_first"), gridsize = 20, plotICE = FALSE)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "filter_factor_last"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(bohbBestTask, features = c("filter_factor_first", "budget_log_step"), rug = FALSE, gridsize = 10)
In general, lower values for filter_factor_first achieve slightly better performances, but the differences are small and should not lead to a change in the considerations made.
Next, the smashy_super dataset is examined.
We need to subset the data to compare the entire dataset and the dataset containing only the best 20% of the configurations. In addition, the data must be prepared to facilitate the use of the data for summaries and filters.
superSmashy <- readRDS("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/package_VisHyp/data-raw/smashy_super.rds")
superSmashy <- as.data.frame(superSmashy)

# Convert logical and character columns to factors
for (i in seq_along(superSmashy)) {
  if (is.logical(superSmashy[, i]) || is.character(superSmashy[, i])) {
    superSmashy[, i] <- as.factor(superSmashy[, i])
  }
}
superTask <- TaskRegr$new(id = "superSmashy", backend = superSmashy, target = "yval")

# Top 20% of the configurations with respect to yval
superBest <- superSmashy[superSmashy$yval >= quantile(superSmashy$yval, 0.8), ]
superBestTask <- TaskRegr$new(id = "bestTask", backend = superBest, target = "yval")
The target parameter yval can reach values between -0.3732 and -0.2105. Since the goal is to obtain good results, values close to -0.2105 should be found.
On average, the parameter sample performs better with "random" than with "bohb". For the top 20% of configurations, many "bohb" factors were sorted out, but the remaining ones achieve a better performance than the "random" factors. In the end, both factors can lead to good performance values, but "random" is to be preferred, since a lot of the remaining factors are "random".
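Both observations can be checked with counts and group means on the top 20%; a minimal sketch:

table(superBest$sample)
# Mean yval per sample factor within the top 20%
aggregate(yval ~ sample, data = superBest, FUN = mean)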
In general, lower values of survival_fraction perform better than higher values. Both subsets start with a low value and reach their maximum directly afterwards. For the top configurations, higher values do not seem to be worse, so with matching configurations of the other parameters the value of this parameter can also be high. Although not all high values have poor performances, lower values seem to be the right choice, since most good configurations have lower values. A value between 0.05 and 0.30 seems to be a good choice for the factor "knn1" of the surrogate_learner parameter.
The surrogate_learner parameter is one of the most important parameters for the entire dataset. After reducing the dataset to the best 20% of configurations, it can be seen that the parameter lost importance, since the remaining configurations of surrogate_learner are mainly "knn1". Even though the best configurations of all other surrogate_learner factors achieve better yval values than the "knn1" factor, "knn1" is still chosen, since it performs better on average.
The most important parameter for the best 20% of configurations is random_interleave_fraction. In this case, the results are unambiguous: higher values lead to better results for both the entire dataset and the subset. For this work's purpose, only values above 0.5 are taken.
A similar problem occurs with budget_log_step as earlier with surrogate_learner. For the entire dataset, higher values achieve better yval values, but for the top 20% of configurations, lower values are preferred. Unlike with surrogate_learner, however, there are more configurations with good results in the split dataset. Also, it is a very important parameter for the top 20% of configurations, so it should not be neglected that good performance values can be achieved with lower budget_log_step values. In this case, it is better not to limit the parameter.
In the best parameter configurations in combination with "knn1" of the surrogate_learner parameter, filter_factor_first was the most important parameter. In the entire dataset, this parameter was not important at all. There is also a difference in the range of good configurations: in the entire dataset, values above 6 do not perform well, while in the subdivided dataset, values above 6 produce the best results. Even after subdividing into the best 20% of configurations, the majority of good values are above 4, so values above 4 seem to be a good choice for this parameter.
The interpretation of the parameter filter_factor_last is a little more complicated. This parameter has large fluctuations and different good ranges depending on whether the entire or the split dataset is looked at. Moreover, although the importance is high due to the large fluctuations, the range of predicted performances is not that large (which actually calls the importance into question). In general, however, one can say that the parameter value for filter_factor_last should be between 1.5 and 2.5, above 5.5, or at least not between 4 and 5.
The parameter filter_with_max_budget is easy to interpret. It is not really important for the entire dataset, but for the best configurations in combination with "knn1", "TRUE" should be the choice.
The parameters filter_algorithm, filter_select_per_tournament and random_interleave_random barely have an effect and therefore do not need to be limited.
To verify the proposed parameter configurations, the dataset is constrained according to the results and the obtained performances are compared with the ranks of the performances of the entire dataset.
# Constrain the dataset according to the suggested configuration ranges
superEvaluation <- superSmashy[superSmashy$sample == "random", ]
superEvaluation <- superEvaluation[superEvaluation$survival_fraction > 0.05 &
                                     superEvaluation$survival_fraction < 0.3, ]
superEvaluation <- superEvaluation[superEvaluation$surrogate_learner == "knn1", ]
superEvaluation <- superEvaluation[superEvaluation$random_interleave_fraction > 0.5, ]
superEvaluation <- superEvaluation[superEvaluation$filter_factor_first > 4, ]
superEvaluation <- superEvaluation[superEvaluation$filter_factor_last < 4 |
                                     superEvaluation$filter_factor_last > 5, ]
superEvaluation <- superEvaluation[superEvaluation$filter_with_max_budget == "TRUE", ]

# Compare the ranks of the constrained performances with the entire dataset
superYval <- sort(superEvaluation$yval, decreasing = TRUE)
superYvalOriginal <- sort(superSmashy$yval, decreasing = TRUE)
sort(match(superYval, superYvalOriginal), decreasing = FALSE)
It can be seen that many good results are obtained, but not nearly all of the best configurations are found. This can be explained by the fact that constraints are often imposed to reduce the size of the dataset. For example, for some categorical parameters, a single factor is always chosen even though other categories could also yield good values. Furthermore, numerical parameters were partly restricted, although it is known that very good yval values can also be obtained outside the chosen ranges for some configurations. In the end, however, it can be shown that the restricted ranges lead almost exclusively to above-average or good performance values. Finally, the quality criteria are calculated again. The explanation of the quality criteria can be found in the bachelor's thesis.
# summary
summary(superSmashy$yval)
# proportion of configurations kept
length(superYval) / length(superSmashy$yval)
# top configuration: share above the 95% and 80% quantiles of the full dataset
sum(superYval >= quantile(superSmashy$yval, 0.95)) / length(superYval)
sum(superYval >= quantile(superSmashy$yval, 0.8)) / length(superYval)
# quantile of the best remaining configuration
sum(superSmashy$yval <= max(superYval)) / length(superSmashy$yval)
The results of the quality criteria can be visually checked with the implemented PCP. For a better overview, the color range is restricted, since there are only a few observations below -0.3. For a better comparison, the presumed good ranges and the presumed worse configuration ranges of the parameters are shown.
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/Super_Best_PCP.png")
knitr::include_graphics("D:/Simon/Desktop/Studium/6. Semester/Bachelorarbeit/Latex/Grafiken/Super_Bad_PCP.png")
An overview is obtained again.
head(superSmashy)
str(superSmashy)
We look at the importance for the entire dataset (general case) and for the best configurations (top 20%).
plotImportance(task = superTask)
plotImportance(task = superBestTask)
For the entire dataset, surrogate_learner is the most important and sample the second most important hyperparameter. After filtering the dataset, both parameters are less important, so random_interleave_fraction becomes the most important parameter. Parameters like filter_algorithm, random_interleave_random and filter_with_max_budget have an effect neither on the full dataset nor on the filtered one.
After the data is subdivided, structural changes are also observed in the summary.
summary(superSmashy)
summary(superBest)
This summary already explains why the parameter surrogate_learner lost most of its importance. The factors "bohblrn", "knn7" and "ranger" are kicked out quite often, which means that these learners perform worse than "knn1" on average. For the parameter filter_with_max_budget, a disproportionate number of configurations with "FALSE" are filtered out. This means that configurations with the factor "TRUE" perform better on average. It is also noticeable that the summary values of survival_fraction have decreased and those of budget_log_step, filter_factor_first and random_interleave_fraction have increased. Finally, a disproportionate number of "bohb" configurations are also dropped from the dataset. Perhaps this is an indication that "random" gives better results.
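As before, the proportions behind these statements can be checked directly; a minimal sketch:

prop.table(table(superSmashy$surrogate_learner))
prop.table(table(superBest$surrogate_learner))
prop.table(table(superSmashy$filter_with_max_budget))
prop.table(table(superBest$filter_with_max_budget))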
The hyperparameters will be examined more precisely in the following sections.
As found out above, sample is an important parameter again for the entire dataset and can take the factor "bohb" as well as "random". This parameter should have the right value for a good performance. Therefore, the effects of the parameter are considered in a partial dependence plot and it is checked whether the effect applies to all parameters. A heatmap is used to obtain a quick overview of the interactions. Values close to 1 barely have an effect on the outcome.
plotPartialDependence(superTask, features = c("sample"), rug = FALSE, plotICE = FALSE)
subplot(
  plotHeatmap(superTask, features = c("sample", "budget_log_step"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "survival_fraction"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "surrogate_learner"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "filter_with_max_budget"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "filter_factor_first"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "random_interleave_fraction"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "random_interleave_random"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "filter_factor_last"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "filter_algorithm"), rug = FALSE),
  plotHeatmap(superTask, features = c("sample", "filter_select_per_tournament"), rug = FALSE),
  nrows = 5, shareX = TRUE
)
In the PDP, it can be seen that, on average, the values for "random" lead to better results than for "bohb". In the heatmaps, it can be seen that the predicted performances may be better when filter_with_max_budget is set to "TRUE", random_interleave_fraction is given a high value and survival_fraction is given a low value. As suspected, according to the results of the summary, the factor "knn1" leads to the best results on average.
The data can be split according to the best 20% of yval values of the dataset to check whether the outcome of a PDP is different.
plotPartialDependence(superBestTask, features = c("sample"), rug = TRUE, plotICE = TRUE)
A lot of configurations of "bohb" factor are sorted out, but the remaining ones have a better performance on average than configurations of the "random" factor. Furthermore, differences are assumed between "random" and "bohb", since the parameter samples has lost much of its importance after filtering. Therefore we split the dataset into "bohb" and "random".
superRandom <- superSmashy[superSmashy$sample == "random", ]
superBohb <- superSmashy[superSmashy$sample == "bohb", ]
superRandomTask <- TaskRegr$new(id = "task_superRandom", backend = superRandom, target = "yval")
superBohbTask <- TaskRegr$new(id = "task_superBohb", backend = superBohb, target = "yval")
In the following the differences regarding the importance of the parameters are checked for the "random" and the "bohb" subset.
plotImportance(task = superBohbTask)
plotImportance(task = superRandomTask)
The hyperparameters surrogate_learner and random_interleave_fraction are still the most important parameters for both constrained datasets. In fact, the importance did not change a lot.
There is a little difference between the two factors of the sample parameter for the entire dataset. The majority of the good results are obtained with "random", but for further analysis we will look at both the "random" and the "bohb" subset.
According to the importance plot, the survival_fraction parameter is moderately important for both factors of the parameter sample considering the entire dataset. Based on the summary, it is assumed that low values may lead to better performances. This parameter can take values between 0.00007 and 0.9998.
plotPartialDependence(superBohbTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(superRandomTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In general, lower values perform better than higher values. Both subsets start with a low value and reach their maximum directly afterwards. This means that the value should probably be low, but not minimal. For both subsets, the best range seems to be between 0.05 and 0.25. While the performance values for "random" are almost monotonically decreasing, the "bohb" subset has a peak between 0.5 and 0.75.
One possibility to analyze the structure is to filter the dataset again. For this, the data is split according to the best 20% of yval values for the factors "bohb" and "random".
superBohbBest <- superBohb[superBohb$yval >= quantile(superBohb$yval, 0.8), ]
superBohbBestTask <- TaskRegr$new(id = "superBohbBestTask", backend = superBohbBest, target = "yval")
# The "random" subset is filtered analogously
superRandomBest <- superRandom[superRandom$yval >= quantile(superRandom$yval, 0.8), ]
superRandomBestTask <- TaskRegr$new(id = "superRandomBestTask", backend = superRandomBest, target = "yval")
plotPartialDependence(superBohbBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("survival_fraction"), rug = TRUE, plotICE = FALSE)
In this case, higher values do not seem to be worse. This is surprising, since low values are generally more important. It means that if the other parameters have good configurations, the survival_fraction parameter gives even better results if a high value is chosen. This could also explain the increase in the range between 0.5 and 0.75 for the factor "bohb". By looking at the rug, it can be seen that most configurations are made below 0.5 and the fewest configurations are made above 0.75. Because of the few configurations with quite high values, the effect of good performances is less strong in this range. In the range between 0.5 and 0.75, there are more configurations, which therefore have a greater impact on the curve. Although not all high values have poor performances, lower values should be preferred, since most of the good configurations have lower values.
A very important parameter for the "bohb" subset is the surrogate_learner.
plotPartialDependence(superBohbTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomTask, features = c("surrogate_learner"), rug = FALSE, plotICE = FALSE)
In both subsets, "knn1" is actually the best choice based on the PDP even though the factors do not differ that much. For a more detailed analysis, the data is splitted into the individual factors of the surrogate learners. Although it would be interesting to analyze the learners for both factors of the parameter sample separately, the focus lays on the whole dataset to make it less complicated and because the importance of the subsets does not differ that much.
superKnn1 <- superSmashy[superSmashy$surrogate_learner == "knn1", ]
superKnn7 <- superSmashy[superSmashy$surrogate_learner == "knn7", ]
superBohblrn <- superSmashy[superSmashy$surrogate_learner == "bohblrn", ]
superRanger <- superSmashy[superSmashy$surrogate_learner == "ranger", ]
superKnn1Task <- TaskRegr$new(id = "knn1Task", backend = superKnn1, target = "yval")
superKnn7Task <- TaskRegr$new(id = "knn7Task", backend = superKnn7, target = "yval")
superBohblrnTask <- TaskRegr$new(id = "bohblrnTask", backend = superBohblrn, target = "yval")
superRangerTask <- TaskRegr$new(id = "rangerTask", backend = superRanger, target = "yval")
plotImportance(superKnn1Task)
plotImportance(superKnn7Task)
plotImportance(superBohblrnTask)
plotImportance(superRangerTask)
The parameters sample and random_interleave_fraction are the most important parameters for the factors "knn1", "knn7" and "ranger". For the "bohblrn" factor, the parameter survival_fraction is more important than random_interleave_fraction. The parameter filter_with_max_budget barely has an effect for all factors except "knn1".
The most important parameter for nearly all surrogate_learners is the sample parameter.
plotPartialDependence(superKnn1Task, "sample", rug = FALSE, )
plotPartialDependence(superKnn7Task, "sample", rug = FALSE)
plotPartialDependence(superBohblrnTask, "sample", rug = FALSE)
plotPartialDependence(superRangerTask, "sample", rug = FALSE)
As already known the "random" performs better on average but also for all factors of the surrogate_learner.
plotPartialDependence(superKnn1Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(superKnn7Task, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(superBohblrnTask, "random_interleave_fraction", plotICE = FALSE)
plotPartialDependence(superRangerTask, "random_interleave_fraction", plotICE = FALSE)
For the parameter random_interleave_fraction higher values are always better. For "knn1" and "knn7", low random_interleave_fraction values have a stronger negative impact on the predictions than low values for "ranger" and "bohblrn". For the surrogate_learner's "knn1" and "bohblrn", the maximum results in slightly worse predicted performance, but since there are few instances, this is not certain. Values between 0.75 and 0.95 can be considered as optimal values for the parameter.
Another important parameter for all factors of the surrogate_learner is survival_fraction. For "bohblrn", this parameter was noticeably more important than for the other learners.
plotPartialDependence(superKnn1Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(superKnn7Task, "survival_fraction", plotICE = FALSE)
plotPartialDependence(superBohblrnTask, "survival_fraction", plotICE = FALSE)
plotPartialDependence(superRangerTask, "survival_fraction", plotICE = FALSE)
Low values of the parameter survival_fraction are better in general for the learners "knn1" and "knn7". For "knn1" a value close to 0 and for "knn7" values between 0.05 and 0.15 should be considered. For "bohblrn" values between 0.25 and 0.35 and for "ranger" values between 0.15 and 0.25 produce the best predicted performances.
The last parameter which needs to be checked is filter_with_max_budget. This parameter was only important for "knn1".
plotPartialDependence(superKnn1Task, "filter_with_max_budget", plotICE = FALSE)
plotPartialDependence(superKnn7Task, "filter_with_max_budget", plotICE = FALSE)
plotPartialDependence(superBohblrnTask, "filter_with_max_budget", plotICE = FALSE)
plotPartialDependence(superRangerTask, "filter_with_max_budget", plotICE = FALSE)
By comparing the importance of the factors of the surrogate_learner it can be seen that the filter_with_max_budget parameter is only important for "knn1" and should be set to "TRUE".
The comparison of the summary of the entire dataset with the top 20% of configurations shows that both "random" and "bohb" remain. It can also be seen that mostly "knn1" learners are left.
summary(superBest$surrogate_learner)
# Best yval achieved by each surrogate_learner within the top 20%
aggregate(x = superBest$yval, by = list(superBest$surrogate_learner), FUN = max)
It is interesting to see that the best configurations of the learners that were filtered out in large numbers achieve a better yval than the "knn1" learner. This is important because it shows that it is indeed possible to achieve good results with all learners, not only with "knn1". Nevertheless, "knn1" achieves the best results on average, which means that this learner is more robust: changes in the configurations do not have such a large negative impact on the performances as for the other learners.
In the following "knn1" is investigated. Since there is only a few data left, it is possible to make use of a PCP.
superKnn1Best <- superBohbBest[superBohbBest$surrogate_learner == "knn1", ]
superKnn1BestTask <- TaskRegr$new(id = "superKnn1BestTask", backend = superKnn1Best, target = "yval")
plotParallelCoordinate(superKnn1BestTask, labelangle = 10)
plotImportance(superKnn1BestTask)
In the PCP it can be seen that filter_with_max_budget should be set to "TRUE", random_interleave_random to "FALSE" and random_interleave_fraction should be high for good results.
According to the importance plot, the parameters filter_factor_first and filter_factor_last are very important as well and should be examined further.
plotPartialDependence(superKnn1BestTask, "filter_factor_first", plotICE = FALSE)
plotPartialDependence(superKnn1BestTask, "filter_factor_last", plotICE = FALSE)
In the PDP we can see that the values should be high for the parameter filter_factor_first and lie between 1.5 and 2.5 or above 6 for the parameter filter_factor_last.
Another very important parameter for the "random" subset and for the filtered dataset is the budget_log_step parameter.
plotPartialDependence(superBohbTask, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomTask, features = c("budget_log_step"), rug = FALSE, plotICE = FALSE)
For the "random" subset higher values produce better outcomes. For the superBohbTask two peaks can be seen around -0.5 and 0.5. In order to find reasons for the two peaks, the top 20% are observed again.
plotPartialDependence(superBohbBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("budget_log_step"), rug = TRUE, plotICE = FALSE)
Similar to the survival_fraction parameter, configurations with a low value tend to have a positive rather than a negative effect on the performance if the other parameters are set correctly, which could explain the two peaks of the factor "bohb".
For low values, the predicted performance varies greatly, so other parameter configurations are responsible for the outcome. We choose budget_log_step values below -1.4 to obtain fewer than 150 configurations.
budgetSubset <- superRandom[superRandom$budget_log_step < -1.4,]
budgetSubsetTask <- TaskRegr$new(id = "budgetSubsetTask", backend = budgetSubset, target = "yval")
plotParallelCoordinate(budgetSubsetTask, labelangle = 10)
In the PCP it can be seen that good values are often obtained with a "knn1" learner. A low survival_fraction is also important, whereas the random_interleave_fraction parameter should be high.
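A quick check of the subset created above confirms its size and the dominance of "knn1":
nrow(budgetSubset)                       # should be below 150
summary(budgetSubset$surrogate_learner)  # counts per learner factor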
Another option is to look at a PDP. The parameter budget_log_step is compared with the three parameters identified in the PCP.
plotPartialDependence(superRandomTask, features = c("budget_log_step", "survival_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(superRandomTask, features = c("budget_log_step", "random_interleave_fraction"), rug = FALSE, gridsize = 10)
plotPartialDependence(superRandomTask, features = c("surrogate_learner", "random_interleave_fraction"), rug = FALSE, gridsize = 10)
It can be seen that high budget_log_step values lead to less severe performance drops when the other parameters are poorly configured. Conversely, it is also possible to achieve good values when budget_log_step is low and the other parameters are well configured. It can also be seen that the factor "knn1" of the parameter surrogate_learner achieves the best performances on average.
The parameter random_interleave_fraction can vary between 0 and 1. It was highly important in both subsets and was also the most important parameter for the best 20% of the configurations. Therefore it is well worth checking this parameter.
plotPartialDependence(superBohbTask, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomTask, features = c("random_interleave_fraction"), rug = FALSE, plotICE = FALSE)
A good choice for random_interleave_fraction in combination with the factor "bohb" is a high value; a good range seems to be between 0.75 and 0.95. For the "random" factor, a value between 0.5 and 0.75 seems to produce the best performances.
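The suggested range can also be checked numerically. A minimal sketch using the backing data of superBohbTask, with the 0.75 to 0.95 boundaries taken from the PDP reading above:
bohbData <- as.data.frame(superBohbTask$data())
inRange <- bohbData$random_interleave_fraction >= 0.75 &
  bohbData$random_interleave_fraction <= 0.95
c(inside = mean(bohbData$yval[inRange]), outside = mean(bohbData$yval[!inRange]))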
plotPartialDependence(superBohbBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20, plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("random_interleave_fraction"), rug = FALSE, gridsize = 20, plotICE = FALSE)
The filtered dataset shows that low values do not have such a strong negative impact on the outcome, but high values are still better. A value above 0.5 should be chosen.
The parameter filter_factor_last was only moderately important.
plotPartialDependence(superBohbTask, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(superBohbBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(superRandomTask, "filter_factor_last", plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("filter_factor_last"), rug = TRUE, plotICE = FALSE)
The parameter filter_factor_last fluctuates strongly, and therefore we choose a higher gridsize. The fluctuations raise the importance even though the range of predicted performances is not very large. The value for filter_factor_last should be between 1.5 and 2.5, or above 5.5 for "bohb" and between 5 and 5.5 for "random".
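For reference, a call with an increased gridsize could look as follows; the value 30 is an assumption, and the gridsize argument is used as in the two-parameter PDPs earlier.
# PDP for filter_factor_last with a finer grid (gridsize value is an assumption)
plotPartialDependence(superBohbBestTask, features = c("filter_factor_last"),
                      rug = TRUE, gridsize = 30, plotICE = FALSE)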
plotPartialDependence(superBohbTask, "filter_with_max_budget", rug = FALSE, plotICE = FALSE)
plotPartialDependence(superBohbBestTask, features = c("filter_with_max_budget"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomTask, "filter_with_max_budget", rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("filter_with_max_budget"), rug = FALSE, plotICE = FALSE)
The parameter filter_with_max_budget has a weak effect but should be set to "TRUE".
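This weak effect can also be quantified directly; a small sketch using the task's backing data:
bohbData <- as.data.frame(superBohbTask$data())
aggregate(yval ~ filter_with_max_budget, data = bohbData, FUN = mean)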
The parameter filter_factor_first has barely an effect in the general case but becomes somewhat more important in the best case.
plotPartialDependence(superBohbTask, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superBohbBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)
plotPartialDependence(superRandomTask, features = c("filter_factor_first"), rug = FALSE, plotICE = FALSE)
plotPartialDependence(superRandomBestTask, features = c("filter_factor_first"), rug = TRUE, plotICE = FALSE)
The parameter filter_factor_first shows interesting differences between the general and the best case. While in the general case values above 6 lead to a strong decrease, in the subset these values show the best performances. Since the majority of good configurations in the subset lies in this range, a value above 6 seems to be a good choice.
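The claim that the majority of good configurations lies above 6 can be verified with a one-liner, a sketch on the backing data of the best "bohb" subset:
bohbBestData <- as.data.frame(superBohbBestTask$data())
mean(bohbBestData$filter_factor_first > 6)  # share of best configurations above 6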
Let us now compare the results for the parameters from the two datasets.
sample: The sample parameter was very important for both datasets. For the smashy_lcbench dataset the factor should be "bohb" in any case, while for the smashy_super dataset good performances can be obtained with both "bohb" and "random".
survival_fraction: In the smashy_lcbench dataset the parameter survival_fraction should be chosen according to the selected surrogate_learner. This distinction was made because good values could be achieved with all learners. In particular for the "knn1" learner, which was also the choice for the smashy_super dataset, all values should be considered. For the whole dataset, values below 0.5 achieved better results on average, while for the best configurations it hardly matters. For the smashy_super dataset a low value between 0 and 0.3 is a good choice in general.
surrogate_learner: In the smashy_lcbench dataset, the surrogate_learner parameter was not particularly important, but it influenced other parameters depending on the selected factor. Basically, "knn1" and "knn7" achieved the best performance values on average, but when considering only the best configurations, the surrogate_learner "bohblrn" achieved the best performances. For the smashy_super dataset, the parameter was very important; most of the good results were achieved with "knn1", which should be the choice. However, it should be noted that good values could be achieved with all surrogate_learners.
A very important parameter for both datasets was random_interleave_fraction. For the smashy_lcbench dataset the configuration again depended on the surrogate_learner, while for the smashy_super dataset higher values led to better results.
In the smashy_lcbench dataset, a very important parameter for the factor "bohb" was budget_log_step. This parameter should be set according to the surrogate_learner, but for "knn1" a value between -0.5 and 0.5 should be the right choice. It should be mentioned that this parameter showed repeated dips for "knn1" and "knn7" in the analyses, so it was hard to choose the right range. For the smashy_super dataset higher values were better, but in the top 20% of configurations lower values achieved better yval values. In this case we chose not to limit this parameter.
For the smashy_lcbench dataset, the filter_factor_first parameter was the most important parameter for the best 20% of configurations. In general, values below 4 provide the best performances; an exception is the factor "bohblrn" of the parameter surrogate_learner, for which no restriction should be made. For the smashy_super dataset, this parameter is the most important one for the best configurations in combination with the "knn1" factor of the surrogate_learner parameter. For this dataset, values above 4 seem to be a good choice.
The filter_factor_last parameter is not really important for the smashy_lcbench dataset: the effect is small and it generally should not be used to subdivide the dataset. For the smashy_super dataset, filter_factor_last is very important for the top configurations, but this is due to high fluctuations. It is difficult to restrict the parameter, but values between 4 and 5 should be included.
The filter_with_max_budget parameter is easy to set: it should always be "TRUE" for both datasets.
Likewise, the parameters filter_algorithm, filter_select_per_tournament, and random_interleave_random have barely any effect and therefore do not need to be limited.
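To summarize the smashy_super recommendations in code, the ranges above can be combined into one filter. This is a hedged sketch: the data frame name superSmashy is an assumption (the analysis objects above were built from per-sample subsets), and the 0.5 threshold for random_interleave_fraction stands in for "high".
recommended <- superSmashy[
  superSmashy$surrogate_learner == "knn1" &            # best learner on average
  superSmashy$random_interleave_fraction > 0.5 &       # "high" values, assumed threshold
  superSmashy$filter_with_max_budget == "TRUE" &       # always "TRUE"
  superSmashy$filter_factor_first > 4,                 # values above 4 for smashy_super
]
summary(recommended$yval)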