
gaht

Genetic Algorithm for Hyper-parameter Tuning.

Description

This is a genetic algorithm built in R for tuning the hyper-parameters of neural networks.

I've decided to open source it, as I don't think I can improve it further.

It takes in a population and randomly initializes each agent's hyper-parameters, then tests all agents in the population and keeps the top-ranked agents.

It then randomly pairs two of those agents as father and mother; offspring in the next generation inherit hyper-parameters from both parents as traits, with a random chance of mutation.

This process repeats, guiding the hyper-parameters in a direction that produces higher-ranking agents.
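
As a rough illustration only (a toy, not the package's actual code), the loop described above looks something like this for a single numeric hyper-parameter:

# Toy sketch of the evolutionary loop, not the package's code.
set.seed(42)
pop_size <- 20
max_gens <- 30
fitness  <- function(x) -(x - 0.3)^2          # stand-in for testing an agent

population <- runif(pop_size)                 # random initialization
for (gen in seq_len(max_gens)) {
  scores  <- sapply(population, fitness)      # test all agents in the population
  parents <- population[order(scores, decreasing = TRUE)][1:5]  # keep the top ranked
  population <- replicate(pop_size, {
    pair  <- sample(parents, 2)               # random father and mother
    child <- mean(pair)                       # a simple stand-in for inheriting traits
    child + rnorm(1, sd = 0.05)               # small random mutation
  })
}
population[which.max(sapply(population, fitness))]  # best surviving value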

This does not encode DNA or genes or the like, as I'm not familiar with those concepts. So, feel free to correct me if I'm doing this wrong.

How to Use

Dependencies

Build Package

Open rPkgEvlAlg.Rproj in RStudio and go to Build > Clean and Rebuild. This builds the package and makes it available on your local machine. I'm not sure about other IDEs, but they should have something similar.
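
If you prefer the console to an IDE, the devtools package offers an equivalent (assuming devtools is installed):

# run from the package's root directory (where rPkgEvlAlg.Rproj lives)
devtools::install()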

Load Package

After building the package, you need a fresh R session before you can load it. Assuming you are in a new session, load the package with:

library(rPkgEvlAlg)

Using the Function

To start the algorithm, call:

evolve(ftr_settings, test, dupGens = 4, pop = NULL, popSize = NULL, maxGens)  


Example

Example for the ftr_settings data frame:

settings <- data.frame(
  name =           c('filter1',   'filter2', 'numLayers_dense', 'numUnits1', 'dropRate1', 'numUnits2',       'dropRate2',       'numUnits3',       'dropRate3'),
  min =            c(300,         200,        3,                 200,        .8,           200,              .5,                200,               0),
  max =            c(300,         200,        3,                 200,        .8,           200,              .5,                200,               1),
  type =           c('integer',   'integer',  'integer',         'integer',  'numeric',   'integer',         'numeric',         'integer',         'numeric'),
  dependency =     c(NA,          NA,         NA,                NA,         NA,          'numLayers_dense', 'numLayers_dense', 'numLayers_dense', 'numLayers_dense'),
  dependency_min = c(NA,          NA,         NA,                NA,         NA,          2,                 2,                 3,                 3)
)
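
For illustration, a complete call might look like the sketch below. How evolve passes an agent's hyper-parameter values to test, and what test must return, are assumptions on my part (here: a named set of values in, a single numeric score out), so treat this as a shape, not a spec:

# Hypothetical sketch: the 'agent' argument and the returned score are
# assumptions, not documented behavior of the package.
test <- function(agent) {
  batch_size <- 128                   # permanently locked: hard-coded here
                                      # instead of listed in the settings
  # ... build and train the network using the agent's values,
  # e.g. agent$numUnits1, agent$dropRate3 ...
  -abs(agent$dropRate3 - 0.3)         # stand-in for the real test result
}

evolve(settings, test, dupGens = 4, popSize = 20, maxGens = 50)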

To temporarily lock a hyper-parameter, set its min and max to the same value; the algorithm will then never evolve it. In the settings example above, every hyper-parameter except dropRate3 is locked this way.

To permanently lock a hyper-parameter, don't add it to the ftr_settings data frame at all; just hard-code it in the test function.

Unfortunately, if you have 50 layers, you need to create settings for all 50 layers in the data frame. This is so that each layer's hyper-parameters can evolve independently of the others, and it also lets you give each of the 50 layers different settings. In practice, though, you are unlikely to need 50 different settings; most layers will share the same ones, so build the data frame with a loop rather than by hand, as sketched below.
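
A minimal sketch of building such a data frame with a loop, following the column layout of the example above; the layer count and ranges here are made-up placeholders:

n_layers <- 10

layer_settings <- do.call(rbind, lapply(seq_len(n_layers), function(i) {
  data.frame(
    name             = c(paste0('numUnits', i), paste0('dropRate', i)),
    min              = c(32, 0),
    max              = c(512, .8),
    type             = c('integer', 'numeric'),
    dependency       = 'numLayers_dense',   # layer i's settings only apply
    dependency_min   = i,                   # when numLayers_dense >= i
    stringsAsFactors = FALSE
  )
}))

settings <- rbind(
  data.frame(
    name             = 'numLayers_dense',
    min              = 1,
    max              = n_layers,
    type             = 'integer',
    dependency       = NA_character_,
    dependency_min   = NA_real_,
    stringsAsFactors = FALSE
  ),
  layer_settings
)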

Log File

A log file called 'evlAlg_log.csv' is written to the directory of the project that uses this package, and it is updated every time an agent in the population is tested.

It's a CSV file that, when imported, forms a table or data frame in which each row represents a test subject (agent). Each row records information about the generation, which helps you understand why the evolution hasn't stopped yet; information about the test subject; and the test result.

You can then analyze these test subjects and see if you can spot a definitive pattern or correlation between the test subjects and their test results. If you find a hyper-parameter setting that consistently produces good results, you no longer need to evolve that hyper-parameter, or you can at least narrow its evolvable range significantly.
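
For example, you can import and inspect the log in R; the exact column names aren't documented here, so check them after importing:

log <- read.csv('evlAlg_log.csv')
str(log)   # inspect the available columns

# e.g. check whether a hyper-parameter correlates with the result
# ('dropRate3' and 'result' are placeholder column names):
# cor(log$dropRate3, log$result)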

Caveats

Obviously, this is a very brute-force way of testing hyper-parameters: you train the neural network once per agent per generation, so the total cost is the population size times the number of generations (e.g., a population of 50 evolved for 20 generations means up to 1,000 training runs), which can be very expensive, especially since a large population is recommended.

I like, however, the fact that neural networks mimic the human brain, and genetic algorithms mimic the evolution of brains.

License

GNU GPLv3


