GenAlgControl-constructor: Set control arguments for the genetic algorithm

genAlgControl R Documentation

Set control arguments for the genetic algorithm

Description

The population must be large enough to allow the algorithm to explore the whole solution space. If the initial population is not diverse enough, the chance of finding the global optimum is very small. Thus, the more variables there are to choose from, the larger the population has to be.

Usage

genAlgControl(
  populationSize,
  numGenerations,
  minVariables,
  maxVariables,
  elitism = 10L,
  mutationProbability = 0.01,
  crossover = c("single", "random"),
  maxDuplicateEliminationTries = 0L,
  verbosity = 0L,
  badSolutionThreshold = 2,
  fitnessScaling = c("none", "exp")
)

Arguments

populationSize

The number of "chromosomes" in the population (between 1 and 2^16)

numGenerations

The number of generations to produce (between 1 and 2^16)

minVariables

The minimum number of variables in the variable subset (between 0 and p - 1 where p is the total number of variables)

maxVariables

The maximum number of variables in the variable subset (between 1 and p, and greater than minVariables)

elitism

The number of absolute best chromosomes to keep across all generations (between 1 and min(populationSize * numGenerations, 2^16))

mutationProbability

The probability of mutation (between 0 and 1)

crossover

The crossover type to use during mating (see details). Partial matching is performed.

maxDuplicateEliminationTries

The maximum number of tries to eliminate duplicates (a value of 0 or NULL means that no checks for duplicates are done).

verbosity

The level of verbosity. 0 means no output at all, 2 is very verbose.

badSolutionThreshold

The worst child must not be more than badSolutionThreshold times worse than the worse parent. If less than 0, the child must be even better than the worse parent. If the algorithm cannot find a better child for a long time, it issues a warning and continues with the last child found.

fitnessScaling

How the fitness values are internally scaled before the selection probabilities are assigned to the chromosomes. See the details for possible values and their meaning.
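The partial matching mentioned for crossover can be illustrated with base R's match.arg. That genAlgControl resolves crossover (and fitnessScaling) with match.arg is an assumption suggested by the vector-valued defaults in the usage; the variable names below are purely illustrative.

```r
# Allowed values, copied from the usage section
crossoverTypes <- c("single", "random")

# An unambiguous abbreviation is expanded to the full value
abbreviated <- match.arg("rand", crossoverTypes)

# Passing the full vector of choices (the untouched default) selects the first entry
defaultType <- match.arg(crossoverTypes, crossoverTypes)

# A value that matches none of the choices raises an error
invalid <- try(match.arg("uniform", crossoverTypes), silent = TRUE)
```

Under this assumption, crossover = "rand" would be treated like crossover = "random", and leaving the argument out would select "single".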

Details

The initial population is generated randomly. Every chromosome uses between minVariables and maxVariables (uniformly distributed).
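A minimal sketch of this initialization, assuming a chromosome is represented as a logical vector of length p. The helper randomChromosome is hypothetical and not part of gaselect; the package's internal representation may differ.

```r
# Hypothetical helper: draw one random chromosome over p variables.
randomChromosome <- function(p, minVariables, maxVariables) {
  # Subset size drawn uniformly from minVariables, ..., maxVariables
  k <- minVariables + sample.int(maxVariables - minVariables + 1L, 1L) - 1L
  chromosome <- logical(p)
  # Switch on k variables chosen at random without replacement
  chromosome[sample.int(p, k)] <- TRUE
  chromosome
}

set.seed(42)
# One column per chromosome: a 50 x 100 logical matrix
population <- replicate(100, randomChromosome(50, minVariables = 5, maxVariables = 12))
# Empirical distribution of the subset sizes
table(colSums(population))
```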

If the mutation probability (mutationProbability) is greater than 0, a random number of variables is added to or removed from each offspring chromosome according to a truncated geometric distribution. The resulting distribution of the total number of variables in the subset is then no longer exactly uniform, but nearly so (the smaller the mutation probability, the more "uniform" the distribution). This should not be a problem for most applications.

The user can choose between single and random crossover for the mating process. With single crossover, one position is chosen at random and both parent chromosomes are split at that position. The first child is the concatenation of the 1st part of the 1st parent and the 2nd part of the 2nd parent; the second child is the concatenation of the 1st part of the 2nd parent and the 2nd part of the 1st parent. With random crossover, a random number of random positions is drawn, and the values at these positions are exchanged between the parents to generate the children.
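Both crossover types can be sketched in a few lines of base R. The functions singleCrossover and randomCrossover are hypothetical illustrations, not gaselect functions; they assume logical chromosomes of length at least 2 and ignore the minVariables and maxVariables constraints that the package has to respect.

```r
# Hypothetical sketch of single crossover: split both parents at one position.
singleCrossover <- function(parent1, parent2) {
  p <- length(parent1)
  cut <- sample.int(p - 1L, 1L)   # split point, so both parts are non-empty
  first <- seq_len(cut)           # positions of the 1st part
  list(
    child1 = c(parent1[first], parent2[-first]),
    child2 = c(parent2[first], parent1[-first])
  )
}

# Hypothetical sketch of random crossover: exchange a random set of positions.
randomCrossover <- function(parent1, parent2) {
  p <- length(parent1)
  n <- sample.int(p, 1L)          # random number of positions
  swap <- sample.int(p, n)        # the random positions themselves
  child1 <- parent1
  child2 <- parent2
  child1[swap] <- parent2[swap]
  child2[swap] <- parent1[swap]
  list(child1 = child1, child2 = child2)
}

set.seed(1)
parent1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
parent2 <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
singleChildren <- singleCrossover(parent1, parent2)
randomChildren <- randomCrossover(parent1, parent2)
```

In both sketches every position of the two children holds exactly the two parental values for that position, so no variable appears or disappears through crossover alone.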

Elitism is a method of enhancing the GA by keeping track of very good solutions. The parameter elitism specifies how many "very good" solutions should be kept.
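One way to picture elitism is as a running "hall of fame" that is updated after every generation. The sketch below is only an illustration: updateElite is a hypothetical helper, subsets are encoded as strings for brevity, and larger fitness values are assumed to be better.

```r
# Hypothetical helper: merge the current elite with a new generation
# and keep only the `elitism` best distinct subsets.
updateElite <- function(elite, generation, elitism) {
  pool <- rbind(elite, generation)
  pool <- pool[!duplicated(pool$subset), ]                 # keep each subset once
  pool <- pool[order(pool$fitness, decreasing = TRUE), ]   # best first
  head(pool, elitism)
}

elite <- data.frame(subset = character(0), fitness = numeric(0),
                    stringsAsFactors = FALSE)
generation1 <- data.frame(subset = c("1,4", "2,3", "5"),
                          fitness = c(0.2, 0.9, 0.5), stringsAsFactors = FALSE)
generation2 <- data.frame(subset = c("2,3", "1,5", "4"),
                          fitness = c(0.9, 0.7, 0.1), stringsAsFactors = FALSE)

elite <- updateElite(elite, generation1, elitism = 2L)
elite <- updateElite(elite, generation2, elitism = 2L)
elite$subset   # "2,3" "1,5"
```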

Before the selection probabilities are determined, the fitness values f of the chromosomes are standardized to z-scores (z = (f - mu) / sd). Scaling the standardized values afterwards with the exponential function can help the algorithm find good solutions faster. When fitnessScaling is set to "exp", the (standardized) fitness z will be scaled by exp(z). This gives good solutions an even higher selection probability and bad solutions an even lower one.
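The effect of the two settings can be sketched as follows. The function scaleFitness is hypothetical, and reading "scaled by exp(z)" as replacing z with exp(z) is an assumption; how the scaled values are finally turned into selection probabilities is not described here and is therefore left out.

```r
# Hypothetical sketch of the fitness scaling step.
scaleFitness <- function(f, fitnessScaling = c("none", "exp")) {
  fitnessScaling <- match.arg(fitnessScaling)
  z <- (f - mean(f)) / sd(f)                  # standardize to z-scores
  if (fitnessScaling == "exp") exp(z) else z  # optional exponential scaling
}

fitness <- c(0.10, 0.12, 0.15, 0.40)
zNone <- scaleFitness(fitness, "none")
zExp <- scaleFitness(fitness, "exp")

# Compare the raw, standardized, and exponentially scaled values side by side
round(cbind(fitness, zNone, zExp), 3)
```

With "exp", all scaled values are positive and the gap between the best chromosome and the rest widens, which matches the described shift of selection probability towards good solutions.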

Value

An object of type GenAlgControl

Examples

ctrl <- genAlgControl(populationSize = 100, numGenerations = 15, minVariables = 5,
    maxVariables = 12, verbosity = 1)

evaluatorSRCV <- evaluatorPLS(numReplications = 2, innerSegments = 7, testSetSize = 0.4,
    numThreads = 1)

evaluatorRDCV <- evaluatorPLS(numReplications = 2, innerSegments = 5, outerSegments = 3,
    numThreads = 1)

# Generate demo-data
set.seed(12345)
X <- matrix(rnorm(10000, sd = 1:5), ncol = 50, byrow = TRUE)
y <- drop(-1.2 + rowSums(X[, seq(1, 43, length.out = 8)]) + rnorm(nrow(X), 1.5))

resultSRCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorSRCV, seed = 123)
resultRDCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorRDCV, seed = 123)

subsets(resultSRCV, 1:5)
subsets(resultRDCV, 1:5)

gaselect documentation built on Feb. 16, 2023, 6:14 p.m.