FindOptimalSubset: Find Optimal Subset Using a GA

View source: R/FindOptimalSubset.R

FindOptimalSubsetR Documentation

Find Optimal Subset Using a GA

Description

Find optimal subset of a fixed size k from a finite sequence of length n. A distributed multiple-population genetic algorithm (GA) is used to do subset selection based on the maximization of a user-supplied fitness function.

Usage

FindOptimalSubset(
  n,
  k,
  Fitness,
  ...,
  popSize = 100,
  numIslands = 4,
  migrationRate = 0.1,
  migrationInterval = 10,
  pcrossover = 0.8,
  pmutation = 0.1,
  elitism = max(1, round(popSize/numIslands * 0.05)),
  maxiter = 1000,
  run = maxiter,
  suggestions = NULL,
  parallel = TRUE,
  monitor = NULL,
  seed = NULL
)

Arguments

n

'integer' count. Maximum permissible index, that is, the length of the finite sequence (1:n). The GA chooses a subset from this sequence.

k

'integer' count. Number of indices to choose, that is, the fixed size of the subset.

Fitness

'function'. Fitness function, also known as the objective function, is any allowable R function which takes as its first argument the binary string representing a potential solution. And as its second argument the maximum permissible index, n. Use the DecodeChromosome(string, n) command to decode the binary string. The fitness function returns a single numerical value describing its fitness. Recall that the GA searches for a maximum fitness value.

...

Additional arguments to be passed to the fitness function.

popSize

'integer' count. Population size that is distributed evenly between islands.

numIslands

'integer' count. Number of islands

migrationRate

'numeric' number. Proportion of individuals that should migrate between islands.

migrationInterval

'integer' count. Number of generations at which exchange of individuals (or migration) takes place. This interval between migrations is called an epoch.

pcrossover

'numeric' number. Probability of crossover between pairs of chromosomes.

pmutation

'numeric' number. Probability of mutation in a parent chromosome.

elitism

'integer' count. Number of chromosomes to survive into the next generation. Defaults to 5-percent of the island population.

maxiter

'integer' count. Maximum number of generations to run on each island before the GA search is halted.

run

'integer' count. Number of consecutive generations without any improvement in the “best” fitness value before the GA is stopped.

suggestions

integer 'matrix'. Integer chromosomes to be included in the initial population. See returned solution component for a suggested value for this argument.

parallel

'logical' flag or 'integer' count. Whether to use parallel computing. This argument can also be used to specify the number of cores to employ; by default, this is the number of physical CPUs/cores. The parallel and doParallel packages must be installed for parallel computing to work.

monitor

'function'. Function that takes as input the current state of the gaisl-class object, and is run at each epoch of the islands GA search.

seed

'integer' count. Random number generator state for random number generation, used to replicate the results. The doRNG package must be installed if using parallel computing.

Details

The fitness function (see Fitness argument) is solved using the gaisl function in the GA package (Scrucca, 2013, 2016). The function implements an islands evolution model (first proposed by Cohoon and others, 1987). to maximize a fitness function using islands parallel genetic algorithms (ISLPGAs) (Luke, 2013, p. 103-104; Scrucca, 2016, p. 197-200). Independent GAs are configured to use integer chromosomes represented with a binary codification, linear-rank selection, uniform crossover, and uniform mutation.

Value

A 'list' with components:

call

original call which can be used for later re-use.

solution

a 'matrix' representation of the best solution found. Each row represents a unique solution giving the best fitness at the final generation. More than one row indicates a non-unique solution. The number of columns is equal to the subset size k.

ga_output

output from the ISLPGAs, see gaisl-class for format description.

ga_time

time required to run the ISLPGAs, see system.time for details.

Author(s)

J.C. Fisher, U.S. Geological Survey, Idaho Water Science Center

References

Cohoon, J.P., Hegde, S.U., Martin, W.N., and Richards, D., 1987, Punctuated Equilibria: A Parallel Genetic Algorithm, in Genetic Algorithms and their Applications: Proceedings of the Second International Conference on Genetic Algorithms, Grefenstette, J.J., Lawrence Earlbaum Associates, p. 155-161.

Luke, Sean, 2015, Essentials of metaheuristics (2nd ed.): Lulu, 263 p., available for free at https://cs.gmu.edu/~sean/book/metaheuristics/.

Scrucca, Luca, 2013, GA: A Package for Genetic Algorithms in R: Journal of Statistical Software, v. 53, no. 4, p. 1-37, doi: 10.18637/jss.v053.i04.

Scrucca, Luca, 2017, On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution: The R Journal, v. 9, no. 1, p. 187-206, https://journal.r-project.org/archive/2017/RJ-2017-008/.

Examples

# Problem: Choose the 4 smallest numbers from a list
#          of 100 values generated from a standard
#          uniform distribution.
k <- 4
n <- 100
seed <- 123
set.seed(seed); numbers <- sort.int(runif(n))
Fitness <- function(string, n, numbers) {
  idxs <- DecodeChromosome(string, n)
  -1 * sum(numbers[idxs])
}
## Not run: 
out <- FindOptimalSubset(n, k, Fitness, numbers, run = 10,
                         monitor = GA::gaislMonitor, seed = seed)
plot(out[["ga_output"]])
summary(out[["ga_output"]])
print(out[["solution"]])
print(out[["ga_output"]]@fitnessValue)

## End(Not run)


USGS-R/inlmisc documentation built on Sept. 17, 2022, 2:38 a.m.