symbolicRegression: Symbolic regression via untyped standard genetic programming


View source: R/symbolic_regression.r

Description

Perform symbolic regression via untyped genetic programming. The regression task is specified as a formula. Only simple formulas without interactions are supported. The result of the symbolic regression run is a symbolic regression model containing an untyped GP population of model functions.

Usage

symbolicRegression(formula, data, stopCondition = makeTimeStopCondition(5),
  population = NULL, populationSize = 100, eliteSize = ceiling(0.1 *
  populationSize), elite = list(), extinctionPrevention = FALSE,
  archive = FALSE, individualSizeLimit = 64,
  penalizeGenotypeConstantIndividuals = FALSE, subSamplingShare = 1,
  functionSet = mathFunctionSet, constantSet = numericConstantSet,
  crossoverFunction = NULL, mutationFunction = NULL,
  restartCondition = makeEmptyRestartCondition(),
  restartStrategy = makeLocalRestartStrategy(),
  searchHeuristic = makeAgeFitnessComplexityParetoGpSearchHeuristic(),
  breedingFitness = function(individual) TRUE, breedingTries = 50,
  errorMeasure = rmse, progressMonitor = NULL, envir = parent.frame(),
  verbose = TRUE)
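
As a minimal sketch of a call with the signature above, the following fits a model to toy data and asks it for predictions. It assumes the rgp package is installed; the data, the column names x1 and y1, and the step budget of 200 are hypothetical choices made for illustration. The rgp part is guarded so that the script still runs when the package is not available.

```r
# Hypothetical toy data; the column names x1 and y1 are illustrative only.
x1 <- seq(0, 2 * pi, length.out = 50)
trainingData <- data.frame(x1 = x1, y1 = sin(x1) + cos(2 * x1))

if (requireNamespace("rgp", quietly = TRUE)) {
  library(rgp)
  # Stop after a fixed number of evolution steps instead of the default
  # time-based stop condition (the step count is an arbitrary choice).
  model <- symbolicRegression(y1 ~ x1, data = trainingData,
                              stopCondition = makeStepsStopCondition(200),
                              verbose = FALSE)
  # Predictions go through the predict method listed under See Also.
  predicted <- predict(model, newdata = trainingData)
  print(head(predicted))
}
```

Because the search is stochastic, the fitted expressions and predictions will differ from run to run.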

Arguments

formula

A formula describing the regression task. Currently, only simple formulas of the form response ~ variable1 + ... + variableN are supported.

data

A data.frame containing training data for the symbolic regression run. The variables in formula must match column names in this data frame.

stopCondition

The stop condition for the evolution main loop. Defaults to makeTimeStopCondition(5). See makeStepsStopCondition for details.

population

The GP population to start the run with. If this parameter is NULL (the default), a new GP population of size populationSize is created through random growth.

populationSize

The number of individuals if a population is to be created.

eliteSize

The number of elite individuals to keep. Defaults to ceiling(0.1 * populationSize).

elite

The elite list. Must be a list of individuals sorted in ascending order by their first fitness component.

extinctionPrevention

When set to TRUE, the initialization and selection steps will try to prevent duplicate individuals from occurring in the population. Defaults to FALSE, as this operation might be expensive with larger population sizes.

archive

If set to TRUE, all GP individuals evaluated are stored in an archive list archiveList that is returned as part of the result of this function.

individualSizeLimit

Individuals with a number of tree nodes that exceeds this size limit will get a fitness of Inf.

penalizeGenotypeConstantIndividuals

When set to TRUE, individuals that do not contain any input variables will get a fitness of Inf. Defaults to FALSE.

subSamplingShare

The share s of fitness cases sampled for evaluation with each function evaluation. 0 < s ≤ 1 must hold, defaults to 1.0.

functionSet

The function set. Defaults to mathFunctionSet.

constantSet

The set of constant factory functions. Defaults to numericConstantSet.

crossoverFunction

The crossover function.

mutationFunction

The mutation function.

restartCondition

The restart condition for the evolution main loop. See makeEmptyRestartCondition for details.

restartStrategy

The strategy for doing restarts. See makeLocalRestartStrategy for details.

searchHeuristic

The search heuristic (i.e. optimization algorithm) to use when searching for solutions. See the documentation for searchHeuristics for available algorithms.

breedingFitness

A "breeding" function. This function is applied after every stochastic operation Op that creates or modifies an individual (typically, Op is an initialization, mutation, or crossover operation). If the breeding function returns TRUE on the given individual, Op is considered a success. If the breeding function returns FALSE, Op is retried a maximum of breedingTries times. If this maximum number of retries is exceeded, the result of the last try is taken as the result of Op. If the breeding function returns a numeric value, the breeding is repeated breedingTries times and the individual with the lowest breeding fitness is taken as the result of Op.

breedingTries

For a boolean breedingFitness function, the maximum number of retries. For a numerical breedingFitness function, the number of breeding steps. Also see the documentation for the breedingFitness parameter. Defaults to 50.

errorMeasure

A function to use as an error measure, defaults to RMSE.

progressMonitor

A function of signature function(population, fitnessValues, fitnessFunction, stepNumber, evaluationNumber, bestFitness, timeElapsed) to be called with each evolution step.

envir

The R environment to evaluate individuals in, defaults to parent.frame().

verbose

Whether to print progress messages.
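
The retry rules described for breedingFitness and breedingTries can be sketched in base R as follows. This is an illustration of the documented behaviour, not the package's implementation: the helper breed and the stand-in operation randomOp are hypothetical names, and individuals are represented by plain integers for simplicity.

```r
# Hypothetical helper, not part of rgp: applies a stochastic operation `op`
# under the retry rules described for breedingFitness and breedingTries.
breed <- function(op, breedingFitness, breedingTries = 50) {
  candidate <- op()
  score <- breedingFitness(candidate)
  if (is.logical(score)) {
    # Boolean case: retry until accepted, at most breedingTries retries.
    tries <- 0
    while (!isTRUE(score) && tries < breedingTries) {
      candidate <- op()
      score <- breedingFitness(candidate)
      tries <- tries + 1
    }
    return(candidate)  # accepted individual, or the result of the last try
  }
  # Numeric case: breedingTries breeding steps in total, keep the lowest score.
  best <- candidate
  bestScore <- score
  for (i in seq_len(max(breedingTries - 1, 0))) {
    candidate <- op()
    score <- breedingFitness(candidate)
    if (score < bestScore) {
      best <- candidate
      bestScore <- score
    }
  }
  best
}

# Usage: integers stand in for GP individuals.
set.seed(1)
randomOp <- function() sample(1:10, 1)  # stands in for mutation or crossover
evenOnly <- breed(randomOp, function(ind) ind %% 2 == 0)  # boolean breeding
smallest <- breed(randomOp, function(ind) ind)            # numeric breeding
```

With the default breedingFitness = function(individual) TRUE, the first result of every operation is accepted, so no retries take place.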

Value

A symbolic regression model that contains an untyped GP population.

See Also

predict.symbolicRegressionModel, geneticProgramming


rgp documentation built on May 30, 2017, 12:45 a.m.