GA.prep: Create R object with genetic algorithm optimization settings

View source: R/GA_prep.R

GA.prep    R Documentation

Create R object with genetic algorithm optimization settings

Description

This function prepares and compiles the objects and commands needed for optimization with the GA package.

Usage

GA.prep(ASCII.dir,
               Results.dir = NULL,
               min.cat = NULL,
               max.cat = 1000,
               max.cont = 1000,
               min.scale = NULL,
               max.scale = NULL,
               shape.min = NULL,
               shape.max = NULL,
               cont.shape = NULL,
               select.trans = NULL,
               cat.levels = 15,
               method = "LL",
               scale = FALSE,
               scale.surfaces = NULL,
               k.value = 2,
               pop.mult = 15,
               percent.elite = 5,
               type = "real-valued",
               pcrossover = 0.85,
               pmutation = 0.125,
               maxiter = 1000,
               run = NULL,
               keepBest = TRUE,
               population = gaControl(type)$population,
               selection = gaControl(type)$selection,
               crossover = "gareal_blxCrossover",
               mutation = gaControl(type)$mutation,
               pop.size = NULL,
               parallel = FALSE,
               gaisl = FALSE,
               island.pop = 20,
               numIslands = NULL,
               migrationRate = NULL,
               migrationInterval = NULL,
               optim = FALSE,
               optim.method = "L-BFGS-B", 
               poptim = 0.0,
               pressel = 1.00,
               control = list(fnscale = -1, maxit = 100),
               hessian = FALSE,
               opt.digits = NULL,
               seed = NULL,
               monitor = TRUE,
               quiet = FALSE)

Arguments

ASCII.dir

Directory containing all raster objects to be optimized. If optimizing using least cost paths, a RasterStack or RasterLayer object can be specified.

Results.dir

If a RasterStack is provided in place of a directory containing .asc files for ASCII.dir, then a directory to export optimization results must be specified. It is critical that there are NO SPACES in the directory path, as this will cause the function to fail. If using the all_comb function, specify Results.dir as "all_comb".

min.cat

The minimum value to be assessed during optimization of categorical resistance surfaces (Default = 1 / max.cat)

max.cat

The maximum value to be assessed during optimization of categorical resistance surfaces (Default = 1000)

max.cont

The maximum value to be assessed during optimization of continuous resistance surfaces (Default = 1000)

min.scale

The minimum scaling parameter value to be assessed during optimization of resistance surfaces with kernel smoothing (Default = 0.01). See details

max.scale

The maximum scaling parameter value to be assessed during optimization of resistance surfaces with kernel smoothing (Default = 0.1 * maximum dimension of the raster surface)

shape.min

The minimum value for the shape parameter used for transforming resistance surfaces. If unspecified, 0.5 is used.

shape.max

The maximum value for the shape parameter used for transforming resistance surfaces. If unspecified, 14.5 is used.

cont.shape

A vector of hypothesized relationships that each continuous resistance surface will have in relation to the genetic distance response (Default = NULL; see details)

select.trans

Option to specify which transformations are applied to continuous surfaces. Must be provided as a list. "A" = All, "M" = Monomolecular only, "R" = Ricker only. Default = "M"; see Details.

cat.levels

Number of unique levels to permit in categorical surface (Default = 15). See Details

method

Objective function to be optimized. Select "AIC", "R2", or "LL" to optimize resistance surfaces based on AIC, variance explained (R2), or log-likelihood. (Default = "LL")

scale

Logical. To optimize a kernel smoothing scaling parameter during optimization, set to TRUE (Default = FALSE). See Details below.

scale.surfaces

(Optional) If doing multisurface optimization with kernel smoothing, indicate which surfaces should be smoothed. Provide a vector, equal in length to the number of resistance surfaces to be optimized using MS_optim.scale, indicating whether each surface should (1) or should not (0) have kernel smoothing applied. See details.

k.value

Specification of how k, the number of parameters in the mixed effects model, is determined. Specify 1, 2, 3, or 4 (Default = 2; see details).

1 -> k = 2;

2 -> k = number of parameters optimized plus the intercept;

3 -> k = the number of parameters optimized plus the intercept and the number of layers optimized;

4 -> k = the number of layers optimized plus the intercept

pop.mult

This value is multiplied by the number of parameters in the surface to determine 'popSize' in GA (Default = 15).

percent.elite

An integer percent used to determine the number of best-fitness individuals that survive at each generation ('elitism' in GA). By default, the top 5% of individuals survive at each iteration.

type

Type of genetic algorithm, as defined in the GA package (Default = "real-valued")

pcrossover

Probability of crossover. Default = 0.85

pmutation

Probability of mutation. Default = 0.125

maxiter

Maximum number of iterations to run before the GA search is halted. When using the standard ga optimizer, the default is 1000. When gaisl = TRUE, this is set to 15 * migrationInterval.

run

Number of consecutive generations or epochs without any improvement in the objective function before the GA is stopped. When using the standard ga optimizer, the default is 25. When gaisl = TRUE, the default is calculated as migrationInterval * 5.

keepBest

A logical argument specifying if best solutions at each iteration should be saved (Default = TRUE)

population

Default is "gareal_Population" from GA

selection

Default is "gareal_lsSelection" from GA

crossover

Default = "gareal_blxCrossover". This crossover method greatly improved optimization during preliminary testing

mutation

Default is "gareal_raMutation" from GA

pop.size

Number of individuals to create each generation. If gaisl = TRUE, then this number is automatically calculated as numIslands * island.pop

parallel

A logical argument specifying if parallel computing should be used (TRUE) or not (FALSE, default) for evaluating the fitness function. You can also specify the number of cores to use.

gaisl

Should the genetic algorithm use the islands parallel optimization? (Default = FALSE)

island.pop

The number of individuals to populate each island. (Default = 20)

numIslands

If gaisl = TRUE, an integer value which specifies the number of islands to use in the genetic evolution (by default will be set to 4)

migrationRate

If gaisl = TRUE, a value in the range (0, 1) which gives the proportion of individuals that undergo migration between islands in every exchange (by default equal to 0.10).

migrationInterval

If gaisl = TRUE, an integer value specifying the number of iterations at which exchange of individuals takes place. This interval between migrations is called an epoch, and it is set at 10 by default.

optim

A logical (Default = FALSE) determining whether a local search using general-purpose optimization algorithms should be used. Finer control of the local search is provided by the optim.method, poptim, pressel, and control arguments, which correspond to the optimArgs argument of ga(). Setting to TRUE has the potential to improve optimization accuracy, but will increase optimization time.

optim.method

The method to be used, chosen from those available in the optim function. By default, "L-BFGS-B" (the BFGS algorithm with box constraints) is used, where the bounds are those provided in the ga() function call. Further methods are described in the Details section of help(optim).

poptim

A value in the range [0,1] specifying the probability of performing a local search at each iteration of GA (default 0.0). Only change if your optimization is relatively fast.

pressel

A value in the range [0,1] specifying the selection pressure (default 1.00). The local search is started from a random solution selected with probability proportional to fitness. High values of pressel tend to select the solutions with the largest fitness, whereas low values of pressel assign quasi-uniform probabilities to any solution.

control

A list of control parameters. See 'Details' section in optim

hessian

Logical. Should a numerically differentiated Hessian matrix be returned? This will allow for the calculation of standard errors on parameter estimates (not yet implemented). Default = FALSE

opt.digits

The number of significant digits at which the objective function will be assessed. By default, no rounding occurs.

seed

Integer random number seed to replicate ga optimization

monitor

Default = TRUE, which prints the average and best fitness values at each iteration.

quiet

Logical. If TRUE, the objective function and step run time will not be printed to the screen after each step. Only ga summary information will be printed following each iteration. (Default = FALSE)

Details

Only files that you wish to optimize, either in isolation or simultaneously, should be included in the specified ASCII.dir. If you wish to optimize different combinations of surfaces, different directories containing these surfaces must be created. It is preferable to provide a RasterStack.

When scale = TRUE, the standard deviation of the Gaussian kernel smoothing function (sigma) will also be optimized. Only continuous surfaces or binary categorical surfaces (e.g., forest/no forest; 1/0) can be optimized when scale = TRUE.

scale.surfaces can be used to specify which surfaces to apply kernel smoothing to during multisurface optimization. For example, scale.surfaces = c(1, 0, 1) will result in the first and third surfaces being optimized with a kernel smoothing function, while the second surface will not be scaled. The order of surfaces will match either the order of the raster stack, or alphabetical order when reading in from a directory.
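As an illustration of how the flags line up with the surfaces, the sketch below assumes three hypothetical .asc files read from a directory (the file names are invented for the example) and shows which surfaces scale.surfaces = c(1, 0, 1) would select once the surfaces are put in alphabetical order.

```r
# Hypothetical surface files in ASCII.dir (names are illustrative only).
# When reading from a directory, surfaces are processed in alphabetical order.
asc.files <- c("roads.asc", "canopy.asc", "landcover.asc")
surface.order <- sort(asc.files)   # "canopy.asc" "landcover.asc" "roads.asc"

# Smooth the first and third surfaces, leave the second unscaled
scale.surfaces <- c(1, 0, 1)
stopifnot(length(scale.surfaces) == length(surface.order))

smoothed <- surface.order[scale.surfaces == 1]
smoothed   # "canopy.asc" "roads.asc"
```

Note that the first flag applies to "canopy.asc", not to "roads.asc", even though "roads.asc" was listed first; with a RasterStack, the layer order of the stack is used instead.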

min.scale defaults to 0.01. During optimization, whenever the scaling factor (sigma) is less than 0.5, ResistanceGA will not apply scaling. In this way, it is possible for a surface to not be scaled.
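The sigma threshold can be pictured with a generic Gaussian kernel. The sketch below is illustrative only and is not ResistanceGA's implementation, which operates on raster objects; the helper names and the kernel radius of 3 * sigma cells are assumptions made for the example.

```r
# Illustrative only: a normalized 2-D Gaussian kernel with standard deviation
# sigma (in cells). The radius of ceiling(3 * sigma) cells is an assumption.
gaussian_kernel <- function(sigma, radius = ceiling(3 * sigma)) {
  offsets <- -radius:radius
  d2 <- outer(offsets^2, offsets^2, "+")   # squared distance from the center cell
  w <- exp(-d2 / (2 * sigma^2))
  w / sum(w)                               # weights sum to 1
}

# Hypothetical helper mirroring the rule described above:
# no smoothing is applied when sigma < 0.5
kernel_or_none <- function(sigma) {
  if (sigma < 0.5) return(NULL)            # surface left unscaled
  gaussian_kernel(sigma)
}

kern <- kernel_or_none(1)
dim(kern)   # 7 7 (radius = 3 cells)
kernel_or_none(0.3)   # NULL, so no smoothing
```

Because a candidate sigma below 0.5 simply returns no kernel, the optimizer can effectively "turn off" smoothing for a surface without a separate on/off parameter.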

The Default for k.value is 2, which sets k equal to the number of parameters optimized, plus 1 for the intercept term. Prior to version 3.0-0, k.value could not be specified by the user and followed setting 2, such that k was equal to the number of parameters optimized plus the intercept term.
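The four k.value options can be written as a small lookup. The sketch below uses a hypothetical helper (not part of ResistanceGA) and pairs it with the standard AIC formula, AIC = -2 * LL + 2 * k, to show how the choice of k.value changes the penalty; ResistanceGA's internal AIC computation is not reproduced here.

```r
# Hypothetical helper (not part of ResistanceGA): number of model parameters k
# implied by each k.value option described under Arguments.
k_from_value <- function(k.value, n.params, n.layers) {
  switch(as.character(k.value),
         "1" = 2,
         "2" = n.params + 1,             # parameters optimized + intercept
         "3" = n.params + 1 + n.layers,  # ... + number of layers optimized
         "4" = n.layers + 1,             # layers optimized + intercept
         stop("k.value must be 1, 2, 3, or 4"))
}

# Standard AIC from a log-likelihood and k
aic_from_ll <- function(LL, k) -2 * LL + 2 * k

# Assumed example: two surfaces with six optimized parameters in total
sapply(1:4, k_from_value, n.params = 6, n.layers = 2)   # 2 7 9 3
aic_from_ll(LL = -100, k = k_from_value(2, 6, 2))       # 214
```

With the same log-likelihood, options 2 and 3 penalize multisurface models more heavily than options 1 and 4.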

cont.shape can take values of "Increase", "Decrease", or "Peaked". If you believe a resistance surface is related to your response in a particular way, specifying this here may decrease the time to optimization. cont.shape is used to generate an initial set of parameter values to test during optimization. If specified, a greater proportion of the starting values will include your believed relationship. If unspecified (the Default), a completely random set of starting values will be generated.

If it is desired that only certain transformations be assessed for continuous surfaces, then this can be specified using select.trans. By default, only monomolecular transformations will be assessed for continuous surfaces unless otherwise specified. Specific transformations can be specified by providing a vector of values (e.g., c(1,3,5)), with values corresponding to the equation numbers as detailed in Resistance.tran. If multiple rasters are to be optimized from the same directory, then a list of transformations must be provided in the order that the raster surfaces will be assessed. For example:
select.trans = list("M", "A", "R", c(5,6))
will result in surface one only being optimized with Monomolecular transformations, surface two with all transformations, surface three with only Ricker transformations, and surface four with Reverse Ricker and Reverse Monomolecular only. If a categorical surface is among the rasters to be optimized, it is necessary to specify NA to accommodate this.
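For a mix of continuous and categorical surfaces, the list needs one entry per surface, in the order the surfaces will be assessed. The sketch below assumes a hypothetical directory holding three surfaces (continuous, categorical, continuous); the surface order is an assumption of the example.

```r
# Hypothetical directory with three surfaces, assessed in this order:
# continuous, categorical, continuous
select.trans <- list("M",      # surface 1: Monomolecular transformations only
                     NA,       # surface 2: categorical, so NA is required
                     c(5, 6))  # surface 3: only equations 5 and 6 from Resistance.tran

n.surfaces <- 3
stopifnot(length(select.trans) == n.surfaces)
```

The list would then be supplied to GA.prep as select.trans = select.trans.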

cat.levels defaults to 15. This means that when a raster surface has <= 15 unique levels, it will be treated as a categorical surface in the analysis. This value can be increased, but optimization of surfaces with many levels may take more time. Additionally, depending upon the prevalence and configuration of categorical features and spatial sample locations, some levels are likely to be poorly estimated. This may be evident if estimated resistance values vary substantially between runs of ResistanceGA.
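The classification rule can be sketched as follows. The helper is hypothetical, and ignoring NA cells when counting levels is an assumption of the sketch; the simulated values stand in for raster cell values.

```r
# Hypothetical helper: classify raster values as categorical or continuous
# using the cat.levels threshold (<= cat.levels unique values -> categorical).
# NA cells are ignored when counting levels (an assumption of this sketch).
surface_type <- function(values, cat.levels = 15) {
  n.levels <- length(unique(values[!is.na(values)]))
  if (n.levels <= cat.levels) "categorical" else "continuous"
}

set.seed(1)
landcover <- sample(1:6, 100, replace = TRUE)   # at most 6 classes
elevation <- runif(100, 200, 900)               # effectively all unique

surface_type(landcover)   # "categorical"
surface_type(elevation)   # "continuous"
```

Raising cat.levels to, say, 25 would cause a surface with 20 classes to be treated as categorical rather than continuous.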

Setting gaisl = TRUE has the potential to greatly reduce the optimization run time, potentially with greater accuracy. This is a distributed multiple-population GA, where the population is partitioned into several subpopulations and assigned to separate islands. Independent GAs are executed on each island, and sparse exchanges of individuals among the islands are performed only occasionally.

It is recommended to first run GA optimization with the default settings.
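The population and stopping settings implied by the documented defaults differ between the standard GA and the island model. The sketch below uses a hypothetical helper that collects those rules in one place. The elitism rounding (floor, with a minimum of 1) is an assumption, and elitism is not computed for the island model because GA applies it within each island.

```r
# Hypothetical helper: GA population and stopping settings implied by the
# GA.prep defaults documented above. Elitism rounding is an assumption.
ga_settings <- function(n.params, gaisl = FALSE,
                        pop.mult = 15, percent.elite = 5,
                        island.pop = 20, numIslands = 4,
                        migrationInterval = 10) {
  if (gaisl) {
    pop.size <- numIslands * island.pop    # total across all islands
    maxiter  <- 15 * migrationInterval
    run      <- 5 * migrationInterval
    elitism  <- NA                         # applied within islands; not computed here
  } else {
    pop.size <- pop.mult * n.params
    maxiter  <- 1000
    run      <- 25
    elitism  <- max(1, floor(pop.size * percent.elite / 100))
  }
  c(popSize = pop.size, elitism = elitism, maxiter = maxiter, run = run)
}

s1 <- ga_settings(n.params = 3)                # popSize 45, elitism 2, maxiter 1000, run 25
s2 <- ga_settings(n.params = 3, gaisl = TRUE)  # popSize 80, elitism NA, maxiter 150, run 50
```

With the island model, the population size no longer depends on the number of parameters, and the stopping rules scale with migrationInterval instead of using fixed values.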

Value

An R object that is a required input into optimization functions

Author(s)

Bill Peterman <Peterman.73@osu.edu>

Examples

 
## Not run:
## *** TO BE COMPLETED *** ##

## End (Not run)

wpeterman/ResistanceGA documentation built on Nov. 20, 2023, 11:50 p.m.