gaOptimizeSegmentationParams: Optimization of segmentation parameters using genetic...
In joaofgoncalves/SegOptim: Optimization of Image Segmentation Parameters for Object-Based Image Analysis (OBIA)

View source: R/SOptim_OptimizationFunctions.R

gaOptimizeSegmentationParams

R Documentation

Optimization of segmentation parameters using genetic algorithms

Description

This function makes some data checks and then performs the optimization of segmentation parameters using genetic algorithms.

Usage

gaOptimizeSegmentationParams(
  rstFeatures,
  trainData,
  segmentMethod,
  trainThresh = 0.5,
  segmStatsFuns = c("mean", "sd"),
  bylayer = FALSE,
  tiles = NULL,
  classificationMethod = "RF",
  classificationMethodParams = NULL,
  balanceTrainData = TRUE,
  balanceMethod = "ubUnder",
  evalMethod = "5FCV",
  trainPerc = 0.8,
  nRounds = 10,
  evalMetric = "Kappa",
  minTrainCases = 30,
  minCasesByClassTrain = 10,
  minCasesByClassTest = 10,
  minImgSegm = 30,
  ndigits = 2,
  verbose = TRUE,
  ...,
  lower,
  upper,
  population = GA::gaControl("real-valued")$population,
  selection = GA::gaControl("real-valued")$selection,
  crossover = GA::gaControl("real-valued")$crossover,
  mutation = GA::gaControl("real-valued")$mutation,
  popSize = 20,
  pcrossover = 0.8,
  pmutation = 0.1,
  elitism = base::max(1, round(popSize * 0.05)),
  maxiter = 100,
  run = 20,
  maxFitness = 1,
  keepBest = TRUE,
  parallel = FALSE,
  seed = NULL
)

Arguments

`rstFeatures`	Features used for supervised classification (typically a multi-layer SpatRaster with one feature per band). May be defined as a string with the path to a raster dataset or a `RasterStack` object.
`trainData`	Input train data used for supervised classification. It must be a `SpatRaster` containing train areas (in raster format).
`segmentMethod`	Character string used to define the segmentation method. Available options are: `"SAGA_SRG"` - SAGA Simple Region Growing; `"GRASS_RG"` - GRASS Region Growing; `"ArcGIS_MShift"` - ArcGIS Mean Shift algorithm; `"Terralib_Baatz"` - TerraLib Baatz algorithm; `"Terralib_MRGrow"` - TerraLib Mean Region Growing; `"RSGISLib_Shep"` - RSGISLib Shepherd algorithm; `"OTB_LSMS"` - OTB Large Scale Mean Shift algorithm; `"OTB_LSMS2"` - OTB Large Scale Mean Shift algorithm with two separate sets of parameters, one for mean-shift smoothing and another for the large-scale segmentation step.
`trainThresh`	A threshold value in ]0, 1] defining the minimum proportion of the segment that must be covered by a certain class for it to be considered a training case. This threshold only applies when `trainData` is a raster, i.e., when you are using train areas/pixels. If you are running a "single-class" problem, the threshold only applies to the class of interest (coded as 1's): if a given segment has a proportion cover of that class higher than `trainThresh`, it is considered a train case. In contrast, for the background class (coded as 0's), only segments/objects totally covered by that class are considered train cases. If you are running a "multi-class" problem, `trainThresh` is applied differently. First, the train class is determined by a majority rule; then, if that class covers more than the value specified in `trainThresh`, the case is kept in the train data, otherwise it is filtered out. See also `useThresh`.
`segmStatsFuns`	One or more aggregation functions (e.g., `mean`) applied to the elements within each segment. Either function objects or function names (default: `c("mean", "sd")`).
`bylayer`	Calculate statistics layer by layer instead of all at once? (slightly increases computation time but spares memory load; default: FALSE).
`tiles`	Number of times to slice the SpatRaster across the row and column directions (default: NULL). The total number of tiles is given by `N_{tiles} = nd^{2}`, where `nd` is the number of slices set here.
`classificationMethod`	An input string defining the classification algorithm to be used. Available options are: `"RF"` (random forests), `"GBM"` (generalized boosted models), `"SVM"` (support vector machines), `"KNN"` (k-nearest neighbour), and, `"FDA"` (flexible discriminant analysis).
`classificationMethodParams`	A list object with a customized set of parameters to be used for the classification algorithms (default = NULL). See also generateDefaultClassifierParams to see which parameters can be changed and how to structure the list object.
`balanceTrainData`	Defines if data balancing is to be used (only available for single-class problems; default: TRUE).
`balanceMethod`	A character string used to set the data balancing method. Available methods are based on under-sampling `"ubUnder"` or over-sampling `"ubOver"` the target class.
`evalMethod`	A character string defining the evaluation method. The available methods are `"10FCV"` (10-fold cross-validation), `"5FCV"` (5-fold cross-validation; the default), `"HOCV"` (holdout cross-validation with the training percentage defined by `trainPerc` and the number of rounds defined in `nRounds`), and `"OOB"` (out-of-bag evaluation; only applicable to random forests).
`trainPerc`	A decimal number defining the training proportion (default: 0.8; if `"HOCV"` is used).
`nRounds`	Number of training rounds used for holdout cross-validation (default: 10; if `"HOCV"` is used).
`evalMetric`	A character string setting the evaluation metric or a function that calculates the performance score based on two vectors one for observed and the other for predicted values (see below for more details). This option defines the outcome value of the genetic algorithm fitness function and the output of grid or random search optimization routines. Check `evalPerformanceGeneric` for available options. When `runFullCalibration=TRUE` this metric will be calculated however other evaluation metrics can be quantified using evalPerformanceClassifier.
`minTrainCases`	The minimum number of training cases used for calibration (default: 30). If the number of rows in `x` is below this number then `calibrateClassifier` will not run.
`minCasesByClassTrain`	Minimum number of cases by class for each train data split so that the classifier is able to run.
`minCasesByClassTest`	Minimum number of cases by class for each test data split so that the classifier is able to run.
`minImgSegm`	Minimum number of image segments/objects necessary to generate train data.
`ndigits`	Number of decimal places to consider when rounding the fitness function output. For example, if `ndigits=2` then only improvements of at least 0.01 will be considered by the GA.
`verbose`	Print output messages? (default: TRUE).
`...`	Additional parameters passed to the segmentation functions that will not be optimized (see also: `segmentationGeneric`). It must also contain the input segmentation data (typically a multi-layer SpatRaster dataset with one input feature per band), depending on the algorithm selected.
`lower`	a vector of length equal to the decision variables providing the lower bounds of the search space in case of real-valued or permutation encoded optimizations. Formerly this argument was named `min`; its usage is allowed but deprecated.
`upper`	a vector of length equal to the decision variables providing the upper bounds of the search space in case of real-valued or permutation encoded optimizations. Formerly this argument was named `max`; its usage is allowed but deprecated.
`population`	an R function for randomly generating an initial population. See `ga_Population` for available functions.
`selection`	an R function performing selection, i.e. a function which generates a new population of individuals from the current population probabilistically according to individual fitness. See `ga_Selection` for available functions.
`crossover`	an R function performing crossover, i.e. a function which forms offsprings by combining part of the genetic information from their parents. See `ga_Crossover` for available functions.
`mutation`	an R function performing mutation, i.e. a function which randomly alters the values of some genes in a parent chromosome. See `ga_Mutation` for available functions.
`popSize`	the population size.
`pcrossover`	the probability of crossover between pairs of chromosomes. Typically this is a large value and by default is set to 0.8.
`pmutation`	the probability of mutation in a parent chromosome. Usually mutation occurs with a small probability, and by default is set to 0.1.
`elitism`	the number of best fitness individuals to survive at each generation. By default the top 5% individuals will survive at each iteration.
`maxiter`	the maximum number of iterations to run before the GA search is halted.
`run`	the number of consecutive generations without any improvement in the best fitness value before the GA is stopped.
`maxFitness`	the upper bound on the fitness function; once this value is reached the GA search is interrupted.
`keepBest`	a logical argument specifying if best solutions at each iteration should be saved in a slot called `bestSol`. See `ga-class`.
`parallel`	An optional argument which allows to specify if the Genetic Algorithm should be run sequentially or in parallel. For a single machine with multiple cores, possible values are: a logical value specifying if parallel computing should be used (`TRUE`) or not (`FALSE`, default) for evaluating the fitness function; a numerical value which gives the number of cores to employ. By default, this is obtained from the function `detectCores`; a character string specifying the type of parallelisation to use. This depends on system OS: on Windows OS only `"snow"` type functionality is available, while on Unix/Linux/Mac OSX both `"snow"` and `"multicore"` (default) functionalities are available. In all the cases described above, at the end of the search the cluster is automatically stopped by shutting down the workers. If a cluster of multiple machines is available, evaluation of the fitness function can be executed in parallel using all, or a subset of, the cores available to the machines belonging to the cluster. However, this option requires more work from the user, who needs to set up and register a parallel back end. In this case the cluster must be explicitly stopped with `stopCluster`.
`seed`	an integer value containing the random number generator state. This argument can be used to replicate the results of a GA search. Note that if parallel computing is required, the doRNG package must be installed.
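
The `trainThresh` rule described above can be sketched in a few lines of base R. This is only an illustration of the wording in the argument description, not the package's internal code: the helper name `label_segment` and its arguments are hypothetical, and the strict `>` comparison is an assumption taken from the phrase "higher than" (the package may treat the boundary case differently).

```r
# Hypothetical helper: decide the train label of ONE segment from the class
# codes of the training pixels that fall inside it.
# Returns NA when the segment should be filtered out of the train data.
label_segment <- function(pixel_classes, thresh = 0.5, single_class = TRUE) {
  if (single_class) {
    p1 <- mean(pixel_classes == 1)  # proportion covered by the class of interest
    p0 <- mean(pixel_classes == 0)  # proportion covered by the background
    if (p1 > thresh) return(1L)     # enough cover of the target class
    if (p0 == 1) return(0L)         # background only if totally covered
    return(NA_integer_)
  }
  # Multi-class: majority rule first, then the threshold on the winning class
  classes <- sort(unique(pixel_classes))
  props <- vapply(classes, function(v) mean(pixel_classes == v), numeric(1))
  if (max(props) > thresh) as.integer(classes[which.max(props)]) else NA_integer_
}

label_segment(c(1, 1, 1, 0))                        # target class covers 75%
label_segment(c(0, 0, 0, 1))                        # mixed segment, filtered out
label_segment(c(2, 2, 3, 4), single_class = FALSE)  # majority covers only 50%
```

Under these assumptions, a higher `trainThresh` yields fewer but "purer" train segments, which interacts with `minTrainCases` and `minImgSegm`.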

Details

– INTRODUCTION –

Genetic algorithms (GAs) are stochastic search algorithms inspired by the basic principles of biological evolution and natural selection. GAs simulate the evolution of living organisms, where the fittest individuals dominate over the weaker ones, by mimicking the biological mechanisms of evolution, such as selection, crossover and mutation. The GA package is a collection of general purpose functions that provide a flexible set of tools for applying a wide range of genetic algorithm methods (from package GA). By default SegOptim uses genetic algorithm optimization for "real-valued" type, i.e., optimization problems where the decision variables (i.e., segmentation parameters) are floating-point representations of real numbers.
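
As a rough illustration of a "real-valued" GA search, the sketch below calls `GA::ga` directly on a toy problem. The fitness function `toy_fitness` is hypothetical and merely stands in for the segmentation-plus-classification score; the rounding mimics what `ndigits` is described as doing, and the GA settings mirror the defaults in Usage.

```r
# Toy stand-in for the real fitness function: the score peaks at 1 when the
# two "segmentation parameters" equal (0.3, 0.7), and is rounded to
# `ndigits` decimal places so tiny improvements are ignored.
toy_fitness <- function(x, ndigits = 2) {
  score <- 1 - ((x[1] - 0.3)^2 + (x[2] - 0.7)^2)
  round(score, ndigits)
}

if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(
    type = "real-valued",
    fitness = toy_fitness,
    lower = c(0, 0),   # lower bounds of the search space
    upper = c(1, 1),   # upper bounds of the search space
    popSize = 20,
    maxiter = 100,
    run = 20,          # stop after 20 generations without improvement
    maxFitness = 1,    # stop as soon as the rounded score reaches 1
    keepBest = TRUE,
    monitor = FALSE,
    seed = 1
  )
  print(res@solution)      # best parameter set(s) found
  print(res@fitnessValue)  # best (rounded) fitness value
}
```

Because of the rounding, any parameter set close enough to the optimum already scores 1, so the search can stop early through `maxFitness`.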

– INPUT DATA PREPARATION –

TODO: ...

– COMPUTING TIME AND COMPLEXITY –

Depending on the size of the raster dataset and the number of segmentation features/layers used, running this function may take a long time. It is therefore crucial to use only a relevant subset (or subsets) of your data to run this procedure.

Choosing an appropriate parameterization of the genetic algorithm is also key to decreasing computing time. For example, if `popSize` is set to 30 and `maxiter` to 100, then up to 30 × 100 = 3000 image segmentation runs may be needed before the optimization stops (running the segmentation is usually the most time-consuming task of the optimization procedure). However, if `run` is set to 20, the optimization stops as soon as 20 consecutive iterations record no improvement in fitness (i.e., the classification score) and returns the best set of parameters found. The bottom line is that setting these parameters appropriately is fundamental to getting good results in an acceptable amount of time.

On the other hand, classification algorithms also run in the background of the fitness function. Continuing the previous example, if `evalMethod` is set to `"HOCV"` and `nRounds` to 20, classification would run a maximum of 3000 × 20 = 60000 times. Keep this in mind when setting `nRounds`, or select a less demanding cross-validation method such as 10- or 5-fold CV. The time required to train the classifier also depends on the input data: a segmentation solution with a larger number of objects will take longer, and a higher number of classification features (or variables) will make the classification algorithms run slower.
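
The worst-case arithmetic above can be checked with a small helper before launching a long run. The function `max_runs` is hypothetical (not part of SegOptim), and it assumes that k-fold cross-validation trains one classifier per fold and that out-of-bag evaluation trains a single model per candidate solution.

```r
# Hypothetical helper: upper bound on segmentation and classification runs
# for a given GA and evaluation setup (early stopping via `run` is ignored).
max_runs <- function(popSize, maxiter, evalMethod = "5FCV", nRounds = 10) {
  segm_runs <- popSize * maxiter  # one segmentation per individual per iteration
  fits_per_eval <- switch(evalMethod,
    "10FCV" = 10,       # assumed: one classifier per fold
    "5FCV"  = 5,
    "HOCV"  = nRounds,  # one classifier per holdout round
    "OOB"   = 1,        # assumed: a single random forest fit
    stop("Unknown evalMethod: ", evalMethod)
  )
  c(segmentation = segm_runs, classification = segm_runs * fits_per_eval)
}

max_runs(popSize = 30, maxiter = 100, evalMethod = "HOCV", nRounds = 20)
max_runs(popSize = 20, maxiter = 100, evalMethod = "5FCV")
```

The first call reproduces the 3000 segmentation runs and 60000 classification runs of the example in the text.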

– SETTING PARAMETERS FOR OPTIMIZATION –

TODO: Controlling for 'biased' classification performance due to class imbalance

TODO: Setting appropriate ranges for segmentation algorithms

TODO: Setting appropriate genetic algorithm parametrization

Value

An object with the GA optimization results. See ga-class for a description of the available slots.
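
A minimal sketch of how a call and the returned object might be handled is given below. The wrapper `run_ga_optim` is hypothetical; it only uses argument names listed in Usage and the `solution` and `fitnessValue` slots of the GA package's ga-class. The segmentation-specific arguments (input segmentation data, software paths, and so on) are deliberately left to `...` because their names depend on the chosen `segmentMethod` (see `segmentationGeneric`), and the `lower`/`upper` bounds must be chosen by the user for that method.

```r
# Hypothetical wrapper: runs the optimization and pulls out the best result.
# Nothing is executed until the function is called with real data.
run_ga_optim <- function(rstFeatures, trainData, segmentMethod,
                         lower, upper, ...) {
  fit <- SegOptim::gaOptimizeSegmentationParams(
    rstFeatures   = rstFeatures,    # path or multi-layer raster with features
    trainData     = trainData,      # SpatRaster with train areas
    segmentMethod = segmentMethod,
    evalMethod    = "5FCV",
    evalMetric    = "Kappa",
    ...,                            # method-specific segmentation arguments
    lower   = lower,                # lower bounds of the optimized parameters
    upper   = upper,                # upper bounds of the optimized parameters
    popSize = 20,
    maxiter = 100,
    run     = 20,
    seed    = 1234
  )
  list(
    fit        = fit,               # full ga-class object
    bestParams = fit@solution,      # best segmentation parameter set(s)
    bestScore  = fit@fitnessValue   # best fitness (evalMetric) value
  )
}
```

Keeping the full `fit` object alongside the extracted values allows later inspection of the other slots, such as `bestSol` when `keepBest = TRUE`.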

Note

Do not use the `parallel` option when performing image segmentation with the OTB LSMS algorithm! Since that software already uses a parallel implementation, this will probably freeze the system by consuming all CPU resources.

References

Scrucca L. (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37, http://www.jstatsoft.org/v53/i04/.

Scrucca L. (2016). On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution. Submitted to R Journal. Pre-print available at: http://arxiv.org/abs/1605.01931.
