fitFuncGeneric: Fitness function used in genetic, random and grid search...

View source: R/SOptim_FitnessFunctions.R

fitFuncGeneric R Documentation

Fitness function used in genetic, random and grid search optimization algorithms

Description

The fitness function takes a candidate solution to the problem as input (in this case, the segmentation parameters) and produces an output value measuring how "fit" or how "good" the solution is with respect to the problem at hand (i.e., classification performance results).

Usage

fitFuncGeneric(
  x,
  rstFeatures,
  trainData,
  segmentMethod,
  ...,
  trainThresh = 0.5,
  segmStatsFuns = c("mean", "sd"),
  bylayer = FALSE,
  tiles = NULL,
  classificationMethod = "RF",
  classificationMethodParams = NULL,
  balanceTrainData = FALSE,
  balanceMethod = "ubUnder",
  evalMethod = "5FCV",
  trainPerc = 0.8,
  nRounds = 20,
  evalMetric = "Kappa",
  minTrainCases = 30,
  minCasesByClassTrain = 10,
  minCasesByClassTest = 10,
  minImgSegm = 30,
  ndigits = 2,
  verbose = TRUE
)

Arguments

x

Vector with the segmentation parameters that will be optimized by the genetic algorithms from the GA package.

rstFeatures

Features used for supervised classification (typically a multi-layer SpatRaster with one feature per band). May be defined as a string with the path to a raster dataset or a RasterStack object.

trainData

Input train data used for supervised classification. It must be a SpatRaster containing train areas (in raster format).

segmentMethod

Character string used to define the segmentation method. Available options are:

  • "SAGA_SRG" - SAGA Seeded Region Growing;

  • "GRASS_RG" - GRASS Region Growing;

  • "ArcGIS_MShift" - ArcGIS Mean Shift algorithm;

  • "Terralib_Baatz" - TerraLib Baatz algorithm;

  • "Terralib_MRGrow" - TerraLib Mean Region Growing;

  • "RSGISLib_Shep" - RSGISLib Shepherd algorithm;

  • "OTB_LSMS" - OTB Large Scale Mean Shift algorithm;

  • "OTB_LSMS2" - OTB Large Scale Mean Shift algorithm with two separate sets of parameters, one for the mean-shift smoothing step and another for the large-scale segmentation step.

...

Additional parameters passed to the segmentation functions that will not be optimized (see also: segmentationGeneric). It must also contain the input segmentation data (typically a multi-layer SpatRaster dataset with one input feature per band), depending on the algorithm selected.

trainThresh

A threshold value in the interval ]0, 1] defining the minimum proportion of a segment that must be covered by a certain class for the segment to be considered a training case. This threshold only applies if x is a RasterLayer, which means you are using train areas/pixels. In a "single-class" problem the threshold only applies to the class of interest (coded as 1's): a segment whose proportion of cover by that class is higher than trainThresh is considered a train case. In contrast, for the background class (coded as 0's), only segments/objects totally covered by that class are considered train cases. In a "multi-class" problem the threshold is applied differently: first, the train class is determined by a majority rule; then, if that class covers more than the value specified in trainThresh, the case is kept in the train data, otherwise it is filtered out. See also useThresh.
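
The two thresholding rules described for this argument can be illustrated with a minimal base-R sketch. The helper names (labelSingleClass, labelMultiClass) and the input layout (per-segment class cover proportions, with every segment fully inside the train raster) are assumptions made for illustration; they are not part of SegOptim and may differ from how the package computes train cases.

```r
# Hypothetical helpers illustrating the thresholding rules; not SegOptim code.

# Single-class: propClass1 = proportion of each segment covered by class 1.
# Returns 1 (class of interest), 0 (background) or NA (segment discarded).
labelSingleClass <- function(propClass1, thresh = 0.5) {
  out <- rep(NA_integer_, length(propClass1))
  out[propClass1 > thresh] <- 1L  # cover by class 1 above the threshold
  out[propClass1 == 0]     <- 0L  # segment totally covered by background
  out
}

# Multi-class: propMat = segments x classes matrix of cover proportions
# (column names are the class labels).
labelMultiClass <- function(propMat, thresh = 0.5) {
  major <- max.col(propMat, ties.method = "first")         # majority class
  cover <- propMat[cbind(seq_len(nrow(propMat)), major)]   # its proportion
  ifelse(cover > thresh, colnames(propMat)[major], NA_character_)
}

single <- labelSingleClass(c(0.9, 0.3, 0), thresh = 0.5)

propMat <- matrix(c(0.70, 0.20, 0.10,
                    0.40, 0.35, 0.25),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("a", "b", "c")))
multi <- labelMultiClass(propMat, thresh = 0.5)
```

In this toy input the second single-class segment (30% cover) is dropped, and the second multi-class segment is dropped because its majority class covers only 40% of the segment.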

segmStatsFuns

One or more aggregation functions (e.g., mean) applied to the elements within each segment. Either function objects or function names (default: c("mean", "sd")).

bylayer

Calculate statistics layer by layer instead of all at once? (slightly increases computation time but spares memory load; default: FALSE).

tiles

Number of times to slice the SpatRaster across the row and column directions. The total number of tiles is given by N_tiles = nd^2, where nd is the number of slices in each direction.

classificationMethod

An input string defining the classification algorithm to be used. Available options are: "RF" (random forests), "GBM" (generalized boosted models), "SVM" (support vector machines), "KNN" (k-nearest neighbour), and "FDA" (flexible discriminant analysis).

classificationMethodParams

A list object with a customized set of parameters to be used for the classification algorithms (default = NULL). See also generateDefaultClassifierParams to see which parameters can be changed and how to structure the list object.

balanceTrainData

Defines if data balancing is to be used (only available for single-class problems; default: FALSE).

balanceMethod

A character string used to set the data balancing method. Available methods are based on under-sampling "ubUnder" or over-sampling "ubOver" the target class.
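
As an illustration of the difference between the two strategies, here is a minimal base-R sketch for a 0/1 response. The function balanceSketch, its arguments, and the choice to equalise the two class counts are assumptions made for this example; this is not the routine SegOptim calls, and the actual balancing methods may resample differently.

```r
# Illustrative base-R balancing for a 0/1 response; not the SegOptim routine.
balanceSketch <- function(df, classCol = "train", method = c("under", "over")) {
  method <- match.arg(method)
  idx0 <- which(df[[classCol]] == 0)
  idx1 <- which(df[[classCol]] == 1)
  small <- if (length(idx0) <= length(idx1)) idx0 else idx1
  large <- if (length(idx0) <= length(idx1)) idx1 else idx0
  if (method == "under") {
    # Randomly drop cases from the larger class until both sizes match
    keep <- c(small, large[sample.int(length(large), length(small))])
  } else {
    # Randomly replicate cases of the smaller class until both sizes match
    nExtra <- length(large) - length(small)
    extra  <- small[sample.int(length(small), nExtra, replace = TRUE)]
    keep   <- c(small, extra, large)
  }
  df[keep, , drop = FALSE]
}

set.seed(1)
d <- data.frame(train = c(rep(0, 90), rep(1, 10)), f1 = rnorm(100))
nUnder <- table(balanceSketch(d, method = "under")$train)  # 10 cases per class
nOver  <- table(balanceSketch(d, method = "over")$train)   # 90 cases per class
```

Under-sampling discards information from the larger class, while over-sampling duplicates cases of the smaller one; which is preferable depends on how many train segments are available.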

evalMethod

A character string defining the evaluation method. The available methods are "10FCV" (10-fold cross-validation), "5FCV" (5-fold cross-validation; the default), "HOCV" (holdout cross-validation with the training percentage defined by trainPerc and the number of rounds defined in nRounds), and "OOB" (out-of-bag evaluation; only applicable to random forests).

trainPerc

A decimal number defining the training proportion (default: 0.8; if "HOCV" is used).

nRounds

Number of training rounds used for holdout cross-validation (default: 20; if "HOCV" is used).
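
To make the difference between k-fold and holdout cross-validation concrete, the following base-R sketch generates both kinds of partition. The helpers kFoldIndex and holdoutIndex are hypothetical names used only here; the package's own partitioning (which may, for instance, stratify by class) is not reproduced.

```r
# Hypothetical helpers showing how the two kinds of split differ.

# k-fold CV: every case is assigned to one of k folds and is tested exactly once
kFoldIndex <- function(n, k = 5) {
  sample(rep(seq_len(k), length.out = n))
}

# Holdout CV: nRounds independent random splits, each using a proportion
# trainPerc of the cases for training and the remainder for testing
holdoutIndex <- function(n, trainPerc = 0.8, nRounds = 20) {
  replicate(nRounds, sample.int(n, size = round(trainPerc * n)),
            simplify = FALSE)
}

set.seed(42)
folds  <- kFoldIndex(100, k = 5)   # fold label (1..5) for each of 100 cases
splits <- holdoutIndex(100)        # list of 20 vectors of training indices

table(folds)                       # 20 cases in each fold
length(splits[[1]])                # 80 training cases in the first round
```

With "HOCV" the same case can fall in the test set of several rounds (or none), whereas in k-fold cross-validation each case is used for testing exactly once.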

evalMetric

A character string setting the evaluation metric, or a function that calculates the performance score based on two vectors, one for observed and the other for predicted values (see below for more details). This option defines the outcome value of the genetic algorithm fitness function and the output of grid or random search optimization routines. Check evalPerformanceGeneric for available options. When runFullCalibration=TRUE this metric will be calculated; however, other evaluation metrics can be quantified using evalPerformanceClassifier.
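
As a sketch of what a user-supplied metric function could look like, the example below computes Cohen's Kappa from two label vectors in base R. The function name kappaSketch and the (obs, pred) argument order are assumptions; the exact interface expected by the package should be checked in evalPerformanceGeneric, and the built-in "Kappa" option may be computed differently.

```r
# Hypothetical custom metric: Cohen's Kappa from observed and predicted labels.
# The (obs, pred) argument order is an assumption, not a documented contract.
kappaSketch <- function(obs, pred) {
  lev <- union(obs, pred)
  cm  <- table(factor(obs, levels = lev), factor(pred, levels = lev))
  n   <- sum(cm)
  po  <- sum(diag(cm)) / n                     # observed agreement
  pe  <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

obs  <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred <- c(1, 1, 0, 0, 0, 1, 1, 0)
kappaSketch(obs, pred)  # 0.5: observed agreement 0.75, chance agreement 0.5
```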

minTrainCases

The minimum number of training cases used for calibration (default: 30). If the number of rows in x is below this number then calibrateClassifier will not run.

minCasesByClassTrain

Minimum number of cases by class for each train data split so that the classifier is able to run.

minCasesByClassTest

Minimum number of cases by class for each test data split so that the classifier is able to run.

minImgSegm

Minimum number of image segments/objects necessary to generate train data.

ndigits

Number of decimal places to consider for rounding the fitness function output. For example, if ndigits=2 then only improvements of 0.01 will be considered by the GA algorithm.
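
The effect of rounding can be seen with a short base-R example. The score values are made up for illustration.

```r
# Two candidate scores that differ by less than 0.01 (made-up values)
scoreA <- 0.8312
scoreB <- 0.8349
sameFitness <- round(scoreA, 2) == round(scoreB, 2)  # TRUE: both become 0.83

# A candidate that improves the score by more than 0.01 is still distinguished
scoreC <- 0.8451
isBetter <- round(scoreC, 2) > round(scoreB, 2)      # TRUE: 0.85 vs 0.83
```

Rounding therefore makes the optimizer treat near-identical solutions as ties, so small fluctuations in the score are not counted as improvements.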

verbose

Print output messages? (default: TRUE).

Details

"A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims" (from Wikipedia). In particular, the fitness function also acts as a 'wrapper' function linking together several other functions in the following workflow sequence:

  1. Run segmentation and load the results;

  2. Load train data for the segmentation generated;

  3. Extract feature data for the segments (segment statistics calculation);

  4. Merge calibration and feature data;

  5. Do train/test data partitions;

  6. Perform data balancing (if requested by the user; only for the train data and single-class problems);

  7. Perform classification using the selected algorithm;

  8. Do performance evaluation for each subset;

  9. Return evaluation score (fitness value).
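
The sequence above can be sketched with a self-contained toy example in base R. Everything in it is a stand-in chosen for illustration: the 1-D "image", the block-based "segmentation", the threshold "classifier", the accuracy score, and the convention of returning NA when too few segments are produced are all assumptions, and none of it reproduces SegOptim internals.

```r
# Toy stand-ins for the workflow steps; every object here is hypothetical.
set.seed(10)
signal <- c(rnorm(200, 0), rnorm(200, 3))  # 1-D "image" with two regions
truth  <- rep(c(0, 1), each = 200)         # per-pixel train labels

toyFitness <- function(x, minImgSegm = 4, ndigits = 2) {
  # 1. "Segmentation": cut the signal into blocks of x[1] pixels
  segSize <- max(1, round(x[1]))
  segId   <- ceiling(seq_along(signal) / segSize)
  if (length(unique(segId)) < minImgSegm) return(NA_real_)
  # 2. Train label per segment (majority rule)
  y <- as.integer(tapply(truth, segId, mean) > 0.5)
  # 3. Segment statistics (mean of the signal within each segment)
  feat <- as.numeric(tapply(signal, segId, mean))
  # 4. Merge calibration and feature data
  d <- data.frame(train = y, segMean = feat)
  # 5. Train/test partition (single 80/20 holdout; step 6 is skipped)
  trainIdx <- sample.int(nrow(d), round(0.8 * nrow(d)))
  # 7. "Classifier": threshold midway between the two class means,
  #    assuming class 1 has the higher mean (true for this toy signal)
  mu <- tapply(d$segMean[trainIdx], d$train[trainIdx], mean)
  if (length(mu) < 2) return(NA_real_)
  pred <- as.integer(d$segMean[-trainIdx] > mean(mu))
  # 8-9. Evaluate on the test subset and return the rounded score
  round(mean(pred == d$train[-trainIdx]), ndigits)
}

fit        <- toyFitness(c(20))   # accuracy for 20-pixel segments
noSegments <- toyFitness(c(400))  # a single segment: too few, returns NA
```

An optimizer only needs to call such a function repeatedly with different parameter vectors x and compare the returned scores.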

Value

The fitness function value (depends on the option set in evalMetric).

See Also

Check the segmentation parameters and data used by each algorithm that must be defined in ...:

  • segmentationGeneric,

  • SAGA Seeded Region Growing: segmentation_SAGA_SRG,

  • GRASS Region Growing: segmentation_GRASS_RG,

  • ArcGIS Mean Shift: segmentation_ArcGIS_MShift,

  • TerraLib Baatz-Schäpe: segmentation_Terralib_Baatz,

  • TerraLib Mean Region Growing: segmentation_Terralib_MRGrow,

  • RSGISLib Shepherd: segmentation_RSGISLib_Shep,

  • OTB Large Scale Mean Shift: segmentation_OTB_LSMS,

  • OTB Large Scale Mean Shift with two sets of parameters: segmentation_OTB_LSMS2


joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.