searchOptimSegmentationParams: Perform grid or random search optimization of...

View source: R/SOptim_GridSearch.R

searchOptimSegmentationParams    R Documentation

Perform grid or random search optimization of segmentation/OBIA parameters

Description

This function performs the optimization of segmentation parameters using a simple grid or random search algorithm. It also verifies if input data and parameters are parsable.

Usage

searchOptimSegmentationParams(
  rstFeatures,
  trainData,
  segmentMethod,
  ...,
  optimMethod = "random",
  segmParamList,
  grid.searchSize = 5,
  rand.numIter = 250,
  rand.nneigh = 5,
  rand.initNeighs = (5 * rand.nneigh),
  rand.neighSizeProp = 0.025,
  rand.iter = 25,
  trainThresh = 0.5,
  segmStatsFuns = c("mean", "sd"),
  classificationMethod = "RF",
  classificationMethodParams = NULL,
  balanceTrainData = FALSE,
  balanceMethod = "ubUnder",
  evalMethod = "5FCV",
  trainPerc = 0.8,
  nRounds = 20,
  evalMetric = "Kappa",
  minTrainCases = 30,
  minCasesByClassTrain = 10,
  minCasesByClassTest = 10,
  minImgSegm = 30,
  verbose = TRUE,
  parallel = FALSE,
  seed = NULL
)

Arguments

rstFeatures

Features used for supervised classification (typically a multi-layer SpatRaster with one feature per band). May be defined as a string with the path to a raster dataset or a RasterStack object.

trainData

Input training data used for supervised classification. It must be a SpatRaster containing the training areas (in raster format).

segmentMethod

Character string used to define the segmentation method. Available options are:

  • "SAGA_SRG" - SAGA Seeded Region Growing;

  • "GRASS_RG" - GRASS Region Growing;

  • "ArcGIS_MShift" - ArcGIS Mean Shift algorithm;

  • "Terralib_Baatz" - TerraLib Baatz algorithm;

  • "Terralib_MRGrow" - TerraLib Mean Region Growing;

  • "RSGISLib_Shep" - RSGISLib Shepherd algorithm;

  • "OTB_LSMS" - OTB Large Scale Mean Shift algorithm;

  • "OTB_LSMS2" - OTB Large Scale Mean Shift algorithm with two separate sets of parameters, one for the mean-shift smoothing step and another for the large-scale segmentation step.

...

Additional parameters passed to the segmentation functions that will not be optimized (see also: segmentationGeneric). It must also contain the input segmentation data (typically a multi-layer SpatRaster dataset with one input feature per band), depending on the algorithm selected.

optimMethod

A string defining which type of optimization to use. Options are: "random" (default) for randomized search, or "grid" for exhaustive grid search.

segmParamList

A named list object containing the parameters that will be optimized for the selected segmentation method. Check parameter names with function segmentationParamNames. Each list element should contain two values defining the parameter range (min, max). For example, considering method "GRASS_RG", the list could be: list(Threshold = c(0.1, 0.5), MinSize = c(10, 50)).

grid.searchSize

This value is used to expand the parameter ranges. For example, if grid.searchSize = 5 and a given parameter range is set to p = [0, 1], this generates the following regular sequence: p = c(0.00, 0.25, 0.50, 0.75, 1.00). All combinations of the expanded parameters are then used by the grid search method (default: 5).
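The range expansion described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code; the parameter names are the illustrative ones from the segmParamList example, and the local variable gridSearchSize stands in for the grid.searchSize argument.

```r
# Ranges as in the segmParamList example (parameter names are illustrative)
segmParamList <- list(Threshold = c(0.1, 0.5), MinSize = c(10, 50))
gridSearchSize <- 5  # stand-in for the grid.searchSize argument

# Expand each (min, max) range into a regular sequence of gridSearchSize values
expanded <- lapply(segmParamList,
                   function(r) seq(r[1], r[2], length.out = gridSearchSize))

# All combinations of the expanded values form the search grid
paramGrid <- expand.grid(expanded)

print(seq(0, 1, length.out = 5))  # 0.00 0.25 0.50 0.75 1.00
print(nrow(paramGrid))            # 25 combinations (5^2)
```

Note that the number of combinations grows as grid.searchSize raised to the number of optimized parameters, so each added parameter multiplies the number of segmentation runs.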

rand.numIter

Number of iterations for random search optimization (default: 250).

rand.nneigh

Number of neighbors (or parameter combinations) to generate in the vicinity of the best parameter combination found so far (default: 5).

rand.initNeighs

Number of parameter combinations to randomly draw from segmParamList at initialization (default: 5 * rand.nneigh).

rand.neighSizeProp

Size of the neighbourhood for a given parameter, i.e., a real value contained in ]0, 1] used to multiply the range size as: neighSize = (max_range - min_range) * neighSizeProp (default: 0.025).

rand.iter

Number of successive iterations used to stop the algorithm if no improvement is found (default: 25).
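To show how the rand.* arguments fit together, here is a minimal sketch of a neighbourhood-based random search in base R. It is an assumption-laden illustration, not the SegOptim implementation: scoreFun is a hypothetical toy objective standing in for the segment, classify, and evaluate step, and the local names (numIter, nneigh, initNeighs, neighSizeProp, maxNoImprove) mirror rand.numIter, rand.nneigh, rand.initNeighs, rand.neighSizeProp, and rand.iter.

```r
set.seed(42)

# Hypothetical toy objective (peak at Threshold = 0.3, MinSize = 30);
# in practice this would be the evaluation metric of a classified segmentation
scoreFun <- function(p) 1 - ((p[1] - 0.3) / 0.4)^2 - ((p[2] - 30) / 40)^2

paramRanges <- list(Threshold = c(0.1, 0.5), MinSize = c(10, 50))
numIter <- 250; nneigh <- 5; initNeighs <- 5 * nneigh
neighSizeProp <- 0.025; maxNoImprove <- 25

lower <- vapply(paramRanges, `[`, numeric(1), 1)
upper <- vapply(paramRanges, `[`, numeric(1), 2)
neighSize <- (upper - lower) * neighSizeProp  # neighbourhood size per parameter

# Initialization: draw initNeighs combinations uniformly from the full ranges
init <- t(replicate(initNeighs, runif(length(lower), lower, upper)))
scores <- apply(init, 1, scoreFun)
best <- init[which.max(scores), ]
bestScore <- max(scores)

noImprove <- 0
for (i in seq_len(numIter)) {
  # Draw nneigh candidates around the current best, clipped to the ranges
  cand <- t(replicate(nneigh, {
    p <- runif(length(best), best - neighSize, best + neighSize)
    pmin(pmax(p, lower), upper)
  }))
  candScores <- apply(cand, 1, scoreFun)
  if (max(candScores) > bestScore) {
    best <- cand[which.max(candScores), ]
    bestScore <- max(candScores)
    noImprove <- 0
  } else {
    # Stop after maxNoImprove successive iterations without improvement
    noImprove <- noImprove + 1
    if (noImprove >= maxNoImprove) break
  }
}
print(best)
print(bestScore)
```

In this sketch a small neighSizeProp makes the search local and slow to move, while a larger value explores more widely at the cost of precision near the optimum.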

trainThresh

A threshold value defining the minimum proportion of the segment, in ]0, 1], that must be covered by a certain class for the segment to be considered a training case. This threshold only applies if x is a RasterLayer, which means you are using train areas/pixels. If you are running a "single-class" problem, this threshold only applies to the class of interest (coded as 1's): a segment whose proportion of cover by that class is higher than trainThresh is considered a train case. In contrast, for the background class (coded as 0's), only segments/objects totally covered by that class are considered train cases. If you are running a "multi-class" problem, trainThresh is applied differently. First, the train class is determined by a majority rule; then, if that class covers more than the value specified in trainThresh, the case is kept in the train data, otherwise it is filtered out. See also useThresh.
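The two labelling rules can be sketched in base R as follows. The per-segment class proportions are made-up values and the variable names are hypothetical; this only illustrates the rules as described, not the package's internal code.

```r
thresh <- 0.5  # stand-in for trainThresh

# Single-class problem: proportion of each segment covered by classes 0 and 1
# (hypothetical values, one row per segment)
single <- data.frame(p0 = c(0.1, 0.6, 1.0, 0.45),
                     p1 = c(0.9, 0.4, 0.0, 0.55))
# Class 1 if its cover exceeds thresh; class 0 only if totally covered by 0's
singleLabel <- ifelse(single$p1 > thresh, 1L,
                      ifelse(single$p0 == 1, 0L, NA_integer_))
print(singleLabel)  # 1 NA 0 1

# Multi-class problem: majority rule first, then keep the segment only if
# the majority class covers more than thresh
multi <- rbind(c(0.7, 0.2, 0.1),
               c(0.4, 0.35, 0.25),
               c(0.1, 0.3, 0.6))
colnames(multi) <- c("1", "2", "3")
majority <- colnames(multi)[max.col(multi, ties.method = "first")]
multiLabel <- ifelse(apply(multi, 1, max) > thresh, majority, NA)
print(multiLabel)  # "1" NA "3"
```

Segments labelled NA here correspond to cases that would be filtered out of the training data.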

segmStatsFuns

An aggregation function (e.g., mean) applied to the elements within each segment. Either a function object or a function name.
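As an illustration of per-segment aggregation, the sketch below applies the default function names "mean" and "sd" to a small hypothetical pixel table. The column names segId and ndvi are assumptions made for the example and the package works on raster data rather than on a data frame like this one.

```r
# Hypothetical pixel table: segment id plus one feature value per pixel
pixels <- data.frame(segId = c(1, 1, 1, 2, 2, 3, 3, 3),
                     ndvi  = c(0.2, 0.4, 0.6, 0.1, 0.3, 0.5, 0.7, 0.9))
statsFuns <- c("mean", "sd")  # same form as the segmStatsFuns default

# Apply each named function to the pixel values of every segment;
# the result has one row per segment and one column per function
segStats <- sapply(statsFuns,
                   function(f) tapply(pixels$ndvi, pixels$segId, match.fun(f)))
print(segStats)
```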

classificationMethod

An input string defining the classification algorithm to be used. Available options are: "RF" (random forests), "GBM" (generalized boosted models), "SVM" (support vector machines), "KNN" (k-nearest neighbour), and, "FDA" (flexible discriminant analysis).

classificationMethodParams

A list object with a customized set of parameters to be used for the classification algorithms (default = NULL). See also generateDefaultClassifierParams to see which parameters can be changed and how to structure the list object.

balanceTrainData

Defines if data balancing is to be used (only available for single-class problems; default: FALSE).

balanceMethod

A character string used to set the data balancing method. Available methods are based on under-sampling "ubUnder" or over-sampling "ubOver" the target class.
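For readers unfamiliar with data balancing, the following base R sketch shows generic random under-sampling on a hypothetical single-class training table. It is not the routine behind "ubUnder", whose exact behaviour and options may differ; the column names class and feat are assumptions.

```r
set.seed(1)

# Hypothetical imbalanced training table: 90 background (0) vs 10 target (1) cases
train <- data.frame(class = c(rep(0, 90), rep(1, 10)), feat = rnorm(100))

# Random under-sampling: keep all minority cases and an equally sized
# random sample of the majority class
counts <- table(train$class)
minority <- names(counts)[which.min(counts)]
nMin <- min(counts)
keepMaj <- sample(which(train$class != minority), nMin)
balanced <- train[c(which(train$class == minority), keepMaj), ]

print(table(balanced$class))
```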

evalMethod

A character string defining the evaluation method. The available methods are "10FCV" (10-fold cross-validation), "5FCV" (5-fold cross-validation; the default), "HOCV" (holdout cross-validation with the training percentage defined by trainPerc and the number of rounds defined in nRounds), and "OOB" (out-of-bag evaluation; only applicable to random forests).

trainPerc

A decimal number defining the training proportion (default: 0.8; if "HOCV" is used).

nRounds

Number of training rounds used for holdout cross-validation (default: 20; if "HOCV" is used).
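The roles of trainPerc and nRounds in holdout cross-validation can be sketched as repeated random train/test partitions. This is a simplified illustration that assumes plain (non-stratified) random sampling over n hypothetical cases; the package may partition the data differently.

```r
set.seed(123)
n <- 100          # hypothetical number of training cases (segments)
trainPerc <- 0.8
nRounds <- 20

# One random train/test partition per round
splits <- lapply(seq_len(nRounds), function(i) {
  trainIdx <- sample(n, size = round(trainPerc * n))
  list(train = trainIdx, test = setdiff(seq_len(n), trainIdx))
})

print(lengths(splits[[1]]))  # train = 80, test = 20
```

The evaluation metric would then be computed on the test part of each round and summarized across the nRounds rounds.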

evalMetric

A character string setting the evaluation metric or a function that calculates the performance score based on two vectors one for observed and the other for predicted values (see below for more details). This option defines the outcome value of the genetic algorithm fitness function and the output of grid or random search optimization routines. Check evalPerformanceGeneric for available options. When runFullCalibration=TRUE this metric will be calculated however other evaluation metrics can be quantified using evalPerformanceClassifier.
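As an example of a score computed from observed and predicted vectors, the sketch below implements Cohen's kappa in base R. The function name kappaScore and the argument names obs and pred are assumptions; check evalPerformanceGeneric for the exact interface expected of a user-supplied function.

```r
# Cohen's kappa from observed and predicted class vectors
kappaScore <- function(obs, pred) {
  lv <- union(obs, pred)
  cm <- table(factor(obs, levels = lv), factor(pred, levels = lv))
  n <- sum(cm)
  po <- sum(diag(cm)) / n                     # observed agreement
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

obs  <- c(1, 1, 1, 0, 0, 0, 1, 0, 1, 0)
pred <- c(1, 1, 0, 0, 0, 1, 1, 0, 1, 0)
print(kappaScore(obs, pred))  # 0.6
```

In this example 8 of 10 cases agree (po = 0.8) and chance agreement is 0.5, so kappa = (0.8 - 0.5) / (1 - 0.5) = 0.6.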

minTrainCases

The minimum number of training cases used for calibration (default: 30). If the number of rows in x is below this number then calibrateClassifier will not run.

minCasesByClassTrain

Minimum number of cases by class for each train data split so that the classifier is able to run.

minCasesByClassTest

Minimum number of cases by class for each test data split so that the classifier is able to run.
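The minimum-case checks can be pictured with a few lines of base R. The label vector is hypothetical and the sketch only covers the training-side thresholds; how the package reacts when a check fails is not shown here.

```r
# Hypothetical training labels for one data split
trainLabels <- c(rep("a", 25), rep("b", 8), rep("c", 12))
minTrainCases <- 30
minCasesByClassTrain <- 10

counts <- table(trainLabels)
enoughTotal   <- length(trainLabels) >= minTrainCases    # 45 >= 30
enoughByClass <- all(counts >= minCasesByClassTrain)     # class "b" has only 8

print(enoughTotal)    # TRUE
print(enoughByClass)  # FALSE
```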

minImgSegm

Minimum number of image segments/objects necessary to generate train data.

verbose

Print output messages? (default: TRUE).

parallel

A logical argument specifying if parallel computing should be used (TRUE) or not (FALSE, default) for evaluating the fitness function. This argument could also be used to specify the number of cores to employ; by default, this is taken from detectCores. Finally, the functionality of parallelization depends on system OS: on Windows only 'snow' type functionality is available, while on Unix/Linux/Mac OSX both 'snow' and 'multicore' (default) functionalities are available.

seed

An integer value containing the random number generator state. This argument can be used to replicate the results of a grid search. Note that if parallel computing is required, the doRNG package must be installed.

Value

A data frame containing the segmentation parameters tested and the respective value of the evaluation metric (by default the table is ordered in decreasing order using this column).
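A result table of this kind can be handled as an ordinary data frame. The sketch below uses made-up values and assumed column names (Threshold, MinSize, Kappa) to show the decreasing ordering and how the best parameter combination could be pulled from the first row.

```r
# Hypothetical result table: tested parameters plus the evaluation metric
res <- data.frame(Threshold = c(0.1, 0.3, 0.5),
                  MinSize   = c(10, 30, 50),
                  Kappa     = c(0.62, 0.81, 0.74))

# Order by the metric column, best first
res <- res[order(res$Kappa, decreasing = TRUE), ]

# The first row then holds the best parameter combination
bestParams <- res[1, c("Threshold", "MinSize")]
print(bestParams)
```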

See Also

Check the segmentation parameters and data used by each algorithm that must be defined in ... and segmParamList:

  • segmentationGeneric,

  • SAGA Seeded Region Growing: segmentation_SAGA_SRG,

  • GRASS Region Growing: segmentation_GRASS_RG,

  • ArcGIS Mean Shift: segmentation_ArcGIS_MShift,

  • TerraLib Baatz-Schäpe: segmentation_Terralib_Baatz,

  • TerraLib Mean Region Growing: segmentation_Terralib_MRGrow,

  • RSGISLib Shepherd: segmentation_RSGISLib_Shep,

  • OTB Large Scale Mean Shift: segmentation_OTB_LSMS

segmentationParamNames can be used to output a short list of parameter names to include in segmParamList for optimization.


joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.