prepareCalData: Prepare calibration data for running a classification...

View source: R/SOptim_PrepFullDataset.R

prepareCalData    R Documentation

Prepare calibration data for running a classification algorithm

Description

An auxiliary wrapper function that generates train/evaluation data and calculates feature statistics by image segment. The output object can then be passed to the calibrateClassifier function to train a classification algorithm (with option runFullCalibration=TRUE).

Usage

prepareCalData(
  rstSegm,
  trainData,
  rstFeatures,
  thresh = 0.5,
  funs = c("mean", "sd"),
  minImgSegm = 30,
  bylayer = FALSE,
  tiles = NULL,
  verbose = TRUE,
  progressBar = FALSE
)
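A minimal call sketch using only the arguments listed above. The three file names are hypothetical placeholders, and the call is guarded so that it only runs when the SegOptim package is installed and the files exist.

```r
# Hypothetical input paths (assumptions); replace with your own data.
segm_path  <- "segments.tif"     # segmented raster, one integer ID per segment
train_path <- "train_areas.tif"  # train raster (e.g., 0/1 for a single-class problem)
feat_path  <- "features.tif"     # multi-layer raster with classification features

# Only attempt the call if the package and all inputs are available.
inputs_ok <- requireNamespace("SegOptim", quietly = TRUE) &&
  all(file.exists(segm_path, train_path, feat_path))

if (inputs_ok) {
  calObj <- SegOptim::prepareCalData(
    rstSegm     = segm_path,
    trainData   = train_path,
    rstFeatures = feat_path,
    thresh      = 0.5,
    funs        = c("mean", "sd"),
    minImgSegm  = 30,
    bylayer     = FALSE,
    tiles       = NULL,
    verbose     = TRUE,
    progressBar = FALSE
  )
  # Inspect the calibration data frame described in the Value section.
  str(calObj$calData)
} else {
  message("SegOptim or the example input files are not available; call skipped.")
}
```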

Arguments

rstSegm

A path or a SpatRaster object containing the outputs of a segmentation algorithm with each object/segment identified by an integer index. Can also be the direct result of a segmentation function (e.g., segmentation_OTB_LSMS) as an object of class SOptim.SegmentationResult.

trainData

Input train data used for classification. The input can be a SpatRaster, an integer vector (containing raster data extracted with the values function) or a character string with a path to the raster layer. If trainData is an integer vector, its length must equal the number of pixels in rstSegm.

rstFeatures

A string defining the path to the raster features or a SpatRaster object.

thresh

A threshold value defining the minimum proportion of the segment, in the interval ]0, 1], that must be covered by a certain class for the segment to be considered a training case. This threshold only applies if trainData is a SpatRaster, which means you are using train areas/pixels. If you are running a "single-class" problem, this threshold only applies to the class of interest (coded as 1's): a segment whose proportion cover of that class is higher than thresh is considered a train case. In contrast, for the background class (coded as 0's), only segments/objects totally covered by that class are considered train cases. If you are running a "multi-class" problem, thresh is applied differently. First, the train class is determined by a majority rule; then, if that class covers more than the value specified in thresh, the case is kept in the train data, otherwise it is filtered out. See also useThresh.
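The two rules described for thresh can be sketched in base R on toy pixel vectors. This is an illustration under assumptions, not the package's internal code: the helper names (single_class_label, multi_class_label) are hypothetical, the comparison is taken as strictly greater than thresh, and pixels without train data are coded NA.

```r
# Toy data (assumption): four segments with four pixels each.
seg    <- c(1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3, 3,  4, 4, 4, 4)
thresh <- 0.5

# Single-class rule: 1 = class of interest, 0 = background, NA = no train data.
cls_single <- c(1, 1, 1, 0,  1, 0, 0, 0,  0, 0, 0, 0,  0, 0, NA, NA)

single_class_label <- function(x, thresh) {
  p1 <- sum(x == 1, na.rm = TRUE) / length(x)  # cover of the class of interest
  p0 <- sum(x == 0, na.rm = TRUE) / length(x)  # cover of the background class
  if (p1 > thresh) return(1L)   # enough cover of the class of interest
  if (p0 == 1) return(0L)       # segment totally covered by background
  NA_integer_                   # otherwise not a train case
}
labels_single <- tapply(cls_single, seg, single_class_label, thresh = thresh)

# Multi-class rule: majority class first, then the threshold on its cover.
cls_multi <- c(1, 1, 1, 2,  2, 2, 3, 3,  3, 3, 3, 3,  1, 2, 3, NA)

multi_class_label <- function(x, thresh) {
  counts <- table(x)            # table() ignores NA pixels
  if (length(counts) == 0) return(NA_integer_)
  top <- which.max(counts)      # majority class (first one on ties)
  if (counts[[top]] / length(x) > thresh) {
    as.integer(names(counts)[top])
  } else {
    NA_integer_                 # majority class does not cover enough
  }
}
labels_multi <- tapply(cls_multi, seg, multi_class_label, thresh = thresh)

print(labels_single)
print(labels_multi)
```

In this toy case, segment 2 is dropped under both rules: its cover of class 1 is 0.25 in the single-class data, and in the multi-class data its majority class covers exactly 0.5, which is not above thresh.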

funs

A character vector with the name(s) of the functions used to aggregate data (default: c("mean", "sd")).
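To make the role of funs concrete, the following base R sketch aggregates pixel-level feature values by segment with each named function. It mimics the idea only; the toy data and the output column naming (feature_function) are assumptions and may differ from what the package produces.

```r
# Toy data (assumption): two segments with three pixels each, two feature layers.
seg   <- c(1, 1, 1, 2, 2, 2)
feats <- data.frame(b1 = c(1, 2, 3, 4, 5, 6),
                    b2 = c(10, 10, 10, 20, 30, 40))
funs  <- c("mean", "sd")

# Apply each aggregation function by segment ID.
stats <- lapply(funs, function(f) {
  out <- aggregate(feats, by = list(SID = seg), FUN = match.fun(f))
  # Hypothetical naming scheme: <feature>_<function>
  names(out)[-1] <- paste(names(feats), f, sep = "_")
  out
})

# Combine the per-function tables into one data frame keyed by SID.
segStats <- Reduce(function(a, b) merge(a, b, by = "SID"), stats)
print(segStats)
```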

minImgSegm

Minimum number of image segments/objects necessary to generate train data.

bylayer

Calculate statistics layer by layer instead of all at once? (slightly increases computation time but reduces memory load; default: FALSE).

tiles

Number of times that the image will be divided along the x and y axes. The original raster data will be split into a number of blocks equal to tiles^2 (e.g., if tiles = 5 this will generate 25 blocks/tiles) for reading. Use a larger number for large SpatRaster objects; some fine-tuning may be necessary to adjust this value according to available memory and raster input size.
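The tiles^2 arithmetic can be checked with a few lines of base R. The image dimensions below are made-up values for illustration, and the per-block size is only an upper bound assuming blocks of roughly equal size.

```r
tiles <- 5

# One block per (x, y) tile combination.
grid     <- expand.grid(tile_x = seq_len(tiles), tile_y = seq_len(tiles))
n_blocks <- nrow(grid)   # equals tiles^2

# Hypothetical raster size (assumption) to estimate how many pixels
# are read per block at most.
nrow_img <- 1000
ncol_img <- 1500
max_pixels_per_block <- ceiling(nrow_img / tiles) * ceiling(ncol_img / tiles)

cat("Blocks:", n_blocks, "\n")
cat("Max pixels per block:", max_pixels_per_block, "\n")
```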

verbose

Print progress messages? (default: TRUE)

progressBar

Boolean. Show progress bar? (default: FALSE).

Value

An object of class SOptim.CalData containing two elements:

  1. calData - A data frame object containing calibration data for training and evaluating a classifier algorithm. The first column (named "SID") contains the ID of each segment, and the second column (named "train") holds the segment class (or label). The following n columns hold the classification features for training;

  2. classifFeatData - A data frame containing all segments and features from inputs. The first column (named "SID") holds the unique identifier for each image segment. The following n columns are used as classification features. Typically this data set is used for predicting the target class after calibrating a certain classifier algorithm.
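As an illustration of the layout described above, the sketch below builds a small mock object by hand and separates identifiers, labels and features. The values and the feature column name are invented; a real object would come from prepareCalData.

```r
# Mock object (assumption): mirrors the documented structure only.
calObj <- structure(
  list(
    calData = data.frame(SID = c(1L, 3L),
                         train = c(1L, 0L),
                         b1_mean = c(2.0, 7.5)),
    classifFeatData = data.frame(SID = 1:4,
                                 b1_mean = c(2.0, 5.0, 7.5, 9.1))
  ),
  class = "SOptim.CalData"
)

# Feature columns are everything after "SID" and "train" in calData.
featCols <- setdiff(names(calObj$calData), c("SID", "train"))

# Train/evaluation inputs: labelled segments only.
X_train <- calObj$calData[, featCols, drop = FALSE]
y_train <- factor(calObj$calData$train)

# Prediction inputs: all segments, same feature columns.
X_all <- calObj$classifFeatData[, featCols, drop = FALSE]

cat("Train segments:", nrow(X_train), " All segments:", nrow(X_all), "\n")
```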


joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.