getTrainData_: Generate supervised training data for classification

View source: R/SOptim_GetTrainData.R

getTrainData_    R Documentation

Generate supervised training data for classification

Description

An ancillary function used to generate training data for classification. This function is the workhorse behind getTrainData.

Usage

getTrainData_(
  x,
  rstSegm,
  useThresh = TRUE,
  thresh = 0.5,
  na.rm = TRUE,
  dup.rm = TRUE,
  minImgSegm = 30,
  ignore = FALSE,
  tiles = NULL
)

Arguments

x

Input train data used for classification. The input can be a SpatRaster or a character string with a path to the raster layer.

rstSegm

A path or a SpatRaster object containing the outputs of a segmentation algorithm with each object/segment identified by an integer index.

useThresh

Use a threshold to filter training data in multi-class problems? (default: TRUE; not used for points).

thresh

A threshold in the interval ]0, 1] defining the minimum proportion of a segment that must be covered by a given class for the segment to be used as a training case. The threshold only applies if x is a SpatRaster, i.e., when train areas/pixels are used. In a "single-class" problem the threshold applies only to the class of interest (coded as 1): a segment whose proportion covered by that class is higher than thresh is considered a train case. In contrast, for the background class (coded as 0), only segments/objects totally covered by that class are considered train cases. In a "multi-class" problem thresh is applied differently: the train class is first determined by a majority rule; if that class covers more than the value specified in thresh, the case is kept in the train data, otherwise it is filtered out. See also useThresh.
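The two threshold rules can be illustrated with a minimal base R sketch. It is not the package's implementation: labelSegments is a hypothetical helper, the inputs are assumed to be plain vectors of per-pixel segment IDs and train classes, and proportions are assumed to be computed over labelled (non-NA) pixels only.

```r
# Hypothetical helper (not part of SegOptim): assign a train class to each
# segment from per-pixel (segment ID, class) pairs, following the rules
# described for 'thresh'. Assumes NA pixels are excluded from proportions.
labelSegments <- function(sid, cls, thresh = 0.5, singleClass = TRUE) {
  # Rows = segments, columns = classes, cells = pixel counts
  tab <- table(sid, cls)
  # Per-segment class proportions as a plain matrix
  props <- unclass(prop.table(tab, margin = 1))
  ids <- as.integer(rownames(props))
  classes <- as.integer(colnames(props))
  n <- nrow(props)

  if (singleClass) {
    # Proportion of each segment covered by class 1 and by class 0
    p1 <- if ("1" %in% colnames(props)) props[, "1"] else rep(0, n)
    p0 <- if ("0" %in% colnames(props)) props[, "0"] else rep(0, n)
    train <- rep(NA_integer_, n)
    train[p1 > thresh] <- 1L  # class of interest: cover higher than thresh
    train[p0 == 1] <- 0L      # background: segment totally covered by 0's
  } else {
    # Majority rule first, then keep only majorities above thresh
    top <- max.col(props, ties.method = "first")
    train <- classes[top]
    topProp <- props[cbind(seq_len(n), top)]
    train[topProp <= thresh] <- NA_integer_
  }

  out <- data.frame(SID = ids, train = train)
  out <- out[!is.na(out$train), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data: three segments with four pixels each
sid <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
single <- labelSegments(sid, c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0))
multi <- labelSegments(sid, c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3),
                       singleClass = FALSE)
```

In the single-class toy case, segment 3 is dropped because class 1 covers only a quarter of it and it is not fully background. In the multi-class case, segment 2 is dropped because its majority class covers exactly 0.5, which is not more than thresh.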

na.rm

Remove NA's? (default: TRUE). Only used if x is a SpatialPointsDataFrame object.

dup.rm

Remove duplicate values? (default: TRUE; see details). Only used if x is a SpatialPointsDataFrame object.

minImgSegm

Minimum number of image segments/objects necessary to generate train data.

ignore

If set to TRUE then train data may contain one single class. This is useful in cases where sample units contain only positive or negative train cases. It also applies when the threshold value is employed: in that case, if no positive cases are generated, the negatives are returned (default: FALSE).

tiles

An object of class SOptim.Tiles created by createRasterTiles, used to read data fractionally by tiles (default: NULL, i.e., tiles are not used).

Value

A two-column data frame with segment IDs (column "SID") and the corresponding train class (column "train").

Note

Train raster data must contain at least two categories, coded as integers:

  • 0 and 1 for "single-class" (with 1's being the class of interest and typically the minority class);

  • or 1, 2, ..., n for "multi-class".

Background or null pixels should be coded as NoData. To produce valid train data, the image segmentation must generate more than minImgSegm unique segments (or objects).
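The requirements in this note can be checked before building train data. The following base R sketch is only an illustration under stated assumptions: checkTrainInputs is a hypothetical helper (not a SegOptim function), and the train and segment rasters are assumed to have already been read into plain numeric vectors, with NoData represented as NA.

```r
# Hypothetical helper (not part of SegOptim): check the class coding of the
# train values and the number of unique segments described in the note.
checkTrainInputs <- function(trainValues, segmIds, minImgSegm = 30) {
  # Distinct train classes, ignoring NoData (NA)
  classes <- sort(unique(trainValues[!is.na(trainValues)]))

  if (length(classes) < 2) {
    stop("Train data must contain at least two categories")
  }
  if (any(classes != round(classes))) {
    stop("Train classes must be coded as integers")
  }

  # 0/1 coding means single-class; codes starting at 1 mean multi-class
  if (identical(as.numeric(classes), c(0, 1))) {
    type <- "single-class"
  } else if (all(classes >= 1)) {
    type <- "multi-class"
  } else {
    stop("Use 0 and 1 (single-class) or 1, 2, ..., n (multi-class)")
  }

  # The segmentation must yield more than minImgSegm unique segments
  nSegm <- length(unique(segmIds[!is.na(segmIds)]))
  if (nSegm <= minImgSegm) {
    stop("Segmentation must generate more than minImgSegm unique segments")
  }

  list(type = type, classes = classes, nSegments = nSegm)
}

# Toy inputs: binary train values and 40 distinct segment IDs
chk <- checkTrainInputs(c(0, 1, NA, 1), 1:40)
# Too few segments: exactly minImgSegm segments is not enough
tooFew <- try(checkTrainInputs(c(0, 1, NA, 1), 1:30), silent = TRUE)
```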


joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.