getTrainData: Generate supervised training data for classification

View source: R/SOptim_GetTrainData.R

getTrainData    R Documentation

Description

An ancillary function that generates training data for classification by assigning a train class to image segments.

Usage

getTrainData(
  x,
  rstSegm,
  useThresh = TRUE,
  thresh = 0.5,
  na.rm = TRUE,
  dup.rm = TRUE,
  minImgSegm = 30,
  ignore = FALSE,
  tiles = NULL
)

## Default S3 method:
getTrainData(
  x,
  rstSegm,
  useThresh = TRUE,
  thresh = 0.5,
  na.rm = TRUE,
  dup.rm = TRUE,
  minImgSegm = 30,
  ignore = FALSE,
  tiles = NULL
)

## S3 method for class 'SpatRaster'
getTrainData(
  x,
  rstSegm,
  useThresh = TRUE,
  thresh = 0.5,
  na.rm = TRUE,
  dup.rm = TRUE,
  minImgSegm = 30,
  ignore = FALSE,
  tiles = NULL
)

## S3 method for class 'character'
getTrainData(
  x,
  rstSegm,
  useThresh = TRUE,
  thresh = 0.5,
  na.rm = TRUE,
  dup.rm = TRUE,
  minImgSegm = 30,
  ignore = FALSE,
  tiles = NULL
)

Arguments

x

Input train data used for classification. The input can be a SpatRaster or a string with a path to the raster file.

rstSegm

A path or a SpatRaster object containing the outputs of a segmentation algorithm with each object/segment identified by an integer index.

useThresh

Whether to use the threshold to filter training data in multi-class problems (default: TRUE; not used for points).

thresh

A threshold value, in the interval ]0, 1], defining the minimum proportion of a segment that must be covered by a given class for the segment to be considered a training case. The threshold only applies if x is a SpatRaster, i.e., when train areas/pixels are used.

In a "single-class" problem, the threshold only applies to the class of interest (coded as 1's): a segment whose proportion covered by that class is higher than thresh is considered a train case. In contrast, for the background class (coded as 0's), only segments/objects totally covered by that class are considered train cases.

In a "multi-class" problem, thresh is applied differently: the train class is first determined by a majority rule; then, if that class covers more than the value specified in thresh, the case is kept in the train data, otherwise it is filtered out. See also useThresh.
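The two rules described for thresh can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy vectors sid, train and trainMC are hypothetical per-pixel values, and the strict ">" comparison is an assumption taken from the wording "higher than thresh".

```r
# Toy data, one entry per pixel (hypothetical values, not package output)
sid    <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)  # segment ID of each pixel
train  <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0)  # single-class coding (0/1)
thresh <- 0.5

# Proportion of each segment covered by each class (each row sums to 1)
prop <- prop.table(table(sid, train), margin = 1)

# Single-class rule: positive if class 1 covers more than thresh,
# negative only if class 0 covers the whole segment, otherwise dropped
label <- ifelse(prop[, "1"] > thresh, 1L,
                ifelse(prop[, "0"] == 1, 0L, NA_integer_))
single <- data.frame(SID = as.integer(rownames(prop)), train = unname(label))
single <- single[!is.na(single$train), ]

# Multi-class rule: majority class first, then keep the segment only if
# that class covers more than thresh
trainMC <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 1)  # multi-class coding (1..n)
propMC  <- prop.table(table(sid, trainMC), margin = 1)
major   <- colnames(propMC)[apply(propMC, 1, which.max)]
cover   <- apply(propMC, 1, max)
multi <- data.frame(SID = as.integer(rownames(propMC)),
                    train = as.integer(major),
                    cover = unname(cover))
multi <- multi[multi$cover > thresh, c("SID", "train")]
```

In this toy case, segment 3 is dropped from the single-class result (class 1 covers only 25% and class 0 does not cover it fully), and segment 2 is dropped from the multi-class result (its two classes tie at 50%, which does not exceed thresh).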

na.rm

Remove NA's? (default: TRUE). Only used if x is a SpatialPointsDataFrame object.

dup.rm

Remove duplicate values? (default: TRUE; see details). Only used if x is a SpatialPointsDataFrame object.

minImgSegm

Minimum number of image segments/objects necessary to generate train data (default: 30).

ignore

If set to TRUE, the train data may contain a single class. This is useful when the sample units contain only positive or only negative train cases. It also applies when the threshold value is employed: in that case, if no positive cases are generated, the negatives are returned (default: FALSE).

tiles

An object of class SOptim.Tiles, created by createRasterTiles, used to read data fractionally by tiles (default: NULL, i.e., tiles are not used).

Details

In some cases, depending on the type of input training data or the output of the segmentation, duplicate segment IDs (SID) may occur for different classes, meaning that a given segment may have more than one training class. In those cases, dup.rm should be set to TRUE (default).
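As an illustration of the duplicate-SID situation, the base R sketch below detects segments that received more than one training class. The data frame td is hypothetical, and dropping ambiguous segments entirely is only one possible handling, assumed here for illustration; it is not necessarily what dup.rm does internally.

```r
# Hypothetical training table: segment 12 received two different classes
td <- data.frame(SID   = c(10L, 11L, 12L, 12L, 13L),
                 train = c(1L, 0L, 1L, 0L, 1L))

# Segment IDs that occur more than once
dupSID <- unique(td$SID[duplicated(td$SID)])

# Assumption: drop ambiguous segments entirely rather than keep one row
tdClean <- td[!(td$SID %in% dupSID), ]
```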

Raster data (in x and rstSegm) will be coerced to integer values before performing cross-tabulations to evaluate the percent coverage.
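The effect of integer coercion before cross-tabulation can be sketched in base R as follows. The vectors segVals and trainVals are hypothetical cell values, and the use of as.integer() is an assumption made for illustration. Note that as.integer() truncates toward zero, so non-integer cell values would lose their fractional part.

```r
# Hypothetical cell values stored as doubles, as is common for rasters
segVals   <- c(1.0, 1.0, 2.0, 2.0, 2.0, 3.0)
trainVals <- c(1.0, 0.0, 1.0, 1.0, 0.0, 0.0)

# Coerce to integer before cross-tabulating
segInt   <- as.integer(segVals)
trainInt <- as.integer(trainVals)

# Cross-tabulation of segments (rows) by train class (columns),
# expressed as percent coverage within each segment
xt  <- table(SID = segInt, train = trainInt)
pct <- 100 * prop.table(xt, margin = 1)

# Truncation, not rounding: 1.9 becomes 1
truncated <- as.integer(1.9)
```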

Value

A two-column data frame with segment IDs (column "SID") and the corresponding train class (column "train").

Note

Train raster data must contain at least two categories, coded as integers:

  • 0 and 1 for "single-class" (with 1's being the feature of interest);

  • or 1,2,...,n for "multi-class".

To produce valid train data, the image segmentation procedure must generate at least 30 unique segments (or objects).
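These requirements can be checked before calling getTrainData. The helper below is a hypothetical sketch (checkTrainInputs is not part of the package) that works on plain vectors of cell values; with terra rasters, such vectors could be obtained with values(rstTrain) and values(rstSegm).

```r
# Hypothetical helper: summarizes class coding and segment count
checkTrainInputs <- function(trainVals, segVals, minImgSegm = 30) {
  # Distinct train categories and number of distinct segments, ignoring NA
  classes <- sort(unique(as.integer(trainVals[!is.na(trainVals)])))
  nSegm   <- length(unique(segVals[!is.na(segVals)]))
  list(
    classes        = classes,
    nSegm          = nSegm,
    enoughClasses  = length(classes) >= 2,      # at least two categories
    singleClass    = identical(classes, 0:1),   # 0/1 coding
    enoughSegments = nSegm >= minImgSegm        # minimum number of segments
  )
}

# Toy inputs: 0/1 train coding and 40 distinct segments
trainVals <- rep(0:1, each = 50)
segVals   <- rep(1:40, length.out = 100)
chk <- checkTrainInputs(trainVals, segVals)
```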

Examples


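library(terra)

# kmeans() starts from random centers; fix the seed for reproducibility
set.seed(42)

# Simulated segmentation raster: 100 x 100 cells grouped into 100 segments
rstSegm <- rast(nrows = 100, ncols = 100, xmin = 0, xmax = 100,
                ymin = 0, ymax = 100, res = 1)
km <- kmeans(xyFromCell(rstSegm, 1:ncell(rstSegm)), 100, iter.max = 100)
values(rstSegm) <- km$cluster

# Simulated multi-class train raster with 5 classes (coded 1 to 5)
rstTrain <- rast(nrows = 100, ncols = 100, xmin = 0, xmax = 100,
                 ymin = 0, ymax = 100, res = 1)
km <- kmeans(xyFromCell(rstTrain, 1:ncell(rstTrain)), 5, iter.max = 100)
values(rstTrain) <- km$cluster

# Returns a data frame with columns "SID" and "train"
getTrainData(rstTrain, rstSegm)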



joaofgoncalves/SegOptim documentation built on Feb. 5, 2024, 11:10 p.m.