View source: R/SOptim_ClassificationFunctions.R
calibrateClassifier (R Documentation)
Main function used for classifier training and evaluation for both single and multi-class problems.
calibrateClassifier(
  calData,
  classificationMethod = "RF",
  classificationMethodParams = NULL,
  balanceTrainData = FALSE,
  balanceMethod = "ubOver",
  evalMethod = "HOCV",
  evalMetric = "Kappa",
  trainPerc = 0.8,
  nRounds = 20,
  minTrainCases = 30,
  minCasesByClassTrain = 10,
  minCasesByClassTest = 10,
  runFullCalibration = FALSE,
  verbose = TRUE
)
calData
An input calibration dataset used for classification.

classificationMethod
An input string defining the classification algorithm to be used (default: "RF"). The available options correspond to the algorithms listed in the Details section (random forest, generalized boosted modelling, support vector machines, k-nearest neighbour and flexible discriminant analysis).

classificationMethodParams
A list object with a customized set of parameters to be used for the classification algorithms (default: NULL). See also generateDefaultClassifierParams to check which parameters can be changed and how to structure the list object.

balanceTrainData
Defines if data balancing is to be used (only available for single-class problems; default: FALSE).

balanceMethod
A character string used to set the data balancing method (default: "ubOver"). Available methods are based on under-sampling and over-sampling of the train data; see the ubBalance function (Details section) for specifics.

evalMethod
A character string defining the evaluation method. The available methods include holdout cross-validation ("HOCV", the default) and k-fold cross-validation.

evalMetric
A character string setting the evaluation metric (default: "Kappa"), or a function that calculates the performance score based on two vectors, one with observed and the other with predicted values (see below for more details). This option defines the outcome value of the genetic algorithm fitness function and the output of the grid or random search optimization routines.

trainPerc
A decimal number defining the training proportion (default: 0.8; used for holdout cross-validation).

nRounds
Number of training rounds used for holdout cross-validation (default: 20).

minTrainCases
The minimum number of training cases required for calibration (default: 30). If the number of rows in the training data is lower than this value the classifier will not run.

minCasesByClassTrain
Minimum number of cases per class in each train data split required for the classifier to run.

minCasesByClassTest
Minimum number of cases per class in each test data split required for the classifier to run.

runFullCalibration
Run a full calibration? Check the Details section (default: FALSE).

verbose
Print progress messages? (default: TRUE)
Two working modes can be used:

i) for "internal" GA optimization or grid/random search: runFullCalibration = FALSE, or,

ii) for performing a full segmented image classification: runFullCalibration = TRUE.
Typically, the first option is used internally for optimizing segmentation parameters in
gaOptimizeSegmentationParams, where the output value from the selected evaluation metric
is passed as the fitness function outcome for GA optimization.
The second option should be used to perform a final image classification and to get full evaluation
statistics (slot: 'PerfStats'), confusion matrices (slot: 'ConfMat'), train/test partition sets (slot: 'TrainSets'),
classifier objects (slot: 'ClassObj') and parameters (slot: 'ClassParams'). In addition to the evaluation rounds
(depending on the evaluation method selected), this option will also run a "full" round in which all the data (i.e.,
no train/test split) are used for training. Results from this option can then be used in predictSegments.
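For illustration, a full calibration run might look like the sketch below (assuming that calData is a calibration dataset previously prepared for SegOptim; the argument values shown are only examples):

# Sketch only: full calibration with holdout cross-validation and default RF settings
# 'calData' is a hypothetical, previously prepared calibration dataset
classifFit <- calibrateClassifier(
  calData              = calData,
  classificationMethod = "RF",
  evalMethod           = "HOCV",
  evalMetric           = "Kappa",
  trainPerc            = 0.8,
  nRounds              = 20,
  runFullCalibration   = TRUE,
  verbose              = TRUE
)
# The resulting object can then be used in predictSegments for the final image classification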
This function can also perform data balancing for single-class problems (check out the options balanceTrainData and balanceMethod).
See the ubBalance function for further details regarding data balancing.
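For example, balancing could be switched on for a single-class problem as in the sketch below (again assuming a hypothetical calData object holding a single-class, 0/1 calibration dataset):

# Sketch only: enable data balancing (single-class problems only)
classifFitBal <- calibrateClassifier(
  calData            = calData,     # hypothetical single-class (0/1) calibration dataset
  balanceTrainData   = TRUE,        # balancing is applied to the train splits only
  balanceMethod      = "ubOver",    # default balancing method
  runFullCalibration = TRUE
)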
For more details about the classification algorithms check out the following functions:
randomForest for random forest algorithm,
gbm for generalized boosted modelling,
svm for details related to support vector machines,
knn for k-nearest neighbour classification, and,
fda for flexible discriminant analysis.
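As a sketch of how customized classifier parameters might be passed (the exact list structure should be checked with generateDefaultClassifierParams; the parameter names below come from the randomForest package and are assumptions for illustration):

# Sketch only: customized random forest parameters (names assumed from the randomForest
# package; verify the expected list structure with generateDefaultClassifierParams)
rfParams <- list(mtry = 4, ntree = 500)

classifFitRF <- calibrateClassifier(
  calData                    = calData,   # hypothetical calibration dataset
  classificationMethod       = "RF",
  classificationMethodParams = rfParams,
  runFullCalibration         = TRUE
)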
If runFullCalibration = FALSE, then a single average value (across evaluation replicates/folds) for the
selected evaluation metric will be returned (typically used for GA optimization).

If runFullCalibration = TRUE, then an object of class SOptim.Classifier is returned with the following elements:

AvgPerf - average value of the evaluation metric selected;

PerfStats - numeric vector with performance statistics (for the selected metric) for each evaluation round plus one more round using the "full" train dataset;

Thresh - for single-class problems only; numeric vector with the threshold values (one for each round plus the "full" dataset) that maximize the selected evaluation metric;

ConfMat - a list object with confusion matrices generated at each round; for single-class problems this matrix is generated by dichotomizing the probability predictions (into 0/1) using the threshold that optimizes the selected evaluation metric (see 'Thresh' above);

obsTestSet - observed values for the test set (one integer vector for each evaluation round plus the full evaluation round);

predTestSet - predicted values for the test set (one integer or numeric vector for each evaluation round plus the full evaluation round);

TrainSets - a list object with row indices identifying train splits for each test round;

ClassObj - a list containing classifier objects for each round;

ClassParams - classification parameters used for running calibrateClassifier.
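Assuming the returned SOptim.Classifier object can be accessed like a named list (an assumption made here for illustration), the elements above could be inspected as follows:

# Sketch only: inspect the output of a full calibration run (list-like access assumed)
classifFit$AvgPerf      # average value of the selected evaluation metric
classifFit$PerfStats    # per-round performance statistics plus the "full" round
classifFit$Thresh       # single-class only: thresholds maximizing the evaluation metric
classifFit$ConfMat      # confusion matrices generated at each round
classifFit$TrainSets    # row indices of the train splits used in each round
classifFit$ClassParams  # parameters used to run calibrateClassifier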
In argument evalMetric it is possible to define a custom function. This must take two vectors: one containing
the observed/ground-truth values (first argument) and the other containing the values predicted by the trained classifier
(second argument), both for the test set (from holdout or k-fold CV). If the classification task is single-class (e.g., 1: forest / 0: non-forest,
1: water / 0: non-water) then the predicted values will be probabilities (ranging in [0,1]) for the class of interest (coded as
1's). If the task is multi-class, then the predicted values will be integer codes for each class.

To be considered valid, an evaluation function for single-class problems must:

Have at least two input arguments (observed and predicted);

Produce a non-null and valid numerical result;

Return a scalar output;

Return an attribute named 'thresh' defining the numerical threshold used to
binarize the classifier predictions (i.e., to convert from continuous probability
to discrete 0/1). Calculating this threshold makes it possible to maximize the
value of the performance metric instead of using a naive 0.5 cutoff value.
Here is an example function that calculates the maximum overall accuracy across multiple threshold values:

calcMaxAccuracy <- function(obs, pred){
  accuracies <- c()
  i <- 0
  N <- length(obs)
  thresholds <- seq(0, 1, 0.05)
  for(thresh in thresholds){
    i <- i + 1
    # Binarize the predicted probabilities with the current threshold
    pred_bin <- as.integer(pred > thresh)
    # Force both 0/1 levels so the confusion matrix is always 2 x 2
    confusionMatrix <- as.matrix(table(factor(obs, levels = c(0, 1)),
                                       factor(pred_bin, levels = c(0, 1))))
    # Overall accuracy: correctly classified cases over the total number of cases
    accuracies[i] <- sum(diag(confusionMatrix)) / N
  }
  bestAccuracy <- max(accuracies)
  # Attach the threshold that maximizes accuracy as the 'thresh' attribute
  attr(bestAccuracy, "thresh") <- thresholds[which.max(accuracies)]
  return(bestAccuracy)
}

x <- sample(0:1, 100, replace = TRUE)
y <- runif(100)
calcMaxAccuracy(obs = x, pred = y)
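A custom function such as calcMaxAccuracy above can then be passed directly through the evalMetric argument, for example (sketch only, with a hypothetical calData object):

# Sketch only: use the custom single-class metric during calibration
classifFitCustom <- calibrateClassifier(
  calData            = calData,          # hypothetical single-class calibration dataset
  evalMetric         = calcMaxAccuracy,  # custom metric defined above
  runFullCalibration = TRUE
)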
Valid multi-class functions must:

Have at least two input arguments (observed and predicted);

Produce a non-null and valid numerical result;

Return a scalar output.
An example of a valid custom function to calculate the overall accuracy:

calcAccuracy <- function(obs, pred){
  N <- length(obs)
  # Use a common set of class levels so the confusion matrix is always square
  classLevels <- sort(union(obs, pred))
  confusionMatrix <- as.matrix(table(factor(obs, levels = classLevels),
                                     factor(pred, levels = classLevels)))
  # Overall accuracy: correctly classified cases over the total number of cases
  acc <- sum(diag(confusionMatrix)) / N
  return(acc)
}

x <- sample(0:1, 100, replace = TRUE)
y <- sample(0:1, 100, replace = TRUE)
calcAccuracy(obs = x, pred = y)
1) By default, 25% or more of the calibration/evaluation rounds must produce valid results, otherwise the
optimization algorithm will return NA.
2) Data balancing is only performed on the train dataset to avoid bias in performance evaluation derived from this procedure.