precalculation: Predefined precalculation functions for objectives

Description

These predefined precalculation functions can be used to create custom objectives with createObjective. They perform a reclassification or a cross-validation and return the true labels and the corresponding predictions.

Usage

reclassification(data, labels, 
                 classifier, classifierParams, predictorParams)

crossValidation(data, labels, 
                classifier, classifierParams, predictorParams, 
                ntimes = 10, nfold = 10, 
                leaveOneOut = FALSE, stratified = FALSE,
                foldList = NULL)

Arguments

data

The data set to be used for the precalculation. This is usually a matrix or data frame with the samples in the rows and the features in the columns.

labels

A vector of class labels for the samples in data.

classifier

A TuneParetoClassifier wrapper object containing the classifier to tune. A number of state-of-the-art classifiers are included in TunePareto (see predefinedClassifiers). Custom classifiers can be employed using tuneParetoClassifier.

classifierParams

A named list of parameter assignments for the training routine of the classifier.

predictorParams

If the classifier consists of separate training and prediction functions, a named list of parameter assignments for the predictor function.

nfold

The number of groups of the cross-validation. Ignored if leaveOneOut=TRUE.

ntimes

The number of repeated runs of the cross-validation. Ignored if leaveOneOut=TRUE.

leaveOneOut

If this is true, a leave-one-out cross-validation is performed, i.e. each sample is left out once during training and used as a test sample.

stratified

If set to true, a stratified cross-validation is carried out. That is, the percentage of samples from different classes in the cross-validation folds corresponds to the class sizes in the complete data set. If set to false, the folds may be unbalanced.

foldList

If this parameter is set, the other cross-validation parameters (ntimes, nfold, leaveOneOut, stratified) are ignored. Instead, the precalculated cross-validation partition supplied in foldList is used. This allows for using the same cross-validation experiment in multiple tunePareto calls. Partitions can be generated using generateCVRuns.
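To illustrate the structure that foldList is expected to have, the following base-R sketch builds such a partition by hand: one list per run, each holding nfold vectors of test-sample indices. In practice this partition would come from generateCVRuns (whose exact return value may differ in detail; this mock only illustrates the shape) and would be passed unchanged to several calls so they share identical folds.

```r
# Hedged sketch: hand-built cross-validation partition with the
# runs-of-folds shape that foldList expects (generateCVRuns would
# normally produce this; the construction here is only illustrative).
set.seed(1)
n <- 20; ntimes <- 2; nfold <- 4

foldList <- lapply(seq_len(ntimes), function(run) {
  perm <- sample(n)                                 # shuffle the sample indices
  split(perm, rep(seq_len(nfold), length.out = n))  # cut them into nfold groups
})

# foldList[[run]][[fold]] now holds the test-sample indices of one fold;
# every sample occurs in exactly one fold per run.
```

Reusing one such partition across multiple tunePareto calls makes the resulting objective values directly comparable, because every call sees exactly the same training/test splits.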

Details

reclassification trains the classifier with the full data set. Afterwards, the classifier is applied to the same data set.
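The shape of a reclassification result can be sketched in base R with a trivial majority-class "classifier" standing in for a real TuneParetoClassifier (the variable names and the majority rule are purely illustrative, not the TunePareto API):

```r
# Hedged sketch: train on the full data set ("training" here just picks
# the most frequent class), then predict on the same data, and return
# the three components documented under Value.
labels   <- c("a", "a", "b", "a", "b")
majority <- names(which.max(table(labels)))   # mock training step

result <- list(
  trueLabels      = labels,                          # original labels, as supplied
  predictedLabels = rep(majority, length(labels)),   # predictions on the same data
  model           = majority                         # stands in for the TuneParetoModel
)
```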

crossValidation partitions the samples in the data set into a number of groups (depending on nfold and leaveOneOut). Each of these groups is left out once in the training phase and used for prediction. The whole procedure is repeated several times (as specified in ntimes).
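The cross-validation loop itself can be sketched in base R; a majority-class predictor again stands in for the real classifier (crossValidation works analogously but calls the supplied TuneParetoClassifier on each training portion):

```r
# Hedged sketch of the cross-validation procedure: ntimes runs, each
# splitting the samples into nfold groups; every group is held out once
# and predicted from a model "trained" on the remaining samples.
set.seed(42)
labels <- rep(c("a", "b"), each = 10)
ntimes <- 2; nfold <- 5

result <- lapply(seq_len(ntimes), function(run) {
  # randomly assign each sample to one of nfold groups
  foldId <- sample(rep(seq_len(nfold), length.out = length(labels)))
  lapply(seq_len(nfold), function(f) {
    test  <- which(foldId == f)    # fold f is held out ...
    train <- which(foldId != f)    # ... the rest is used for training
    majority <- names(which.max(table(labels[train])))   # mock training
    list(trueLabels      = labels[test],
         predictedLabels = rep(majority, length(test)))
  })
})
```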

Value

reclassification returns a list with the following components:

trueLabels

The original labels of the data set, as supplied in labels.

predictedLabels

A vector of predicted labels of the data set

model

The TuneParetoModel object resulting from the classifier training

crossValidation returns a nested list structure. At the top level, there is one list element for each run of the cross-validation. Each of these elements consists of a list of sub-structures for each fold. The sub-structures have the following components:

trueLabels

The original labels of the test samples in the fold

predictedLabels

A vector of predicted labels of the test samples in the fold

model

The TuneParetoModel object resulting from the classifier training in the fold

That is, for a cross-validation with n runs and m folds, there are n top-level lists, each having m sub-lists comprising the true labels and the predicted labels.
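Traversing this nested structure typically means flattening the folds of each run and comparing predicted to true labels. The following base-R sketch computes a per-run accuracy from a small hand-built mock result (2 runs, 2 folds; mockResult is hypothetical data, not output of the package):

```r
# Hedged sketch: a mock crossValidation result with 2 runs of 2 folds
# each, and the usual lapply/unlist traversal over runs and folds.
mockResult <- list(
  list(list(trueLabels = c("a", "b"), predictedLabels = c("a", "b")),
       list(trueLabels = c("a", "a"), predictedLabels = c("a", "b"))),
  list(list(trueLabels = c("b", "b"), predictedLabels = c("b", "b")),
       list(trueLabels = c("a", "b"), predictedLabels = c("a", "b")))
)

accuracyPerRun <- sapply(mockResult, function(run) {
  # flatten the folds of this run into two label vectors
  pred <- unlist(lapply(run, function(fold) fold$predictedLabels))
  true <- unlist(lapply(run, function(fold) fold$trueLabels))
  mean(pred == true)         # fraction of correct predictions in the run
})
# run 1: 3 of 4 correct (0.75); run 2: 4 of 4 correct (1.0)
```

This is the same traversal pattern used by the false-positive objective in the Examples section below, with the error count swapped in for accuracy.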

See Also

createObjective, generateCVRuns.

Examples


# create a new objective minimizing the
# false positives of a cross-validation

cvFalsePositives <- function(nfold=10, ntimes=10, leaveOneOut=FALSE, foldList=NULL, caseClass)
{
  return(createObjective(
            precalculationFunction = "crossValidation",
            precalculationParams = list(nfold=nfold, 
                                        ntimes=ntimes, 
                                        leaveOneOut=leaveOneOut,
                                        foldList=foldList),
            objectiveFunction = 
            function(result, caseClass)
            {
             
              # take mean value over the cv runs
              return(mean(sapply(result,
                    function(run)
                    # iterate over runs of cross-validation
                    {
                      # extract all predicted labels in the folds
                      predictedLabels <- 
                            unlist(lapply(run,
                                         function(fold)fold$predictedLabels))
    
                      # extract all true labels in the folds
                      trueLabels <- 
                            unlist(lapply(run,
                                          function(fold)fold$trueLabels))
                      
                      # calculate number of false positives in the run
                      return(sum(predictedLabels == caseClass & 
                                 trueLabels != caseClass))
                    })))
            },
            objectiveFunctionParams = list(caseClass=caseClass),
            direction = "minimize",        
            name = "CV.FalsePositives"))                  
}

# use the objective in an SVM cost parameter tuning on the 'iris' data set
r <- tunePareto(data = iris[, -ncol(iris)], 
                labels = iris[, ncol(iris)],
                classifier = tunePareto.svm(),
                cost = c(0.001,0.005,0.01,0.05,0.1,0.5,1,5,10,50),
                objectiveFunctions=list(cvFalsePositives(10, 10, caseClass="setosa")))
print(r)


TunePareto documentation built on Oct. 2, 2023, 5:06 p.m.