tunePareto: Generic function for multi-objective parameter tuning of classifiers

View source: R/tunePareto.R

tunePareto    R Documentation

Generic function for multi-objective parameter tuning of classifiers

Description

This generic function tunes parameters of arbitrary classifiers in a multi-objective setting and returns the Pareto-optimal parameter combinations.

Usage

tunePareto(..., data, labels, 
           classifier, parameterCombinations,
           sampleType = c("full","uniform",
                          "latin","halton",
                          "niederreiter","sobol",
                          "evolution"), 
           numCombinations, 
           mu=10, lambda=20, numIterations=100,
           objectiveFunctions, objectiveBoundaries,
           keepSeed = TRUE, useSnowfall = FALSE, verbose=TRUE)

Arguments

data

The data set to be used for the parameter tuning. This is usually a matrix or data frame with the samples in the rows and the features in the columns.

labels

A vector of class labels for the samples in data.

classifier

A TuneParetoClassifier wrapper object containing the classifier to tune. A number of state-of-the-art classifiers are included in TunePareto (see predefinedClassifiers). Custom classifiers can be employed using tuneParetoClassifier.

parameterCombinations

If not all combinations of parameter ranges for the classifier are meaningful, you can set this parameter instead of specifying parameter values in the ... argument. It holds an explicit list of possible combinations, where each element of the list is a named sublist with one entry for each parameter.
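
For instance, when only certain combinations of two parameters are valid, they can be enumerated explicitly with allCombinations and supplied via this argument (a minimal sketch; the constraint that 'l' must be smaller than 'k' mirrors the k-NN example in the Examples section below):

# Enumerate only the valid (k, l) pairs instead of crossing full ranges
comb <- c(allCombinations(list(k = 3, l = 0:2)),
          allCombinations(list(k = 5, l = 0:4)))
# then: tunePareto(..., parameterCombinations = comb, ...)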

sampleType

Determines the way parameter configurations are sampled.

If sampleType="full", all possible combinations are tried. This is only possible if all supplied parameter ranges are discrete or if the combinations are supplied explicitly in parameterCombinations.

If sampleType="uniform", numCombinations combinations are drawn uniformly at random.

If sampleType="latin", Latin Hypercube sampling is applied. This is particularly recommended when tuning continuous parameters.

If sampleType="halton", sampleType="niederreiter" or sampleType="sobol", numCombinations parameter combinations are drawn from the corresponding quasi-random sequence (initialized at a random step to ensure that different values are drawn in repeated runs). This is particularly recommended when tuning continuous parameters. sampleType="niederreiter" and sampleType="sobol" require the gsl package to be installed.

If sampleType="evolution", an evolutionary algorithm is applied. Specifically, this employs mu+lambda Evolution Strategies with uncorrelated mutations and non-dominated sorting for survivor selection. This is recommended when the space of possible parameter configurations is very large. For smaller parameter spaces, the above sampling methods may be faster.
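
As an illustration, the following sketch draws 20 configurations of a continuous SVM cost parameter from a Halton sequence (based on the SVM setup from the Examples section below):

# Quasi-random (Halton) sampling of a continuous parameter range
tunePareto(data = iris[, -ncol(iris)],
           labels = iris[, ncol(iris)],
           classifier = tunePareto.svm(),
           cost = as.interval(0.001, 10),
           sampleType = "halton",
           numCombinations = 20,
           objectiveFunctions = list(cvError(10, 10)))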

numCombinations

If this parameter is set, at most numCombinations randomly chosen parameter configurations are tested. Otherwise, all possible combinations of the supplied parameter ranges are tested.

mu

The number of parent individuals used in the Evolution Strategies if sampleType="evolution".

lambda

The number of offspring per generation in the Evolution Strategies if sampleType="evolution".

numIterations

The number of iterations/generations the evolutionary algorithm is run if sampleType="evolution".

objectiveFunctions

A list of objective functions used to tune the parameters. There are a number of predefined objective functions (see predefinedObjectiveFunctions). Custom objective functions can be created using createObjective.

objectiveBoundaries

If this parameter is set, it specifies boundaries of the objective functions for valid solutions. That is, each element of the supplied vector specifies the upper or lower limit of an objective (depending on whether the objective is maximized or minimized). Parameter combinations that do not meet all these restrictions are not included in the result set, even if they are Pareto-optimal. If only some of the objectives should have bounds, supply NA for the remaining objectives.
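
For example, the following sketch (based on the k-NN example from the Examples section) accepts only configurations whose cross-validation error is at most 0.1 and leaves the reclassification error unbounded:

# Keep only Pareto-optimal configurations with a CV error of at most 0.1;
# NA leaves the second objective (reclassification error) unbounded.
tunePareto(data = iris[, -ncol(iris)],
           labels = iris[, ncol(iris)],
           classifier = tunePareto.knn(),
           k = c(1, 3, 5, 7, 9),
           objectiveFunctions = list(cvError(10, 10),
                                     reclassError()),
           objectiveBoundaries = c(0.1, NA))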

keepSeed

If this is true, the random seed is reset to the same value for each of the tested parameter configurations. This is an easy way to guarantee comparability in randomized objective functions. E.g., cross-validation runs of the classifiers will all start with the same seed, which results in the same partitions.

Attention: If you set this parameter to FALSE, you must ensure that all configurations are treated equally in the objective functions: there may be randomness in processes such as classifier training, but there should be no random difference in the rating itself. In particular, the choice of subsets for subsampling experiments should always be the same for all configurations. For example, you can provide precalculated fold lists to the cross-validation objectives in the foldList parameter. If parameter configurations are rated under varying conditions, this may yield over-optimistic or over-pessimistic ratings for some configurations due to outliers.
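
The following sketch illustrates this for the cross-validation error objective, assuming the package's generateCVRuns helper for precomputing the fold list:

# Precompute one set of CV partitions and reuse it for every configuration
folds <- generateCVRuns(labels = iris[, ncol(iris)], ntimes = 10, nfold = 10)
tunePareto(data = iris[, -ncol(iris)],
           labels = iris[, ncol(iris)],
           classifier = tunePareto.knn(),
           k = c(1, 3, 5, 7, 9),
           objectiveFunctions = list(cvError(foldList = folds)),
           keepSeed = FALSE)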

useSnowfall

If this parameter is true, the routine loads the snowfall package and processes the parameter configurations in parallel. Please note that the snowfall cluster has to be initialized properly before running the tuning function and stopped after the run.
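
A minimal sketch of the required setup, assuming the snowfall package is installed:

# Initialize a snowfall cluster, run the tuning in parallel, then stop it
library(snowfall)
sfInit(parallel = TRUE, cpus = 2)
res <- tunePareto(data = iris[, -ncol(iris)],
                  labels = iris[, ncol(iris)],
                  classifier = tunePareto.knn(),
                  k = c(1, 3, 5, 7, 9),
                  objectiveFunctions = list(cvError(10, 10)),
                  useSnowfall = TRUE)
sfStop()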

verbose

If this parameter is true, status messages are printed. In particular, the algorithm prints the currently tested combination.

...

The parameters of the classifier and predictor functions that should be tuned. The names of the parameters must correspond to the parameters specified in classifierParameterNames and predictorParameterNames of tuneParetoClassifier. Each supplied argument describes the possible values of a single parameter. These can be specified in two ways: discrete parameter ranges are specified as lists of possible values, and continuous parameter ranges are specified as intervals using as.interval. The algorithm then generates combinations of possible parameter values. Alternatively, the combinations can be defined explicitly using the parameterCombinations parameter.

Details

This is a generic function that allows for parameter tuning of a wide variety of classifiers. You can either specify the values or intervals of tuned parameters in the ... argument, or supply selected combinations of parameter values using parameterCombinations. In the first case, combinations of parameter values specified in the ... argument are generated. If sampleType="uniform", sampleType="latin", sampleType="halton", sampleType="niederreiter" or sampleType="sobol", a random subset of the possible combinations is drawn. If sampleType="evolution", random parameter combinations are generated and optimized using Evolution Strategies.

In the latter case, only the parameter combinations specified explicitly in parameterCombinations are tested. This is useful if certain parameter combinations are invalid. You can create parameter combinations by concatenating results of calls to allCombinations. Only sampleType="full" is allowed in this mode.

For each of the combinations, the specified objective functions are calculated. This usually involves training and testing a classifier. From the resulting objective values, the non-dominated parameter configurations are calculated and returned.

The ... argument is the first argument of tunePareto for technical reasons (to prevent partial matching of the supplied parameters with argument names of tunePareto). This requires all other arguments to be passed by name.

Value

Returns a list of class TuneParetoResult with the following components:

bestCombinations

A list of Pareto-optimal parameter configurations. Each element of the list consists of a sub-list with named elements corresponding to the parameter values.

bestObjectiveValues

A matrix containing the objective function values of the Pareto-optimal configurations in bestCombinations. Each row corresponds to a parameter configuration, and each column corresponds to an objective function.

testedCombinations

A list of all tested parameter configurations with the same structure as bestCombinations.

testedObjectiveValues

A matrix containing the objective function values of all tested configurations with the same structure as bestObjectiveValues.

dominationMatrix

A Boolean matrix specifying which parameter configurations dominate each other. If a configuration i dominates a configuration j, the entry in the ith row and the jth column is TRUE.

minimizeObjectives

A Boolean vector specifying which of the objectives are minimization objectives. This is derived from the objective functions supplied to tunePareto.

additionalData

A list containing additional data that may have been returned by the objective functions. The list has one element for each tested parameter configuration, each comprising one sub-element for each objective function that returned additional data. The structure of these sub-elements depends on the corresponding objective function. For example, the predefined objective functions (see predefinedObjectiveFunctions) save the trained models here if saveModel is true.
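
A typical way of working with the returned object is sketched below (plotDominationGraph is assumed to be available in the package for visualizing the domination relations):

# Inspect Pareto-optimal configurations, their objective values, and
# the domination relations
res <- tunePareto(data = iris[, -ncol(iris)],
                  labels = iris[, ncol(iris)],
                  classifier = tunePareto.knn(),
                  k = c(1, 3, 5, 7, 9),
                  objectiveFunctions = list(cvError(10, 10),
                                            reclassError()))
res$bestCombinations        # Pareto-optimal parameter settings
res$bestObjectiveValues     # corresponding objective values
plotDominationGraph(res)    # plot which configurations dominate which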

See Also

predefinedClassifiers, predefinedObjectiveFunctions, createObjective, allCombinations

Examples



# tune 'k' of a k-NN classifier 
# on the 'iris' data set --
# see ?knn
print(tunePareto(data = iris[, -ncol(iris)], 
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(), 
                 k = c(1,3,5,7,9),
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError())))
                 
# example using predefined parameter configurations,
# as certain combinations of k and l are invalid:
comb <- c(allCombinations(list(k=1,l=0)),
          allCombinations(list(k=3,l=0:2)),
          allCombinations(list(k=5,l=0:4)),
          allCombinations(list(k=7,l=0:6)))

print(tunePareto(data = iris[, -ncol(iris)], 
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.knn(), 
                 parameterCombinations = comb,
                 objectiveFunctions = list(cvError(10, 10),
                                           reclassError())))
                                           

# tune 'cost' and 'kernel' of an SVM on
# the 'iris' data set using Latin Hypercube sampling --
# see ?svm and ?predict.svm
print(tunePareto(data = iris[, -ncol(iris)], 
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.svm(), 
                 cost = as.interval(0.001,10), 
                 kernel = c("linear", "polynomial",
                          "radial", "sigmoid"),
                 sampleType="latin",
                 numCombinations=20,                          
                 objectiveFunctions = list(cvError(10, 10),
                                           cvSensitivity(10, 10, caseClass="setosa"))))

# tune the same parameters using Evolution Strategies
print(tunePareto(data = iris[, -ncol(iris)], 
                 labels = iris[, ncol(iris)],
                 classifier = tunePareto.svm(), 
                 cost = as.interval(0.001,10), 
                 kernel = c("linear", "polynomial",
                          "radial", "sigmoid"),
                 sampleType="evolution",
                 numCombinations=20,
                 numIterations=20,                      
                 objectiveFunctions = list(cvError(10, 10),
                                           cvSensitivity(10, 10, caseClass="setosa"),
                                           cvSpecificity(10, 10, caseClass="setosa"))))

