GA: Genetic Algorithms for variable selection in classification


Description

A set of functions implementing simple variable selection in classification applications using genetic algorithms.

Usage

GAfun(X, C, eval.fun, kmin, kmax, popsize = 20, niter = 50,
      mut.prob = 0.05, ...)
GAfun2(X, C, eval.fun, kmin, kmax, popsize = 20, niter = 50,
       mut.prob = 0.05, ...)

GA.init.pop(popsize, nvar, kmin, kmax)
GA.select(pop, number, qlts, min.qlt = 0.4, qlt.exp = 1)
GA.mut(subset, maxvar, mut.prob = 0.01)
GA.XO(subset1, subset2)

Arguments

X

Data matrix: independent variables used by eval.fun

C

Class vector, used by eval.fun

eval.fun

Evaluation function. It should take a data matrix, a class vector (or factor) and a subset argument, and return a quality value; larger values indicate better subsets

kmin

Minimal number of variables to retain

kmax

Maximal number of variables to retain

popsize

Size of the GA population

niter

Number of iterations

mut.prob

Mutation probability

...

Further arguments to the evaluation function

nvar

The total number of variables to choose from

pop, subset, subset1, subset2

A (part of a) population of trial solutions

number

The number of trial solutions that may produce offspring

qlts

Vector of quality measures for the members of a population

min.qlt

Minimal quality of a trial solution to be considered as a future parent

qlt.exp

Quality scaling parameter: the larger this number, the stronger the discrimination between good and bad solutions, and the greedier the search

maxvar

Number of variables to choose from
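As an illustration of the eval.fun interface, the sketch below defines a hypothetical evaluation function, nearest.mean.loo, which is not part of ChemometricsWithR. It returns the leave-one-out accuracy of a nearest-class-mean classifier that uses only the columns listed in subset. It assumes that GAfun calls eval.fun with the data matrix, the class vector and the subset, and that larger return values are better.

```r
## Hypothetical evaluation function (not part of ChemometricsWithR).
## Returns the leave-one-out accuracy of a nearest-class-mean classifier
## that uses only the columns listed in 'subset'; higher is better.
nearest.mean.loo <- function(X, C, subset) {
  X <- as.matrix(X)[, subset, drop = FALSE]
  C <- as.factor(C)
  lev <- levels(C)
  pred <- character(nrow(X))
  for (i in seq_len(nrow(X))) {
    ## squared Euclidean distance from object i to each class mean,
    ## where the class means are computed without object i
    d <- sapply(lev, function(l) {
      idx <- setdiff(which(C == l), i)
      if (length(idx) == 0) return(Inf)  # no training objects left
      ctr <- colMeans(X[idx, , drop = FALSE])
      sum((X[i, ] - ctr)^2)
    })
    pred[i] <- lev[which.min(d)]
  }
  mean(pred == as.character(C))
}
```

Under the assumed calling convention it could be passed as eval.fun = nearest.mean.loo. The explicit loop over objects is slow for large data sets; the function only serves to show the interface.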

Details

The functions generate a population of trial solutions, each containing a subset of the variables. For every member of the population, the evaluation function calculates a quality measure, which determines the chance that this member produces offspring. Through this process of "survival of the fittest", the population is driven towards subsets for which the evaluation function attains a high (ideally maximal) value.
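The link between quality and the chance of producing offspring can be illustrated with a small sketch. The function select.parents is hypothetical and is not the package's GA.select: it assumes that qualities are first rescaled to the interval [0, 1], that min.qlt is a threshold on these rescaled values, and that qlt.exp is an exponent applied to them before they are used as sampling weights. The rescaling also handles negative qualities, such as the best.q value in the example output.

```r
## Hypothetical sketch of threshold selection (not the package's GA.select).
## Assumes min.qlt lies in [0, 1] and refers to rescaled qualities.
select.parents <- function(qlts, number, min.qlt = 0.4, qlt.exp = 1) {
  rng <- range(qlts)
  scaled <- if (diff(rng) > 0) {
    (qlts - rng[1]) / diff(rng)   # worst member -> 0, best member -> 1
  } else {
    rep(1, length(qlts))          # all members equally good
  }
  eligible <- which(scaled >= min.qlt)   # threshold selection
  weights <- scaled[eligible]^qlt.exp    # larger qlt.exp: greedier search
  ## indices (into qlts) of the members chosen as parents
  eligible[sample.int(length(eligible), number, replace = TRUE,
                      prob = weights)]
}
```

Raising qlt.exp concentrates the sampling weights on the best members, which matches the greedier search behaviour described for that argument.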

The initialization is done randomly. Selection is simple threshold selection. Mutation swaps variables in or out of the subset; the cross-over type is uniform. Functions GA.init.pop, GA.select, GA.mut and GA.XO are auxiliary functions, not meant to be called directly by the user.
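The initialization, mutation and cross-over steps can be sketched as follows, representing each trial solution as a sorted vector of column indices (the form in which $best appears in the example output). The functions init.pop, mutate and crossover are hypothetical stand-ins, not the package's GA.init.pop, GA.mut and GA.XO; in particular, passing kmin and kmax to the mutation step is an assumption of this sketch.

```r
## Hypothetical sketches (not the package's GA.init.pop, GA.mut, GA.XO).
## A trial solution is a sorted integer vector of selected column indices.

## Random initialization: each member gets between kmin and kmax variables
init.pop <- function(popsize, nvar, kmin, kmax) {
  lapply(seq_len(popsize), function(i) {
    k <- if (kmin == kmax) kmin else sample(kmin:kmax, 1)
    sort(sample.int(nvar, k))
  })
}

## Mutation: with probability mut.prob, either swap in a variable that is
## not yet in the subset or swap out one that is (respecting kmin, kmax)
mutate <- function(subset, maxvar, kmin, kmax, mut.prob = 0.01) {
  if (runif(1) >= mut.prob) return(subset)
  outside <- setdiff(seq_len(maxvar), subset)
  can.add <- length(subset) < kmax && length(outside) > 0
  can.drop <- length(subset) > kmin
  if (can.add && (!can.drop || runif(1) < 0.5)) {
    subset <- c(subset, outside[sample.int(length(outside), 1)])
  } else if (can.drop) {
    subset <- subset[-sample.int(length(subset), 1)]
  }
  sort(subset)
}

## Uniform cross-over on indicator vectors: each variable's in/out status
## is inherited from either parent with probability 0.5
crossover <- function(subset1, subset2, maxvar) {
  in1 <- seq_len(maxvar) %in% subset1
  in2 <- seq_len(maxvar) %in% subset2
  from1 <- runif(maxvar) < 0.5
  which(ifelse(from1, in1, in2))
}
```

Uniform cross-over can produce a child with fewer than kmin or more than kmax variables; a complete implementation would need to repair or reject such children.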

Value

Functions GAfun and GAfun2 both return a list containing the following fields:

best

The best subset

best.q

The quality of the best subset

n.iter

The number of iterations

In addition, the output of GAfun2 contains

qualities

A matrix containing the best, median and worst quality values throughout the optimization

Author(s)

Ron Wehrens

References

R. Wehrens. "Chemometrics with R - Multivariate Data Analysis in the Natural Sciences and Life Sciences". Springer, Heidelberg, 2011.

See Also

Evaluation, SA

Examples

if (require("pls")) {
  data(gasoline, package = "pls")
  ## Usually more iterations are needed
  GAobj <- GAfun(gasoline$NIR, gasoline$octane,
                 eval.fun = pls.cvfun, niter = 20,
                 kmin = 3, kmax = 25, ncomp = 2)
  GAobj
} else {
  cat("Package pls not available.\nInstall it by typing 'install.packages(\"pls\")'\n")
}

Example output

Loading required package: pls

$best
 [1] 242  45 130  38 290  19 176 298  83 257 223 110 239 335 124 172 374 222 162

$best.q
[1] -0.03915816

$n.iter
[1] 20

ChemometricsWithR documentation built on May 2, 2019, 10:25 a.m.