genalg-tools: Utility functions for selection and mutation in genetic...


Description

These functions implement specific forms of mutation and fitness that can be used in genetic algorithms for feature selection.

Usage

simpleMutate(allele, context)
selectionMutate(allele, context)
selectionFitness(arow, context)

Arguments

allele

In the simpleMutate function, allele is a binary vector filled with 0's and 1's. In the selectionMutate function, allele is an integer (which is silently ignored; see Details).

arow

A vector of integer indices identifying the rows (features) to be selected from the context$dataset matrix.

context

A list or data frame containing auxiliary information that is needed to resolve references from the mutation or fitness code. In both selectionMutate and selectionFitness, context must contain a dataset component that is either a matrix or a data frame. In selectionFitness, the context must also include a grouping factor (with two levels) called gps.

Details

These functions are 'callbacks'. They can be passed to the GenAlg function, which creates objects of the GenAlg class. They are then called repeatedly (for each individual in the population) each time the genetic algorithm advances to the next generation.
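As a rough illustration of the callback pattern, and not the actual GenAlg internals, the sketch below applies a fitness callback to every row of a population matrix and a mutation callback to randomly chosen alleles. The names scoreAndMutate, toyFitness, and toyMutate are hypothetical, as is the pm argument used here for the mutation probability.

```r
# Hypothetical sketch of how callbacks might be invoked once per generation.
# This is NOT the GenAlg implementation; all names here are illustrative.
scoreAndMutate <- function(population, fitfun, mutfun, context, pm = 0.1) {
  # Fitness callback: called once per individual (row of the population).
  fitness <- apply(population, 1, fitfun, context = context)
  # Mutation callback: called on each allele chosen with probability pm.
  hit <- matrix(runif(length(population)) < pm, nrow = nrow(population))
  mutated <- population
  mutated[hit] <- vapply(population[hit], mutfun, numeric(1), context = context)
  list(fitness = fitness, population = mutated)
}

# Toy callbacks: fitness counts the 1's; mutation flips a binary allele.
toyFitness <- function(arow, context) sum(arow)
toyMutate <- function(allele, context) 1 - allele

set.seed(1)
pop <- matrix(rbinom(20, 1, 0.5), nrow = 4)
res <- scoreAndMutate(pop, toyFitness, toyMutate, context = list())
```

Both callbacks take the same context argument, so any auxiliary data they need can travel with the algorithm without relying on global variables.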

The simpleMutate function assumes that chromosomes are binary vectors, so alleles simply take on the value 0 or 1. A mutation of an allele, therefore, flips its state between those two possibilities.
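The flip described above can be sketched in one line. The function name flipAllele is hypothetical, and this is a minimal stand-in for illustration rather than the package source.

```r
# Hypothetical stand-in for a bit-flip mutation; not the package source.
flipAllele <- function(allele, context) {
  # A binary allele is 0 or 1, so subtracting it from 1 flips its state.
  1 - allele
}

flipAllele(0, NULL)  # returns 1
flipAllele(1, NULL)  # returns 0
```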

The selectionMutate and selectionFitness functions, by contrast, are specialized to perform feature selection with a fixed number K of features, where the goal is to learn how to distinguish between two groups of samples. We assume that the underlying data consist of a data frame (or matrix), with the rows representing features (such as genes) and the columns representing samples. In addition, there must be a grouping vector (or factor) that assigns each sample column to one of two possible groups. These data are collected into a list, context, containing a dataset matrix and a gps factor.

An individual member of the population of potential solutions is encoded as a length-K vector of indices into the rows of the dataset. An individual allele, therefore, is a single index identifying a row of the dataset. When mutating it, we assume that it can be changed into any other possible allele; i.e., any other row number. To compute the fitness, we use the Mahalanobis distance between the centers of the two groups defined by the gps factor.
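To make the encoding concrete, here is a minimal sketch of a row-index mutation and a two-group Mahalanobis fitness. The names drawRowIndex, groupDistance, and toyContext are hypothetical. Using a pooled covariance estimate and returning the squared distance (as computed by stats::mahalanobis) are assumptions made for this illustration; the package's own computation may differ.

```r
# Hypothetical sketches; names and the pooled-covariance choice are assumptions.

# Mutation: draw a replacement row index, ignoring the current allele.
drawRowIndex <- function(allele, context) {
  sample(nrow(context$dataset), 1)
}

# Fitness: squared Mahalanobis distance between the two group centers,
# computed on the selected rows using a pooled covariance estimate.
groupDistance <- function(arow, context) {
  x <- t(context$dataset[arow, , drop = FALSE])   # samples x selected features
  g <- as.factor(context$gps)
  stopifnot(nlevels(g) == 2)
  a <- x[g == levels(g)[1], , drop = FALSE]
  b <- x[g == levels(g)[2], , drop = FALSE]
  pooled <- ((nrow(a) - 1) * cov(a) + (nrow(b) - 1) * cov(b)) /
    (nrow(a) + nrow(b) - 2)
  mahalanobis(colMeans(a), colMeans(b), pooled)
}

set.seed(42)
toyContext <- list(dataset = matrix(rnorm(100 * 30), nrow = 100),
                   gps = rep(c("A", "B"), each = 15))
individual <- sample(nrow(toyContext$dataset), 5)   # K = 5 feature indices
fit <- groupDistance(individual, toyContext)
newAllele <- drawRowIndex(individual[1], toyContext)
```

Because the mutation in this sketch ignores its input, it can occasionally return the same row index it was given. A larger fitness value means the selected features separate the two groups more strongly.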

Value

Both selectionMutate and simpleMutate return an integer value; for simpleMutate, the value is guaranteed to be 0 or 1. The selectionFitness function returns a real number.

Author(s)

Kevin R. Coombes krc@silicovore.com, P. Roebuck proebuck@mdanderson.org

See Also

GenAlg, GenAlg-class, maha.

Examples

# load the package that provides GenAlg and newGeneration
library(GenAlgo)

# generate some fake data
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)

# initialize population
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in 1:n.individuals) {
  y[i,] <- sample(1:nrow(fakeData), n.features)
}

# set up the genetic algorithm
# (the last two arguments are the mutation and crossover probabilities)
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext, 0.001, 0.75)

# advance one generation
my.ga <- newGeneration(my.ga)

GenAlgo documentation built on Oct. 23, 2020, 7:28 p.m.