genalg-tools: Utility functions for selection and mutation in genetic...
In GenAlgo: Classes and Methods to Use Genetic Algorithms for Feature Selection

GenAlg-tools

R Documentation

Utility functions for selection and mutation in genetic algorithms

Description

These functions implement specific forms of mutation and fitness that can be used in genetic algorithms for feature selection.

Usage

simpleMutate(allele, context)
selectionMutate(allele, context)
selectionFitness(arow, context)

Arguments

`allele`	In the `simpleMutate` function, `allele` is a binary vector filled with 0's and 1's. In the `selectionMutate` function, `allele` is an integer (which is silently ignored; see Details).
`arow`	A vector of integer indices identifying the rows (features) to be selected from the `context$dataset` matrix.
`context`	A list or data frame containing auxiliary information that is needed to resolve references from the mutation or fitness code. In both `selectionMutate` and `selectionFitness`, `context` must contain a `dataset` component that is either a matrix or a data frame. In `selectionFitness`, the `context` must also include a grouping factor (with two levels) called `gps`.

Details

These functions are 'callbacks'. Pass them to the GenAlg function when you create a genetic algorithm object; they are then called repeatedly (for each individual in the population) each time the genetic algorithm is updated to the next generation.
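The callback pattern can be illustrated with a minimal base R sketch. It is not the package's internal code: the fitness function countOnes and the toy population are hypothetical, and the sketch only shows how a user-supplied function might be applied to every individual (row) with a shared context.

```r
# Hypothetical fitness callback: count the 1s in a binary chromosome.
# It takes the same two arguments as the callbacks documented here.
countOnes <- function(arow, context) sum(arow)

# Toy population: one individual per row
population <- rbind(c(0, 1, 1),
                    c(1, 1, 1),
                    c(0, 0, 0))

# Call the fitness callback for each individual, passing a shared context
fitness <- apply(population, 1, countOnes, context = list())
print(fitness)
```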

The simpleMutate function assumes that chromosomes are binary vectors, so alleles simply take on the value 0 or 1. A mutation of an allele, therefore, flips its state between those two possibilities.
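As a minimal sketch of the flip described above, a binary allele can be toggled by subtracting it from 1. The function name flipAllele is hypothetical and is not claimed to match the package's implementation of simpleMutate.

```r
# Hypothetical stand-in for a binary mutation callback:
# 0 becomes 1 and 1 becomes 0. The context argument is unused here.
flipAllele <- function(allele, context) {
  1 - allele
}

flipped <- flipAllele(c(0, 1, 1, 0), NULL)
print(flipped)
```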

The selectionMutate and selectionFitness functions, by contrast, are specialized to perform feature selection assuming a fixed number K of features, with a goal of learning how to distinguish between two different groups of samples.

We assume that the underlying data consists of a data frame (or matrix), with the rows representing features (such as genes) and the columns representing samples. In addition, there must be a grouping vector (or factor) that assigns all of the sample columns to one of two possible groups. These data are collected into a list, context, containing a dataset matrix and a gps factor.

An individual member of the population of potential solutions is encoded as a length K vector of indices into the rows of the dataset. An individual allele, therefore, is a single index identifying a row of the dataset. When mutating it, we assume that it can be changed into any other possible allele; i.e., any other row number. To compute the fitness, we use the Mahalanobis distance between the centers of the two groups defined by the gps factor.
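The encoding, mutation, and fitness steps described above can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: the names ctx, sketchMutate, and sketchFitness are hypothetical, the covariance is assumed to be the pooled within-group estimate, and stats::mahalanobis returns the squared distance, so the value may differ from what selectionFitness reports.

```r
set.seed(1)

# Hypothetical context: 20 features (rows) by 12 samples (columns), two groups
ctx <- list(dataset = matrix(rnorm(20 * 12), nrow = 20),
            gps = factor(rep(c("A", "B"), each = 6)))

# Mutation sketch: ignore the current allele and draw a row index at random
# (this simple draw can occasionally return the same index again)
sketchMutate <- function(allele, context) {
  sample(nrow(context$dataset), 1)
}

# Fitness sketch: squared Mahalanobis distance between the two group centers,
# computed on the K selected features with a pooled within-group covariance
sketchFitness <- function(arow, context) {
  x <- t(context$dataset[arow, , drop = FALSE])  # samples x K
  g <- as.factor(context$gps)
  xa <- x[g == levels(g)[1], , drop = FALSE]
  xb <- x[g == levels(g)[2], , drop = FALSE]
  pooled <- ((nrow(xa) - 1) * cov(xa) + (nrow(xb) - 1) * cov(xb)) /
    (nrow(x) - 2)
  mahalanobis(colMeans(xa), colMeans(xb), pooled)
}

individual <- c(3, 7, 11)                 # one individual with K = 3
newAllele <- sketchMutate(individual[1], ctx)
fit <- sketchFitness(individual, ctx)
```

A larger value of fit indicates that the selected features separate the two groups more strongly, which is why a distance of this kind can serve as a fitness score.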

Value

Both selectionMutate and simpleMutate return an integer value; for simpleMutate, the value is guaranteed to be 0 or 1. The selectionFitness function returns a real number.

Author(s)

Kevin R. Coombes krc@silicovore.com, P. Roebuck proebuck@mdanderson.org

Examples

# load the package that provides GenAlg and newGeneration
library(GenAlgo)

# generate some fake data
nFeatures <- 1000
nSamples <- 50
fakeData <- matrix(rnorm(nFeatures*nSamples), nrow=nFeatures, ncol=nSamples)
fakeGroups <- sample(c(0,1), nSamples, replace=TRUE)
myContext <- list(dataset=fakeData, gps=fakeGroups)

# initialize population
n.individuals <- 200
n.features <- 9
y <- matrix(0, n.individuals, n.features)
for (i in 1:n.individuals) {
  y[i,] <- sample(1:nrow(fakeData), n.features)
}

# set up the genetic algorithm
my.ga <- GenAlg(y, selectionFitness, selectionMutate, myContext, 0.001, 0.75)

# advance one generation
my.ga <- newGeneration(my.ga)

GenAlgo documentation built on April 12, 2025, 1:44 a.m.