Genetic algorithm for variable subset selection
Description
A genetic algorithm to find "good" variable subsets based on internal PLS evaluation or a user-specified evaluation function.
Usage
genAlg(y, X, control, evaluator = evaluatorPLS(), seed)

Arguments
y 
The numeric response vector of length n 
X 
An n x p numeric matrix with all p covariates 
control 
Options for controlling the genetic algorithm. See 
evaluator 
The evaluator used to evaluate the fitness of a variable subset. See

seed 
Integer with the seed for the random number generator or NULL to automatically seed the RNG 
Details
The GA generates an initial "population" of populationSize chromosomes, where each initial chromosome has a random number of randomly selected variables. The fitness of every chromosome is evaluated by the specified evaluator. The default built-in PLS evaluator (see evaluatorPLS) is the preferred evaluator.
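The initialization step can be pictured with a short sketch in plain R. This is not the package's internal code: `initPopulation` is a hypothetical helper, chromosomes are represented as logical vectors, and the subset size is assumed to be drawn uniformly between `minVariables` and `maxVariables` (the bounds passed to genAlgControl in the examples).

```r
# Hypothetical helper, not part of the package: build a random initial population.
# Each chromosome is a logical vector of length p; TRUE marks a selected variable.
initPopulation <- function(p, populationSize, minVariables, maxVariables) {
  t(replicate(populationSize, {
    # Assumed: subset size drawn uniformly from minVariables..maxVariables
    k <- minVariables + sample.int(maxVariables - minVariables + 1L, 1L) - 1L
    chromosome <- logical(p)
    chromosome[sample.int(p, k)] <- TRUE
    chromosome
  }))
}

set.seed(1)
pop <- initPopulation(p = 50, populationSize = 100,
                      minVariables = 5, maxVariables = 12)
dim(pop)             # one row per chromosome, one column per variable
range(rowSums(pop))  # subset sizes stay within the requested bounds
```

Each row of `pop` would then be passed to the evaluator to obtain its fitness.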
Chromosomes with higher fitness have a higher probability of mating with another chromosome. populationSize / 2 couples each create 2 children. The children are created by randomly mixing the parents' variables. These children make up the new generation and are again selected for mating based on their fitness. A total of numGenerations generations are built this way.
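One generation step of this kind might look roughly as follows. The sketch rests on several assumptions that the text above does not confirm: `nextGeneration` is a hypothetical name, parents are sampled with weights proportional to shifted fitness, and "randomly mixing" is implemented as uniform crossover. It also ignores mutation and any limits on subset size, and it expects an even population size.

```r
# Hypothetical sketch of one generation step; not the package's internal code.
nextGeneration <- function(pop, fitness) {
  n <- nrow(pop)
  # Assumed weighting: shift fitness so every sampling weight is positive
  w <- fitness - min(fitness) + 1e-8
  children <- matrix(FALSE, nrow = n, ncol = ncol(pop))
  for (i in seq_len(n %/% 2)) {
    # Draw two distinct parents; fitter chromosomes are more likely to be picked
    parents <- sample.int(n, 2L, prob = w)
    # Assumed mixing rule: each variable comes from either parent with equal chance
    mask <- runif(ncol(pop)) < 0.5
    children[2 * i - 1, ] <- ifelse(mask, pop[parents[1], ], pop[parents[2], ])
    children[2 * i, ]     <- ifelse(mask, pop[parents[2], ], pop[parents[1], ])
  }
  children
}

set.seed(42)
pop <- matrix(runif(20 * 10) < 0.3, nrow = 20)  # toy population: 20 chromosomes, 10 variables
fitness <- rnorm(20)                            # toy fitness values
kids <- nextGeneration(pop, fitness)
dim(kids)                                       # same shape as the parent population
```

The two children of a couple are complementary: wherever the first child inherits a variable from one parent, the second inherits it from the other.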
The algorithm returns the last generation as well as the best elitism chromosomes from all generations.
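Keeping the best elitism chromosomes across generations can be sketched as a running "hall of fame" that is updated after each generation. `updateElite` and the list layout are hypothetical, and dropping duplicate subsets is an assumption made for this illustration, not documented package behaviour.

```r
# Hypothetical helper: merge the current generation into a running elite set.
updateElite <- function(elite, pop, fitness, elitism) {
  allChrom <- rbind(elite$chromosomes, pop)
  allFit <- c(elite$fitness, fitness)
  # Assumed: the same subset is stored only once (first occurrence wins)
  keep <- !duplicated(allChrom)
  allChrom <- allChrom[keep, , drop = FALSE]
  allFit <- allFit[keep]
  # Keep the `elitism` chromosomes with the highest fitness
  best <- head(order(allFit, decreasing = TRUE), elitism)
  list(chromosomes = allChrom[best, , drop = FALSE], fitness = allFit[best])
}

elite <- list(chromosomes = NULL, fitness = numeric(0))
pop1 <- rbind(c(TRUE, FALSE, TRUE), c(FALSE, TRUE, TRUE))
elite <- updateElite(elite, pop1, fitness = c(0.2, 0.9), elitism = 2)
pop2 <- rbind(c(TRUE, TRUE, FALSE), c(FALSE, TRUE, TRUE))
elite <- updateElite(elite, pop2, fitness = c(0.5, 0.9), elitism = 2)
elite$fitness  # 0.9 0.5
```

With a resampling-based evaluator the same subset can receive slightly different fitness values in different generations; this sketch simply keeps the first value seen.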
Value
An object of type GenAlg
Examples
# Settings for the genetic algorithm
ctrl <- genAlgControl(populationSize = 100, numGenerations = 15, minVariables = 5,
    maxVariables = 12, verbosity = 1)

# Two PLS evaluators with different cross-validation settings
evaluatorSRCV <- evaluatorPLS(numReplications = 2, innerSegments = 7, testSetSize = 0.4,
    numThreads = 1)
evaluatorRDCV <- evaluatorPLS(numReplications = 2, innerSegments = 5, outerSegments = 3,
    numThreads = 1)

# Generate demo data: 200 observations of 50 covariates, 8 of which drive the response
set.seed(12345)
X <- matrix(rnorm(10000, sd = 1:5), ncol = 50, byrow = TRUE)
y <- drop(-1.2 + rowSums(X[, seq(1, 43, length.out = 8)]) + rnorm(nrow(X), 1.5))

# Run the genetic algorithm with each evaluator
resultSRCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorSRCV, seed = 123)
resultRDCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorRDCV, seed = 123)

# Show the first 5 variable subsets found by each run
subsets(resultSRCV, 1:5)
subsets(resultRDCV, 1:5)
