Genetic algorithm combined with PLS regression (GA-PLS)
A subset search algorithm inspired by biological evolution theory and natural selection.
ga_pls(y, X, GA.threshold = 10, iters = 5, popSize = 100)
y: vector of response values.
GA.threshold: the chance of a zero (non-selection) bit during mutation and initialization (default = 10).
iters: the number of iterations (default = 5).
popSize: the population size (default = 100).
1. Building an initial population of variable sets by setting bits for each variable randomly, where bit '1' represents selection of the corresponding variable while '0' represents non-selection. The approximate size of the variable sets must be set in advance.
2. Fitting a PLSR model to each variable set and computing the performance by, for instance, a leave-one-out cross-validation procedure.
3. A collection of variable sets with higher performance is selected to survive until the next "generation".
4. Crossover and mutation: new variable sets are formed 1) by crossover of selected variables between the surviving variable sets, and 2) by changing (mutating) the bit value for each variable with a small probability.
5. The surviving and modified variable sets form the population serving as input to point 2.
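The five steps above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the names loo_rmse, ga_sketch, zero_odds, pop_size and mut_prob are hypothetical, the fitness uses an ordinary least-squares fit as a stand-in for the PLSR model so that the sketch needs no extra packages, and the survival rule (keep the better half) and uniform crossover are assumptions chosen for brevity.

```r
# Hypothetical helper: leave-one-out RMSE of an ordinary least-squares fit.
# Stand-in for the PLSR fit of step 2 (assumption, keeps the sketch dependency-free).
loo_rmse <- function(bits, preds, resp) {
  k <- sum(bits)
  if (k == 0 || k >= nrow(preds) - 1) return(Inf)  # unusable variable sets
  fit <- lm(resp ~ preds[, bits == 1, drop = FALSE])
  press <- residuals(fit) / (1 - hatvalues(fit))    # closed-form LOO residuals
  sqrt(mean(press^2))
}

# Hypothetical GA loop following steps 1-5; pop_size should be even and >= 4.
ga_sketch <- function(y, X, zero_odds = 10, iters = 5, pop_size = 20, mut_prob = 0.05) {
  p <- ncol(X)
  score <- function(pop) {
    vapply(seq_len(nrow(pop)), function(i) loo_rmse(pop[i, ], X, y), numeric(1))
  }
  # Step 1: random bit strings, each bit drawn from zero_odds zeros and a single one
  pop <- matrix(sample(c(rep(0, zero_odds), 1), pop_size * p, replace = TRUE),
                nrow = pop_size)
  for (it in seq_len(iters)) {
    # Step 2: performance of every variable set
    fitness <- score(pop)
    # Step 3: the better half survives (assumed survival rule)
    keep <- pop[order(fitness)[seq_len(pop_size %/% 2)], , drop = FALSE]
    # Step 4: uniform crossover of two random survivors, then bitwise mutation
    children <- keep
    for (i in seq_len(nrow(children))) {
      parents <- keep[sample(nrow(keep), 2), , drop = FALSE]
      child <- ifelse(runif(p) < 0.5, parents[1, ], parents[2, ])
      flip <- runif(p) < mut_prob
      child[flip] <- 1 - child[flip]
      children[i, ] <- child
    }
    # Step 5: survivors and modified sets form the next population
    pop <- rbind(keep, children)
  }
  fitness <- score(pop)
  which(pop[which.min(fitness), ] == 1)  # variable numbers of the best set
}

# Simulated data (assumption): only columns 2 and 7 carry signal
set.seed(1)
n <- 40; p <- 12
X <- matrix(rnorm(n * p), n, p)
y <- X[, 2] - 2 * X[, 7] + rnorm(n, sd = 0.1)
selected <- ga_sketch(y, X, zero_odds = 3, iters = 10, pop_size = 20)
print(selected)
```

Because the survivors are carried over unchanged, the best score in the population cannot get worse from one generation to the next. Swapping loo_rmse for a cross-validated PLSR error would bring the sketch closer to the procedure described above.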
Returns a vector of variable numbers corresponding to the model having lowest prediction error.
Tahir Mehmood, Kristian Hovde Liland, Solve Sæbø.
K. Hasegawa, Y. Miyashita, K. Funatsu, GA strategy for variable selection in QSAR studies: GA-based PLS analysis of calcium channel antagonists, Journal of Chemical Information and Computer Sciences 37 (1997) 306-310.
data(gasoline, package = "pls")
# with( gasoline, ga_pls(octane, NIR, GA.threshold = 10) ) # Time-consuming