select: Variable selection using genetic algorithms

Description

select implements a genetic algorithm for variable selection in GLMs by optimizing a package-supplied or user-specified objective function such as AIC, BIC, or log-likelihood. It uses the functions generate_founders, evaluate_fitness, and create_next_generation. The algorithm searches for the optimal set of variables using the evolutionary biology concepts of natural selection, fitness, genetic crossover, and mutation. A founding generation of chromosomes is randomly generated and evaluated using a criterion such as AIC, BIC, or log-likelihood. Parents are selected according to their fitness and generate child chromosomes. As each generation breeds and produces new generations, the algorithm moves towards the optimum.
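As a rough illustration of the fitness step described above, the sketch below scores a single chromosome by fitting a GLM on the selected columns and applying the objective function. The helper name score_chromosome and the 0/1 vector encoding are assumptions made for this example; they are not taken from the package's evaluate_fitness.

```r
# Hypothetical helper (not part of the package): score one chromosome.
# Assumption: a chromosome is a 0/1 vector with one entry per column of X.
score_chromosome <- function(chrom, Y, X, family = "gaussian",
                             objective_function = stats::AIC) {
  keep <- which(chrom == 1)
  stopifnot(length(keep) >= 1)  # this sketch needs at least one predictor
  dat <- data.frame(Y = Y, X[, keep, drop = FALSE])
  fit <- stats::glm(Y ~ ., data = dat, family = family)
  objective_function(fit)
}

set.seed(1)
X <- as.data.frame(matrix(rnorm(200), ncol = 4))  # 50 rows, 4 predictors
Y <- 2 * X[[1]] + rnorm(50, sd = 0.1)             # only column 1 matters

score_with_signal <- score_chromosome(c(1, 0, 0, 0), Y, X)
score_without_signal <- score_chromosome(c(0, 1, 0, 0), Y, X)
```

With AIC as the objective, the chromosome that selects the informative column receives the lower (better) score, which is what drives parent selection when minimize = TRUE.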

Usage

select(Y, X, family = "gaussian", objective_function = stats::AIC,
  crossover_parents_function = crossover_parents,
  crossover_method = c("method1", "method2", "method3"), pCrossover = 0.8,
  start_chrom = NULL, mutation_rate = NULL, converge = TRUE,
  tol = 5e-04, iter = 100, minimize = TRUE, nCores = 1L,
  verbose = TRUE)

Arguments

Y

a vector containing the response variable

X

a matrix or data frame of predictor variables

family

a character string describing the error distribution and link function to be used in the model. Default is gaussian.

objective_function

function for computing objective. Default is AIC. User can specify custom function.

crossover_parents_function

a function for crossover between mate pairs. User can specify custom function. Default is crossover_parents.

crossover_method

a character string describing crossover method. Default is multi-point crossover. See crossover_parents.

pCrossover

a numeric value for the probability of crossover for each mate pair.
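To make the role of pCrossover concrete, here is a minimal sketch of single-point crossover applied to one mate pair with a given probability. The function crossover_pair is hypothetical, and single-point crossover is used only for simplicity; it is not claimed to match any of the package's crossover_method options.

```r
# Hypothetical sketch: single-point crossover applied with probability
# pCrossover. Not the package's crossover_parents implementation.
crossover_pair <- function(parent1, parent2, pCrossover = 0.8) {
  n <- length(parent1)
  if (n < 2 || stats::runif(1) >= pCrossover) {
    return(list(parent1, parent2))  # no crossover: children copy the parents
  }
  cut <- sample.int(n - 1, 1)       # cut falls between positions cut and cut + 1
  left <- seq_len(cut)
  child1 <- c(parent1[left], parent2[-left])
  child2 <- c(parent2[left], parent1[-left])
  list(child1, child2)
}

set.seed(42)
# pCrossover = 1 forces a crossover; pCrossover = 0 never crosses over.
children <- crossover_pair(rep(0, 6), rep(1, 6), pCrossover = 1)
unchanged <- crossover_pair(rep(0, 6), rep(1, 6), pCrossover = 0)
```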

start_chrom

a numeric value for the size of the population of chromosomes. Default is choose(C, 2) ≤ 200, where C is the number of predictors.

mutation_rate

a numeric value for rate of mutation. Default is 1 / (P √ C), where P is number of chromosomes, and C is number of predictors.
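The two defaults above can be worked through numerically. The sketch below assumes that "choose(C, 2) ≤ 200" means the population size is choose(C, 2) capped at 200; that reading, and the helper name default_ga_settings, are assumptions rather than something confirmed by the package source.

```r
# Hypothetical helper: compute the documented defaults for C predictors.
# Assumption: the population size is choose(C, 2), capped at 200.
default_ga_settings <- function(C) {
  P <- min(choose(C, 2), 200)
  list(start_chrom = P,
       mutation_rate = 1 / (P * sqrt(C)))  # 1 / (P * sqrt(C))
}

small <- default_ga_settings(10)  # choose(10, 2) = 45 chromosomes
large <- default_ga_settings(30)  # choose(30, 2) = 435, capped at 200
```

Under this reading the mutation rate shrinks as either the population or the number of predictors grows, so larger problems mutate each gene less often.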

converge

a logical value indicating whether the algorithm should attempt to converge or run for the specified number of iterations. If TRUE, convergence occurs when the highest-ranked chromosome is equal to the mean of the top 50% in the current and previous generations.
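One possible reading of this convergence rule is sketched below: in both the current and the previous generation, the best fitness value must lie within a relative tolerance of the mean of the top 50% of fitness values. The function has_converged and the use of a relative difference are assumptions for illustration; the package may implement the check differently.

```r
# Hypothetical sketch of the convergence check; not the package's code.
# current, previous: numeric vectors of fitness values, one per chromosome.
has_converged <- function(current, previous, tol = 5e-04, minimize = TRUE) {
  gap <- function(fitness) {
    ranked <- sort(fitness, decreasing = !minimize)  # best value first
    top_half <- ranked[seq_len(ceiling(length(ranked) / 2))]
    # Relative distance between the best value and the top-half mean.
    abs(ranked[1] - mean(top_half)) / max(abs(ranked[1]), .Machine$double.eps)
  }
  gap(current) <= tol && gap(previous) <= tol
}

settled <- has_converged(rep(100, 10), rep(100, 10))
still_moving <- has_converged(c(100, 120, 130, 150), rep(100, 4))
```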

tol

a numeric value indicating convergence tolerance. Default is 5e-04.

iter

an integer specifying the maximum number of generations the algorithm will produce. Default is 100.

minimize

a logical value indicating whether objective function should be minimized (TRUE) or maximized (FALSE).

nCores

an integer indicating the number of parallel processes to run when evaluating fitness. Default is 1, or no parallelization. See evaluate_fitness.

verbose

a logical value indicating whether status updates should be printed to the terminal with cat. Default is TRUE.

If the user wants to use a custom objective_function, it must be a function that accepts a fitted lm or glm object and returns a numeric value of length 1.
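As an example of a custom objective that meets this requirement, the sketch below defines a small-sample corrected AIC (AICc) that takes a fitted glm object and returns a single number. The function name aicc is hypothetical, and the commented call at the end only shows how such a function would presumably be passed to select.

```r
# Hypothetical custom objective: small-sample corrected AIC (AICc).
# Takes a fitted lm/glm object and returns a numeric value of length 1.
aicc <- function(fit) {
  k <- attr(stats::logLik(fit), "df")  # number of estimated parameters
  n <- stats::nobs(fit)                # number of observations
  stats::AIC(fit) + (2 * k * (k + 1)) / (n - k - 1)
}

fit <- stats::glm(mpg ~ wt + hp, data = mtcars, family = "gaussian")
value <- aicc(fit)

# Presumed usage with select (not run here):
# GA::select(y, x, family = "gaussian", objective_function = aicc)
```

Because AICc is still a criterion where smaller is better, it would be used with minimize = TRUE.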

Details

1. Geof H. Givens, Jennifer A. Hoeting (2013). Combinatorial Optimization. Chapter 3 of Computational Statistics.

Examples

# Simulated data
rm(list = ls())

set.seed(1111)

# simulate data for gaussian GLM
library(simrel)
library(GA)

n <- 100 # number obs
p <- 10 # number predictors
m <- 2 # number relevant latent components
q <- 5 # number relevant predictors
gamma <- 0.2 # speed of decline in eigenvalues
R2 <- 0.5 # theoretical R-squared according to the true linear model
relpos <- base::sample(1:p, m, replace = FALSE) # positions of m
dat <- simrel::simrel(n, p, m, q, relpos, gamma, R2) # generate data
x <- dat$X
y <- dat$Y

## Not run: 
sim_GA <- GA::select(y, x, family = "gaussian", objective_function = stats::AIC,
  crossover_method = "method1", pCrossover = 0.8, converge = TRUE,
  minimize = TRUE, nCores = 1)
## End(Not run)

# mtcars
data(mtcars)

y <- mtcars$mpg
x <- mtcars[, 2:11]

## Not run: 
GA_mtcars <- GA::select(y, x, family = "gaussian", objective_function = stats::AIC,
  crossover_method = "method1", pCrossover = 0.8, converge = TRUE,
  minimize = TRUE, nCores = 1)
## End(Not run)

adams-cam/GA documentation built on May 10, 2019, 9:28 a.m.