select: Genetic Algorithm based Variable Selection


Description

select uses a genetic algorithm to optimize model selection for generalized linear models. Relying on the functions fitness_serial, selection, crossover and mutation, it generates new populations of chromosomes corresponding to candidate models, selecting on AIC, until the algorithm is judged to have converged under a user-specified tolerance or the maximum number of iterations is reached.
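The loop described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the names fit_aic and ga_sketch are hypothetical, parents are sampled by AIC rank, crossover is single-point, and the loop simply runs for maxIter generations without a convergence test.

```r
# Illustrative GA for variable selection (hypothetical helpers, not package code).
# A chromosome is a logical vector: TRUE means the covariate enters the model.

# AIC of a gaussian glm that uses only the covariates flagged in `chrom`.
fit_aic <- function(chrom, y, x) {
  if (!any(chrom)) {
    # Empty chromosome: fall back to the intercept-only model.
    return(stats::AIC(stats::glm(y ~ 1, family = stats::gaussian())))
  }
  stats::AIC(stats::glm(y ~ x[, chrom, drop = FALSE], family = stats::gaussian()))
}

ga_sketch <- function(y, x, P = 10, mutation = 0.01, maxIter = 10) {
  y <- as.numeric(y)
  p <- ncol(x)
  # Random initial population: P chromosomes (rows) over p covariates (columns).
  pop <- matrix(stats::runif(P * p) < 0.5, nrow = P, ncol = p)
  best <- list(aic = Inf, chrom = NULL)
  for (iter in seq_len(maxIter)) {
    aic <- apply(pop, 1, fit_aic, y = y, x = x)
    if (min(aic) < best$aic) {
      best <- list(aic = min(aic), chrom = pop[which.min(aic), ])
    }
    # Rank-based selection: the lowest AIC gets the largest sampling weight.
    w <- rank(-aic)
    idx <- sample.int(P, size = 2 * P, replace = TRUE, prob = w / sum(w))
    parents <- pop[idx, , drop = FALSE]
    # Single-point crossover between consecutive parent pairs.
    children <- pop
    for (i in seq_len(P)) {
      cut <- sample.int(p - 1, 1)
      children[i, ] <- c(parents[2 * i - 1, seq_len(cut)],
                         parents[2 * i, (cut + 1):p])
    }
    # Mutation: flip each gene independently with probability `mutation`.
    flip <- matrix(stats::runif(P * p) < mutation, nrow = P, ncol = p)
    pop <- xor(children, flip)
  }
  best  # best model seen among the evaluated generations
}

set.seed(42L)
res <- ga_sketch(y = mtcars$mpg, x = as.matrix(mtcars[2:11]), P = 10, maxIter = 5)
colnames(mtcars[2:11])[res$chrom]  # covariates in the best model found
```

The sketch assumes at least two covariates, since single-point crossover needs a cut position between columns.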

Usage

select(y, x, family, k, P, mutation = 0.01, ncores = 0,
  fitness_function = stats::AIC, fitness = "rank", selection = "fitness",
  tol = 5e-04, maxIter = 100L)

Arguments

y

a vector or matrix of responses

x

a matrix of covariates

family

a description of the error distribution to be used in the glm fitting. Default is to use gaussian.

k

size of each disjoint subset (must be smaller than P and must divide P evenly, i.e. P mod k = 0)

P

population size (number of chromosomes to evaluate)

mutation

an optional value specifying the rate at which mutation occurs in the new population.

ncores

an optional value indicating the number of cores to use in parallelization. Should be numeric.

fitness_function

optional function to evaluate model fitness. Must be of type closure. Default is to use AIC. How the resulting values are used is controlled by the fitness argument: "rank" uses the relative rank of models based on AIC, while "weight" uses the absolute value of AIC to determine probability of reproduction in the preceding generation. The "weight" option should be used with caution because the search can become stuck at a local minimum: a single model with very low AIC will have a large probability of reproduction.

fitness

character, either 'rank' or 'weight', used in selection or selection_tournament; chooses between AIC weights and AIC ranks to evaluate fitness. For general problems (e.g. if fitness_function is BIC), 'rank' is preferred.

selection

character, either 'fitness' or 'tournament', to choose between standard selection (the selection function) and tournament selection (the selection_tournament function)

tol

Optional value indicating relative convergence tolerance. Should be of class numeric.

maxIter

Optional value indicating the maximum number of iterations. Default is 100.
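The effect of the fitness, selection and k arguments can be illustrated with a short sketch. The functions selection_probs and tournament_winners below are hypothetical helpers written for this illustration, not the package's selection or selection_tournament. The linear rank weights and the Akaike-style weight exp(-(AIC - min AIC) / 2) are assumptions about one reasonable way to implement the two fitness options.

```r
# Hypothetical helper: turn AIC values into parent-sampling probabilities.
selection_probs <- function(aic, fitness = c("rank", "weight")) {
  fitness <- match.arg(fitness)
  if (fitness == "rank") {
    w <- rank(-aic)                  # lowest AIC receives the highest rank
  } else {
    w <- exp(-(aic - min(aic)) / 2)  # Akaike-style weights (assumed form)
  }
  w / sum(w)
}

# Hypothetical helper: split the population into disjoint groups of size k
# and return the index of the lowest-AIC member of each group.
tournament_winners <- function(aic, k) {
  P <- length(aic)
  stopifnot(k < P, P %% k == 0)      # k must be smaller than P and divide it
  groups <- split(sample.int(P), rep(seq_len(P / k), each = k))
  vapply(groups, function(idx) idx[which.min(aic[idx])],
         integer(1), USE.NAMES = FALSE)
}

aic <- c(160, 162, 165, 200)
selection_probs(aic, "rank")    # 0.4 0.3 0.2 0.1
selection_probs(aic, "weight")  # the lowest-AIC model takes most of the mass

set.seed(42L)
tournament_winners(aic, k = 2)  # one winner per group of two
```

Under these assumptions the "weight" probabilities concentrate on the single best model far more than the "rank" probabilities do, which is the behaviour behind the caution about local minima.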

Details

  1. Burnham, K. P. and D. R. Anderson. 2002. Model Selection and Multimodel Inference. Springer-Verlag, New York.

  2. Givens, G. H. and J. A. Hoeting. 2013. Combinatorial Optimization. Chapter 3 of Computational Statistics.

Examples

# GA test with mtcars
rm(list=ls());gc()
set.seed(42L)

# generate data from mtcars
y <- as.matrix(mtcars$mpg)
x <- as.matrix(mtcars[2:11])

## Not run: select(y = y, x = x, k = 2, family = "gaussian", P=10, maxIter = 10)

# GA test with simrel data
library(simrel)

N = 500 # number of observations
p = 100 # number of covariates
q = floor(p/4) # number of relevant predictors
m = q # number of relevant components
ix = sample(x = 1:p,size = m) # location of relevant components
data = simrel(n=N, p=p, m=m, q=q, relpos=ix, gamma=0.2, R2=0.75)

fitness = "rank" # character in 'weight' or 'rank'
family = "gaussian"
y = data$Y
x = data$X

## Not run: GAoptim = GA::select(y = y,x = x,family = family,k = 20,P = 100,ncores = 0,fitness = "rank",selection = "tournament")

slwu89/GA documentation built on May 14, 2019, 5:20 p.m.
