select: Select best regression model using genetic algorithm


View source: R/select.R

Description

select implements a genetic algorithm for variable selection in regression and returns the regression model selected by the genetic algorithm.

Usage

select(
  data,
  response,
  covariates,
  criterion = "AIC",
  family = "gaussian",
  maximize = FALSE
)

Arguments

data

The dataset on which to perform the regression (e.g. a data frame containing the response and the covariates).

response

A character string of the name of the response variable.

covariates

A character vector of names of the predictor variables (covariates).

criterion

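The objective criterion used to compare candidate models, given as a character string. Defaults to "AIC"; other criteria (for example "BIC", as in the Examples) or a criterion provided by the user can be used instead.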

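family

A character string naming the family function to use in the model (passed to glm). Common families include "gaussian" (identity link, the default), "binomial" (logit link), and "poisson" (log link).

maximize

Logical. If FALSE (the default), the criterion is minimized, as is appropriate for AIC and BIC; set it to TRUE for a criterion where larger values are better.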
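To make the roles of covariates, criterion, and family concrete, the sketch below scores a single chromosome: a logical vector marking which covariates enter the model. It is a minimal illustration under stated assumptions, not the package's code: the helper score_chromosome is hypothetical, and it assumes the criterion string is resolved to a function such as stats::AIC or stats::BIC via match.fun.

```r
# HYPOTHETICAL helper (not part of the package): score one chromosome.
# chrom: logical vector with one entry per covariate (TRUE = include).
score_chromosome <- function(chrom, data, response, covariates,
                             criterion = "AIC", family = "gaussian") {
  included <- covariates[chrom]
  # ASSUMPTION: an all-FALSE chromosome is treated as the intercept-only model.
  rhs <- if (length(included) == 0) "1" else paste(included, collapse = " + ")
  fml <- as.formula(paste(response, "~", rhs))
  # glm accepts the family as a character string such as "gaussian".
  fit <- glm(fml, family = family, data = data)
  # ASSUMPTION: the criterion name is looked up as a function, e.g. AIC or BIC.
  match.fun(criterion)(fit)
}

covariates <- names(mtcars)[-1]
chrom <- covariates %in% c("wt", "hp")
score_chromosome(chrom, mtcars, "mpg", covariates)
```

How the package itself handles an empty chromosome is not documented; the intercept-only choice above is only one reasonable option.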

Details

This implementation of the genetic algorithm uses a generation size of p = ceiling(1.5*c/2)*2, where c is the length of the chromosomes (i.e. the number of covariates considered for the model). In other words, p is the smallest even integer not less than 1.5c.

Parent chromosomes are chosen by rank-based selection. Chromosomes are ranked by their criterion scores, with higher ranks for better scores (rank r = p for the best chromosome and r = 1 for the worst), and a chromosome of rank r is selected as parent 1 with probability 2r/(p*(p+1)), i.e. proportionally to its rank. Parent 2 is selected uniformly at random.

Each chromosome is mutated with probability 1/c, a rate supported by theoretical work and empirical studies.

The algorithm stops when the objective criterion score (AIC by default) converges absolutely: once the absolute difference between the scores from iterations i-1 and i is less than 0.000001, the algorithm stops and returns the best model from iteration i.
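The steps described in Details can be sketched in R as a few small helpers. This is a hedged sketch, not the package's implementation: all function names are hypothetical, scores are assumed to be criterion values where lower is better (the maximize = FALSE case), and mutating a chromosome is assumed to flip one randomly chosen gene, since the mutation operator itself is not specified.

```r
# Generation size: the smallest even integer >= 1.5 * n_cov,
# i.e. p = ceiling(1.5 * n_cov / 2) * 2.
generation_size <- function(n_cov) ceiling(1.5 * n_cov / 2) * 2

# Rank-based selection probabilities for parent 1. Lower scores are better
# (e.g. AIC), so the lowest score gets rank p and the highest gets rank 1;
# the probability for rank r is 2r / (p(p + 1)), which sums to 1.
rank_probs <- function(scores) {
  p <- length(scores)
  r <- rank(-scores, ties.method = "first")
  2 * r / (p * (p + 1))
}

# Parent 1 is drawn with the rank-based probabilities, parent 2 uniformly
# at random. Returns the two indices into the current generation.
select_parents <- function(scores) {
  p <- length(scores)
  parent1 <- sample.int(p, 1, prob = rank_probs(scores))
  parent2 <- sample.int(p, 1)
  c(parent1, parent2)
}

# Each chromosome is mutated with probability 1 / n_cov.
# ASSUMPTION: a mutation flips one randomly chosen gene.
mutate_chromosome <- function(chrom) {
  n_cov <- length(chrom)
  if (runif(1) < 1 / n_cov) {
    j <- sample.int(n_cov, 1)
    chrom[j] <- !chrom[j]
  }
  chrom
}

# Absolute convergence check between the scores of consecutive iterations.
has_converged <- function(prev_score, curr_score, tol = 1e-6) {
  abs(prev_score - curr_score) < tol
}

generation_size(10)                         # 16 (mtcars has 10 candidate covariates)
rank_probs(c(160.1, 155.3, 170.8, 158.0))  # 0.2 0.4 0.1 0.3
```

In the last line the lowest score (155.3) receives rank 4 and therefore the largest selection probability, while parent 2 ignores the scores entirely, which keeps some diversity in each generation.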

Value

The regression model selected by the genetic algorithm, an object of class c("glm", "lm").
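Because the returned value is an ordinary glm object, the usual stats accessors apply to it. The sketch below uses a directly fitted glm as a stand-in for the result of select(); the covariates wt and hp are chosen only for illustration, not taken from an actual run of the algorithm.

```r
# Stand-in for the model returned by select(): an ordinary glm fit.
fit <- glm(mpg ~ wt + hp, family = "gaussian", data = mtcars)

class(fit)    # "glm" "lm"
formula(fit)  # the selected covariates appear on the right-hand side
coef(fit)     # estimated coefficients of the selected model
AIC(fit)      # criterion value of the selected model
```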

Examples

# Linear regression of mpg on all other variables (default: gaussian family, AIC)
data <- mtcars
response <- names(mtcars)[1]
covariates <- names(mtcars)[-1]
select(data, response, covariates)

# How to perform logistic regression with select()
response <- "am"
covariates <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "gear", "carb")
select(data, response, covariates, family = "binomial")

# You can also use another objective criterion (here BIC) instead of the default AIC
response <- "mpg"
covariates <- c("cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am")
select(data, response, covariates, criterion="BIC")

zihanye96/Genetic_Algorithm documentation built on May 25, 2020, 3:51 p.m.