select: select()


View source: R/select.R

Description

select() implements a genetic algorithm for variable selection in linear regression and GLMs. It returns the best variables to include in the model, an assessment of that model's fitness, and the number of generations the genetic algorithm ran before reaching its stopping criterion.

Usage

select(
  covariates,
  response,
  stop_criteria = "threshold",
  stop_point = 3,
  iterations = NULL,
  cross_method = "single",
  cross_type = "nonrandom",
  mut_method = "single",
  mut_rate = min(1/c, 0.01),
  model_function = "lm",
  fitness_method = "AIC",
  FUN = NULL,
  rank_method = NULL,
  ...
)

Arguments

covariates

A matrix containing the covariates supplied by the user.

response

A vector containing the response variable supplied by the user.

stop_criteria

A character string giving the stopping rule. Accepted values are "threshold" and "iteration"; the default is "threshold".

stop_point

A number. When stop_criteria = "threshold", the algorithm stops once the best fit has matched across this many previous generations. The default is 3.
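The threshold rule can be pictured with a minimal sketch. It assumes that "match" means the best fitness value is unchanged over the last stop_point generations; the helper should_stop() and the vector best_history are hypothetical names used only for illustration and are not part of the package.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# last `stop_point` best-fitness values are all equal.
should_stop <- function(best_history, stop_point = 3) {
  n <- length(best_history)
  # Not enough generations recorded yet to compare.
  if (n < stop_point) return(FALSE)
  recent <- best_history[(n - stop_point + 1):n]
  # Compare with a small tolerance to avoid floating-point surprises.
  all(abs(recent - recent[1]) < 1e-8)
}

should_stop(c(120.4, 118.9, 118.9, 118.9))  # last three generations match
should_stop(c(120.4, 119.7, 118.9, 118.9))  # only the last two match
```

Under this reading, a larger stop_point makes the algorithm wait longer before concluding that the best fit has stabilized.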

iterations

A number giving how many iterations the genetic algorithm runs when stop_criteria = "iteration". The default is NULL.

cross_method

A character string that determines how many crossover points are used when creating offspring. 'single' uses one crossover point and 'multi' uses two crossover points; in both cases the setting is the same for all offspring. The default is 'single'.

cross_type

A character string that determines how the crossovers are performed for each offspring. 'nonrandom' means each set of parents is crossed at the same point; 'random' means each set of parents is crossed at a random (possibly different) point. The default is 'nonrandom'.
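As a rough illustration of one-point and two-point crossover on 0/1 inclusion vectors, the sketch below swaps alternating segments between two parents. The helper crossover() and its points argument are hypothetical and may differ from how the package implements these options.

```r
# Hypothetical helper (not part of the package): cross two 0/1 parent
# vectors. `points` holds sorted cut positions; a cut at k falls between
# gene k and gene k + 1.
crossover <- function(p1, p2, points) {
  stopifnot(length(p1) == length(p2))
  # Segment index per gene: 0 before the first cut, 1 after it, and so on.
  segment <- findInterval(seq_along(p1), points + 1)
  # Odd-numbered segments are exchanged between the parents.
  swap <- segment %% 2 == 1
  list(child1 = ifelse(swap, p2, p1),
       child2 = ifelse(swap, p1, p2))
}

p1 <- c(1, 1, 1, 1, 1, 1)
p2 <- c(0, 0, 0, 0, 0, 0)

crossover(p1, p2, points = 3)        # one cut, as with 'single'
crossover(p1, p2, points = c(2, 5))  # two cuts, as with 'multi'

# A 'random' cross_type could draw a fresh cut for each pair of parents:
random_cut <- sample(length(p1) - 1, 1)
crossover(p1, p2, points = random_cut)
```

In this sketch, a 'nonrandom' run would reuse one fixed value of points for every pair of parents, while a 'random' run would redraw it per pair.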

mut_method

A character string that determines whether more than one mutation is allowed per individual. 'single' means only one mutation per individual; 'multi' means multiple mutations are allowed per individual. The default is 'single'.

mut_rate

A number specifying the mutation rate (percentage of mutations) for the genetic algorithm. The default is min(1/c, 0.01).
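One way to picture how mut_method and mut_rate might interact is sketched below. It assumes that 'single' flips at most one randomly chosen gene with probability mut_rate, and that 'multi' flips each gene independently with probability mut_rate. Both rules and the helper mutate() are assumptions made for illustration; the package may apply mutations differently.

```r
# Hypothetical helper (not part of the package): mutate a 0/1 vector.
mutate <- function(chrom, mut_rate = 0.01, mut_method = "single") {
  if (mut_method == "single") {
    # Assumed rule: with probability mut_rate, flip exactly one gene.
    if (runif(1) < mut_rate) {
      i <- sample.int(length(chrom), 1)
      chrom[i] <- 1 - chrom[i]
    }
  } else {
    # Assumed rule: flip each gene independently with probability mut_rate.
    flip <- runif(length(chrom)) < mut_rate
    chrom[flip] <- 1 - chrom[flip]
  }
  chrom
}

chrom <- c(1, 0, 1, 1, 0, 0, 1, 0)
mutate(chrom, mut_rate = 0.05, mut_method = "single")
mutate(chrom, mut_rate = 0.05, mut_method = "multi")
```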

model_function

A character string specifying the model function to use. Accepted values are 'lm' and 'glm'; the default is 'lm'.

fitness_method

A character string specifying the method used to assess the fitness of the models. Accepted values are 'AIC', 'r_squared', and 'custom'; the default is 'AIC'.

FUN

A function, such as BIC(), used to assess the fitness of lm() and glm() models. The function must be able to run on the output of lm() or glm(). The default is NULL. If FUN is supplied, fitness_method must be set to 'custom'.
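To illustrate what "able to run on the output of lm() or glm()" means, the sketch below scores two candidate subsets by fitting lm() on the selected columns and passing the fitted model to a fitness function. The helper score_subset(), the simulated data, and the include vectors are all hypothetical; this is not the package's internal code.

```r
# Hypothetical helper (not part of the package): fit lm() on the columns
# flagged with 1 in `include` and score the fit with FUN.
score_subset <- function(covariates, response, include, FUN = AIC) {
  keep <- include == 1
  dat <- data.frame(y = response, covariates[, keep, drop = FALSE])
  fit <- lm(y ~ ., data = dat)
  FUN(fit)
}

# Simulated data in which only x1 and x3 drive the response.
set.seed(1)
X <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 * X[, 1] - X[, 3] + rnorm(200)

bic_true  <- score_subset(X, y, include = c(1, 0, 1, 0), FUN = BIC)
bic_wrong <- score_subset(X, y, include = c(0, 1, 0, 1), FUN = BIC)
bic_true < bic_wrong  # lower BIC indicates the better subset here
```

Any function that accepts a fitted model object and returns a single number, such as AIC() or BIC() from the stats package, fits this pattern.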

Value

Returns a list. The first entries in the list identify the best variables amongst the covariates for building a model: a 0 indicates that the variable should not be included in the model, and a 1 indicates that it should be included. The list also contains the fitness of the selected model, the model's rank in the current generation, and the generation number. The generation number is the number of generations created by the genetic algorithm, which equals the number of iterations the algorithm completed.
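The 0/1 indicators can be mapped back to variable names and turned into a model formula. The sketch below assumes the indicators have already been extracted into a plain numeric vector; the covariate names, the include vector, and the response name "y" are hypothetical.

```r
# Hypothetical inputs: covariate names and a 0/1 inclusion vector of the
# kind described above.
covariate_names <- c("age", "bmi", "dose", "site")
include <- c(1, 0, 1, 0)

# Keep only the variables flagged with 1.
selected <- covariate_names[include == 1]

# Build a formula for refitting the chosen model with lm() or glm().
model_formula <- reformulate(selected, response = "y")
model_formula
```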


eldonlk/Genetic_Algorithm documentation built on April 25, 2020, 3:18 p.m.