README.md

GA


Overview

GA is an exploration into developing a customizable genetic algorithm package, with the goal of selecting variables for multivariate linear regression faster than step-wise methods. The main function is select(), which is backed by a set of supporting functions.

GA requires the stats, testthat, and assertthat packages; these should be installed and loaded automatically when installing the ‘GA’ package. An in-depth paper on the development process can be found here.

Users can supply a custom function or choose a metric (AIC, BIC, AICC, R2) to optimize which features are included in the linear regression. The select() function supports a broad range of genetic algorithm features, including multiple parent selection methods, crossover options, and mutation options, as well as elitism, minimizing inbreeding, using more than two parents, and a range of early termination options.

Installation

# The easiest way to get GA is to install the package from the archive file:
install.packages('GA_0.1.1.tgz', repos = NULL, type ='source')

# Alternatively, install using the command line:
R CMD INSTALL 'GA_0.1.1.tgz'

# Or install directly from GitHub:
devtools::install_github("AndrewM1130/GA")

Steps for using select()

  1. Load response & covariate variables
  2. Set the required parameters, which have no default values: gene length, desired number of generations, and population size for each generation
  3. Tune & add additional parameters, which include:

| Parameter | Variable Name | Available Options |
|---|---|---|
| Crossover Method | crossover | ‘uniform’, ‘fitness’, ‘k_point’ |
| Parent Selection Method | method | ‘roulette’, ‘rank’, ‘tournament’, ‘sus’ |
| Mutation Method | mutation_rate | ‘fixed’, ‘adaptive’ |
| Number of Parents | number_of_parents | integer value |
| Elitism & Diversity | elitism & minimize_inbreeding | TRUE/FALSE |
| Termination Conditions | pause_length, score_threshold, percent_converge | integer values |

A comprehensive list of available options for each parameter and their default values within the select() function can be found within the documentation manual.

In the following example, gene_length is set to 20 to match the number of independent variables, and the population size is set to 25. We use AIC as the metric, uniform crossover, and rank-based parent selection. We also set an early termination criterion: if the estimator (the mean) remains unchanged for 4 iterations, the program terminates.

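## example deployment for covariate-weight estimation
library(GA)

# simulated data: 100 observations and 20 candidate covariates
response_vec <- rnorm(100)
independent_vars <- matrix(rnorm(100 * 20), ncol = 20)

gene_length <- 20                # matches the number of independent variables
pop <- 25                        # population size for each generation
total_number_generations <- 50   # desired number of generations
metric <- 'AIC'                  # metric to optimize
crossover <- 'uniform'           # crossover method
method <- 'rank'                 # parent selection method
estimator <- 'Mean'              # estimator watched for early termination
pause_length <- 4                # terminate if the estimator is unchanged for 4 iterations

select(total_number_generations = total_number_generations,
       response_vec = response_vec,
       independent_vars = independent_vars,
       pop = pop,
       gene_length = gene_length,
       metric = metric,
       crossover = crossover,
       method = method,
       estimator = estimator,
       pause_length = pause_length)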

GA::select() Parameters

The following section contains clarifications and an overview of common parameter changes when developing and testing genetic algorithms.

Custom user-genes & fitness functions
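
As a rough illustration of what a fitness function does in this setting, the sketch below scores a binary gene by fitting a linear model on the selected columns and returning its AIC, using only base R. The helper gene_fitness, the convention that a 1 in position j includes covariate j, and the choice to score an empty gene as Inf are assumptions made for illustration. The interface that select() expects for a custom function is described in the package manual and is not reproduced here.

```r
# Hypothetical helper (not part of the GA package): score one binary gene.
# Assumption: gene[j] == 1 means column j of X enters the regression.
gene_fitness <- function(gene, y, X, criterion = stats::AIC) {
  chosen <- which(gene == 1)
  if (length(chosen) == 0) return(Inf)   # assumption: an empty model is worst
  fit <- stats::lm(y ~ X[, chosen, drop = FALSE])
  criterion(fit)                          # lower AIC/BIC means a better model
}

set.seed(1)
X <- matrix(rnorm(100 * 5), ncol = 5)
y <- 2 * X[, 1] - X[, 3] + rnorm(100)    # only columns 1 and 3 matter

gene_fitness(c(1, 0, 1, 0, 0), y, X)              # gene matching the true model
gene_fitness(c(0, 1, 0, 1, 1), y, X)              # gene with only noise columns
gene_fitness(c(1, 0, 1, 0, 0), y, X, stats::BIC)  # same gene scored with BIC
```

Passing the criterion as a function keeps the scoring step independent of the metric, which is one simple way to allow a user-defined score.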

Parent selection method
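
The parent selection options listed for select() (‘roulette’, ‘rank’, ‘tournament’, ‘sus’) are standard genetic algorithm techniques. The following is a minimal base R sketch of two of them, assuming lower scores are better (as with AIC). The helpers select_rank and select_tournament are hypothetical and may differ from the package's internals.

```r
# Hypothetical helpers, assuming a lower score is a better individual.

# Rank selection: the best individual gets the highest rank, and the
# probability of being drawn is proportional to rank.
select_rank <- function(scores, n_parents = 2) {
  ranks <- rank(-scores, ties.method = "first")
  sample(seq_along(scores), size = n_parents, prob = ranks / sum(ranks))
}

# Tournament selection: draw k entrants at random and keep the best one.
select_tournament <- function(scores, k = 3) {
  entrants <- sample(seq_along(scores), size = k)
  entrants[which.min(scores[entrants])]
}

set.seed(42)
scores <- c(120.5, 98.2, 110.0, 101.7, 130.3)  # e.g. AIC per individual
select_rank(scores, n_parents = 2)             # indices of two distinct parents
select_tournament(scores, k = 3)               # index of one tournament winner
```

Rank selection depends only on the ordering of the scores, so it is less sensitive to the scale of the metric than roulette selection, which uses the scores themselves.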

Available Crossover Methods
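
To make the ‘uniform’ and ‘k_point’ options concrete, here is a small base R sketch of both operators on binary genes. The function names and the 0.5 mixing probability are assumptions for illustration, not the package's own code; the ‘fitness’ option is not sketched because its definition is not given here.

```r
# Hypothetical helpers illustrating two textbook crossover operators.

# Uniform crossover: each position is taken from either parent with
# probability 0.5 (assumed mixing probability).
crossover_uniform <- function(p1, p2) {
  take_first <- runif(length(p1)) < 0.5
  ifelse(take_first, p1, p2)
}

# k-point crossover: cut the gene at k distinct positions and alternate
# between parents from one segment to the next. Requires k <= length(p1) - 1.
crossover_k_point <- function(p1, p2, k = 2) {
  n <- length(p1)
  cuts <- sort(sample(seq_len(n - 1), size = k))  # a cut after position cuts[i]
  segment <- findInterval(seq_len(n) - 1, cuts)   # segment index per position
  ifelse(segment %% 2 == 0, p1, p2)
}

set.seed(7)
p1 <- rep(0, 10)
p2 <- rep(1, 10)
crossover_uniform(p1, p2)
crossover_k_point(p1, p2, k = 2)
```

Using all-zero and all-one parents makes it easy to see which parent each position of the child came from.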

Mutation parameters

For all mutation methods, candidate offspring are rejected if their gene is the all-zero vector, so every accepted offspring has at least one nonzero entry.
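
The rejection rule can be sketched as a retry loop around a fixed-rate bit-flip mutation. This is a minimal illustration in base R; the name mutate_fixed, the default rate, and the max_tries fallback are assumptions, and the ‘adaptive’ option is not covered.

```r
# Hypothetical helper: fixed-rate bit-flip mutation that rejects all-zero genes.
# Assumes the input gene itself contains at least one 1.
mutate_fixed <- function(gene, rate = 0.05, max_tries = 100) {
  for (i in seq_len(max_tries)) {
    flip <- runif(length(gene)) < rate         # positions to flip
    candidate <- ifelse(flip, 1 - gene, gene)
    if (any(candidate == 1)) return(candidate) # accept any non-zero gene
  }
  gene  # fallback after max_tries rejections: return the unmutated gene
}

set.seed(3)
gene <- c(1, 0, 0, 0, 1, 0)
mutate_fixed(gene, rate = 0.2)
```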

‘Inbreeding’ & Elitism
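
The sketch below shows one common reading of these two ideas in base R: elitism copies the best individuals into the next generation unchanged, and inbreeding is reduced by pairing a parent with the candidate whose gene differs from it the most. Both helpers, and the use of Hamming distance as the similarity measure, are assumptions for illustration; the behaviour of the elitism and minimize_inbreeding flags in select() may differ.

```r
# Hypothetical helpers, assuming a lower score is a better individual
# and that each row of `population` is one binary gene.

# Elitism: keep the n_elite best rows unchanged.
keep_elite <- function(population, scores, n_elite = 1) {
  best <- order(scores)[seq_len(n_elite)]
  population[best, , drop = FALSE]
}

# Assumed inbreeding rule: choose the mate with the largest Hamming distance.
hamming <- function(a, b) sum(a != b)
pick_mate <- function(parent, candidates) {
  distances <- apply(candidates, 1, hamming, b = parent)
  candidates[which.max(distances), ]
}

population <- rbind(c(1, 0, 1, 0),
                    c(1, 1, 0, 0),
                    c(0, 1, 0, 1))
scores <- c(105, 98, 110)

elite <- keep_elite(population, scores, n_elite = 1)
mate <- pick_mate(population[1, ], population[-1, , drop = FALSE])
```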

Early Termination Conditions
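
A pause_length style rule, such as the one used in the example above (stop when the estimator is unchanged for 4 iterations), could be checked as follows. This is a sketch in base R; the helper should_stop and the tolerance tol are assumptions, and the score_threshold and percent_converge conditions are not covered.

```r
# Hypothetical helper: stop when the tracked estimator (e.g. the mean score
# per generation) has not changed over the last `pause_length` iterations.
should_stop <- function(history, pause_length = 4, tol = 1e-8) {
  if (length(history) <= pause_length) return(FALSE)  # not enough generations
  recent <- utils::tail(history, pause_length + 1)    # pause_length transitions
  max(recent) - min(recent) < tol
}

should_stop(c(10, 8, 7, 7, 7, 7, 7))  # last 5 values identical
should_stop(c(10, 8, 7, 7, 7, 7))     # only 4 identical values so far
```

Comparing pause_length + 1 values means the estimator has to stay the same across pause_length consecutive updates before the run ends.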

Getting Help

You can read more in our functions’ documentation using ?function (such as ?select). If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, please reach out to me via email or direct messaging!



AndrewM1130/GA documentation built on July 9, 2022, 11:43 a.m.