select: Genetic Algorithm for Model Selection


View source: R/select.R

Description

This is the main function of the GA package. The package consists of a main execution file (select.R) and other R files containing the utility functions it calls. The user supplies the name of a dependent variable and a dataset to run this function.

Usage

select(y, dataset, reg_method = NULL, n_iter = 200, pop_size = 2 * n, objective = "AIC",
interaction = F, most_sig = F, parent_selection = "prop", nb_groups = 4, generation_gap = 0.25,
gene_selection = "crossover", nb_pts = 1, mu = 0.3, err = 1e-6)

Arguments

y

(character) Column name of the dependent variable

dataset

(data frame) The dataset in matrix form, with the dependent variable as the last column.

reg_method

(character) The method used to fit the data, either "lm" or "glm" (default "lm").

n_iter

(int) The maximum number of iterations allowed when running GA

pop_size

(int) The number of individuals per generation (default 2 * number of covariates).

objective

(character) The objective criterion to use (default "AIC").
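As a rough illustration of how an objective such as AIC can score one candidate model, the sketch below encodes a model as a logical vector with one entry per covariate and fits it with lm. The helper score_model and the logical encoding are assumptions made for this example; they are not taken from the package source.

```r
# Hypothetical helper: score one candidate subset of covariates with AIC.
# `chromosome` is a logical vector, one entry per covariate in `dataset`.
score_model <- function(chromosome, y, dataset) {
  covariates <- setdiff(names(dataset), y)
  selected <- covariates[chromosome]
  # Fall back to an intercept-only model when no covariate is selected.
  rhs <- if (length(selected) > 0) selected else "1"
  fit <- lm(reformulate(rhs, response = y), data = dataset)
  AIC(fit)
}

covariates <- setdiff(names(mtcars), "mpg")
chromosome <- covariates %in% c("wt", "hp")  # keep only wt and hp

score_model(chromosome, "mpg", mtcars)
```

A lower AIC indicates a better trade-off between fit and model size, so a genetic algorithm built on this score would minimise it.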

interaction

(logical) Whether to add the interaction terms to the independent variables (default F).
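To show what adding interaction terms does to the set of candidate variables, here is a small sketch using base R formulas. It assumes pairwise (two-way) interactions; whether select builds them this way is not stated in this page.

```r
# Main effects only: intercept plus three covariates.
X_main <- model.matrix(mpg ~ wt + hp + qsec, data = mtcars)

# Main effects plus all pairwise interactions (assumed form of the terms).
X_int <- model.matrix(mpg ~ (wt + hp + qsec)^2, data = mtcars)

colnames(X_main)
colnames(X_int)
```

With p covariates, pairwise interactions add p * (p - 1) / 2 columns, so the search space grows quickly when interaction is switched on.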

most_sig

(logical) Whether to use the most significant variables inside the first_generation function (default F).

parent_selection

(character) The mechanism used to select parents: "prop", "prop_random", "random" or "tournament" (default "prop").
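The two less obvious mechanisms can be sketched as follows. This is only an illustration of the general techniques: the rank-based fitness, the way groups are formed, and the AIC values (made-up numbers) are all assumptions, not the package's implementation.

```r
set.seed(1)

# Illustrative (made-up) AIC values for a population of 8 individuals.
aic <- c(160.2, 155.8, 171.4, 158.9, 166.0, 157.3, 163.5, 169.1)

# Proportional selection: sample parents with probability proportional to
# a rank-based fitness, where the lowest AIC receives the highest rank.
fitness <- rank(-aic)
prop_parents <- sample(seq_along(aic), size = 2, prob = fitness / sum(fitness))

# Tournament selection: split the population into nb_groups random groups
# and keep the best (lowest-AIC) individual of each group.
nb_groups <- 4
groups <- split(sample(seq_along(aic)),
                rep(seq_len(nb_groups), length.out = length(aic)))
tournament_parents <- vapply(groups,
                             function(idx) idx[which.min(aic[idx])],
                             integer(1))
```

In the tournament variant the overall best individual always wins its group, so it is guaranteed to be among the selected parents.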

nb_groups

(int) The number of groups used in tournament selection (default 4).

generation_gap

(numeric) The proportion of individuals to be replaced by offspring (default 0.25).
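A short worked example of how pop_size and generation_gap interact, using mtcars with "mpg" as the dependent variable. Rounding up with ceiling is an assumption made here; this page does not say how fractional counts are handled.

```r
dataset <- mtcars
y <- "mpg"

n_covariates <- ncol(dataset) - 1   # every column except the dependent variable
pop_size <- 2 * n_covariates        # default population size
generation_gap <- 0.25

# Number of individuals replaced by offspring in one generation
# (rounding up is an assumption of this sketch).
n_offspring <- ceiling(generation_gap * pop_size)

c(pop_size = pop_size, n_offspring = n_offspring)
```

For mtcars this gives a population of 20 individuals, 5 of which are replaced per generation.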

gene_selection

(function) The additional selection method for choosing genes in GA. Refer to gene_selection to see the required inputs and the desired form of output. If left unspecified, the algorithm uses a default function which is controlled using the gene_operator parameter.

gene_operator

(character) The operator used when the user does not supply a gene_selection function. Options are "crossover" and "random".

nb_pts

(int) The number of points used in crossover (default 1).
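For readers unfamiliar with multi-point crossover, the following is a minimal sketch of the general technique on logical chromosomes. The function name crossover and its return value are hypothetical and chosen for illustration; the package's own operator may differ.

```r
# Hypothetical nb_pts-point crossover on two equal-length logical vectors.
crossover <- function(parent1, parent2, nb_pts = 1) {
  n_genes <- length(parent1)
  # Cut points lie between genes, so there are n_genes - 1 candidates.
  cuts <- sort(sample(seq_len(n_genes - 1), nb_pts))
  # Label each gene with the index of the segment it falls in (0, 1, 2, ...).
  segment <- findInterval(seq_len(n_genes), cuts + 1)
  # Children alternate between parents from one segment to the next.
  take_from_p2 <- segment %% 2 == 1
  list(child1 = ifelse(take_from_p2, parent2, parent1),
       child2 = ifelse(take_from_p2, parent1, parent2))
}

set.seed(42)
p1 <- rep(TRUE, 10)
p2 <- rep(FALSE, 10)
crossover(p1, p2, nb_pts = 1)
```

With nb_pts = 1 each child takes the head of one parent and the tail of the other; larger values alternate between the parents more often.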

mu

(numeric) The mutation rate (default 0.3)
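One common reading of a mutation rate is that each gene flips independently with probability mu. The sketch below assumes that interpretation and a logical chromosome; the function name mutate is hypothetical.

```r
# Hypothetical mutation step: flip each gene independently with probability mu.
mutate <- function(chromosome, mu = 0.3) {
  flip <- runif(length(chromosome)) < mu
  xor(chromosome, flip)
}

set.seed(123)
chromosome <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
mutate(chromosome, mu = 0.3)
```

Under this reading, mu = 0.3 changes about 3 genes in 10 on average, which is a fairly aggressive rate and favours exploration over stability.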

err

(numeric) The convergence threshold: the algorithm stops when the difference between the last iteration and the current one is less than err (default 1e-6).
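The stopping rule can be sketched as a comparison of consecutive iterations. This page does not say which quantity is compared; the example assumes it is the best objective value of each generation, and has_converged is a hypothetical helper.

```r
# Hypothetical convergence check on the history of best objective values.
has_converged <- function(best_scores, err = 1e-6) {
  n <- length(best_scores)
  # At least two iterations are needed before a difference can be computed.
  n >= 2 && abs(best_scores[n] - best_scores[n - 1]) < err
}

has_converged(c(170.3, 162.1, 158.4))          # still improving
has_converged(c(170.3, 158.4, 158.4000001))    # change below err
```

In a loop, such a check would be combined with n_iter so that the run ends at whichever limit is reached first.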

Value

select returns a list with elements:

Examples

select("mpg", mtcars)
library(MASS)  # provides the Boston dataset
select("crim", Boston)
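simulation <- function(c, n, beta_0, beta, sigma){
 # c: number of variables (e.g. c = 10)
 # n: total number of observations
 # Each column of X is centred on a random integer between 1 and 10,
 # with a small amount of Gaussian noise added.
 X <- matrix(rep(round(runif(c, min = 1, max = 10)),n) + rnorm(c*n, mean = 0, sd = 0.2),
             nrow = n, byrow = TRUE)
 X_names <- paste0("X", 1:c)
 X_data <- as.data.frame(X)
 colnames(X_data) <- X_names
 # Linear response: intercept + X %*% beta + Gaussian noise with sd = sigma.
 Y <- rowSums(t(beta*t(X))) + beta_0 + rnorm(n, mean = 0, sd = sigma)
 # The dependent variable Y is the last column of the returned data frame.
 return(cbind(X_data, Y))
 }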
 test_data <- simulation(10, 100, 1,sample(c(round(runif(10/2, min = 2, max = 10)), rep(0,5)), replace = F), 1)

 select(names(test_data)[length(names(test_data))], test_data, reg_method="lm", n_iter =200, pop_size = 20, objective = "AIC",
        interaction = F, most_sig = F, parent_selection = "prop", nb_groups = 4, generation_gap = 0.25,
        gene_selection = NULL, gene_operator = "crossover", nb_pts = 1, mu = 0.3, err = 1e-6)

Skjemaa/GA documentation built on Dec. 17, 2017, 6:23 p.m.