select: Selects Regression Variables using a Genetic Algorithm

View source: R/select.R

Description

Recommends regression variables by maximizing a fitness criterion with a genetic algorithm (GA).

Usage

select(x, y, model = list("glm"), fitMetric = "AIC", maxGen = 200L,
  minGen = 10L, gaMethod = list("TN", 5), pop = 100L, pMutate = 0.1,
  crossParams = c(0.8, 1L), eliteRate = 0.1)

Arguments

x

matrix of dimension n * p

y

vector of length n or a matrix with n rows

model

list - default "glm": one of ("lm", "glm"), optionally followed by a character string of arguments passed to lm.fit() or glm.fit()

fitMetric

default "AIC": one of ("AIC", "BIC") or a function that takes a regression object and outputs a single number to be maximized
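A custom fitMetric might look like the minimal sketch below. It assumes the regression object is the list returned by lm.fit() (as the model argument suggests), which carries residuals and df.residual components. The name negResidVar is hypothetical and not part of the package.

```r
# Hypothetical custom fitness function. Larger values are better, so the
# residual variance estimate is negated.
negResidVar <- function(fit) {
  -sum(fit$residuals^2) / fit$df.residual
}

# Stand-alone check on a built-in data set, calling lm.fit() directly.
X <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
fit <- lm.fit(X, mtcars$mpg)
negResidVar(fit)
```

A function of this shape would presumably be supplied as fitMetric = negResidVar.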

maxGen

default 200: integer specifying the maximum number of GA generations to use

minGen

default 10: integer specifying the number of consecutive generations without fitness improvement after which the GA stops
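How maxGen and minGen could interact is sketched below. This is an illustration of a stall-based stopping rule under the reading given in the two descriptions above, not the package's own code; shouldStop and bestFitness are hypothetical names.

```r
# bestFitness: best fitness value observed in each generation so far.
shouldStop <- function(bestFitness, maxGen = 200L, minGen = 10L) {
  gen <- length(bestFitness)
  if (gen >= maxGen) return(TRUE)
  # Generation at which the overall best value was first reached.
  lastImprovement <- which.max(bestFitness)
  (gen - lastImprovement) >= minGen
}

shouldStop(c(-110, -105, -100, rep(-100, 9)))   # 9 stalled generations
shouldStop(c(-110, -105, -100, rep(-100, 10)))  # 10 stalled generations
```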

gaMethod

list - default list("TN", 5): one of ('TN', 'LR', 'ER', 'RW') and an additional numerical argument as needed. See gaSelection for details.
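For intuition, tournament selection ('TN') can be sketched as below. The sketch assumes the numeric argument (5 in the default) is the tournament size, which this page does not state; gaSelection remains the authoritative description. tournamentSelect is a hypothetical helper, not a package function.

```r
# Draw k genotypes at random (without replacement) and return the index
# of the fittest one among them.
tournamentSelect <- function(fitness, k = 5) {
  contenders <- sample(seq_along(fitness), size = k)
  contenders[which.max(fitness[contenders])]
}

set.seed(1)
fitness <- c(-120, -95, -130, -101, -99, -140)
parent <- tournamentSelect(fitness, k = 5)
```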

pop

default 100: integer specifying the size of the genotype pool.

pMutate

default 0.1: real number between 0 and 1 specifying the probability of an allele mutation
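A minimal sketch of allele mutation follows. It assumes a genotype is a logical vector with one allele per column of x (TRUE meaning the variable is included), which is a common encoding but is not confirmed by this page; mutate is a hypothetical name.

```r
# Flip each allele independently with probability pMutate.
mutate <- function(genotype, pMutate = 0.1) {
  flip <- runif(length(genotype)) < pMutate
  genotype[flip] <- !genotype[flip]
  genotype
}

set.seed(1)
genotype <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
mutated <- mutate(genotype, pMutate = 0.1)
```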

crossParams

numeric - default (.8, 1): c("cross probability", "max number of cross locations on a single gene")
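The two crossParams values can be illustrated with the sketch below, which assumes logical-vector genotypes and reads the second value as an upper bound on the number of cut points. The function crossover and its argument names are hypothetical, and the package's mate function may work differently.

```r
# With probability pCross, cut both parents at up to maxCuts random
# locations and exchange alternating segments; otherwise return the
# parents unchanged.
crossover <- function(parent1, parent2, pCross = 0.8, maxCuts = 1L) {
  if (runif(1) >= pCross) return(list(parent1, parent2))
  p <- length(parent1)
  nCuts <- sample.int(maxCuts, 1L)
  cuts <- sort(sample.int(p - 1L, nCuts))
  # Number of cut points lying before each position.
  nCrossed <- findInterval(seq_len(p), cuts + 1L)
  swap <- nCrossed %% 2L == 1L
  list(ifelse(swap, parent2, parent1), ifelse(swap, parent1, parent2))
}

set.seed(42)
a <- rep(TRUE, 6)
b <- rep(FALSE, 6)
kids <- crossover(a, b, pCross = 1)
```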

eliteRate

default 0.1: Proportion of highest fitness genotypes that pass into the next generation unchanged.
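Elitism can be sketched as follows. The rounding rule and the floor of one elite are assumptions made for the illustration; selectElites is a hypothetical helper, not a package function.

```r
# Return the indices of the top eliteRate fraction of genotypes,
# ordered from most to least fit.
selectElites <- function(fitness, eliteRate = 0.1) {
  nElite <- max(1, round(eliteRate * length(fitness)))
  order(fitness, decreasing = TRUE)[seq_len(nElite)]
}

fitness <- c(-120, -95, -130, -101, -99, -140, -97, -150, -110, -105)
elites <- selectElites(fitness, eliteRate = 0.3)
```

The genotypes at these indices would be copied into the next generation unchanged, while the remaining slots are filled by selection, crossover, and mutation.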

Value

Returns a list of four components: optimum, fitPlot, fitStats, and GA.

See Also

regress, mate, evolve

Examples

# read the baseball data once: column 1 is the response, the rest are predictors
baseball <- as.matrix(read.table("data/baseball.dat", header = TRUE))
x <- baseball[, -1]
y <- baseball[, 1]

# linear regression using roulette wheel parent selection
GA <- select(x, y, model = list("lm"), gaMethod = list("RW"))

# to return just the selected regression variables
GA$optimum$variables

# to return the regression object using the selected variables
GA$optimum$fitModel

# generalized linear regression with Poisson family using tournament selection (the default gaMethod)
GA <- select(x, y, model = list("glm", "family = poisson()"))

# code for generated data linear regression example
LRdata <- as.matrix(read.table("./data/LRdataTest", header = TRUE))
x <- LRdata[, -1]
y <- LRdata[, 1]
n <- 50
# run the GA n times, keeping the optimum from each run
out <- sapply(1:n, FUN = function(i) select(x, y)$optimum)
coeffs <- sapply(seq(3, 3*n, 3), FUN = function(i) out[[i]]$coefficients)
weights <- c(unlist(sapply(1:n, FUN = function(i) coeffs[[i]])))
weights <- sapply(colnames(x), FUN = function(name) sum(abs(weights[names(weights)==name])))
barplot(weights, xlab = "Variables", ylab = "Weights")

vars <- out[[1]]
varCoeffs <- out[[3]]$coefficients

# Code for the baseball dataset example
maxFits <- matrix(0, 4, 4)
maxIters <- matrix(0, 4, 4)
method <- list(list('TN', 5), list('LR'), list('ER', 0.5), list('RW'))
fit <- c("AIC", "BIC")

for (i in 1:4) {
  # columns 1-2: lm with AIC and BIC
  for (j in 1:2) {
    trial <- GA::select(x, y, model = list("lm"), fitMetric = fit[j], maxGen = 500L, minGen = 50L,
                        gaMethod = method[[i]], pop = 500L, pMutate = 0.1, crossParams = c(0.8, 1L), eliteRate = 0.1)
    iters <- length(trial$GA)
    # best fitness in the final generation
    bestFit <- trial$GA[[paste0("gen", iters)]]$elites[1, 1]
    maxFits[i, j] <- bestFit
    maxIters[i, j] <- iters
  }
  # columns 3-4: glm with AIC and BIC
  for (j in 3:4) {
    trial <- GA::select(x, y, model = list("glm"), fitMetric = fit[j - 2], maxGen = 500L, minGen = 50L,
                        gaMethod = method[[i]], pop = 500L, pMutate = 0.1, crossParams = c(0.8, 1L), eliteRate = 0.1)
    iters <- length(trial$GA)
    bestFit <- trial$GA[[paste0("gen", iters)]]$elites[1, 1]
    maxFits[i, j] <- bestFit
    maxIters[i, j] <- iters
  }
}

dchen49/GA documentation built on May 3, 2019, 6:43 p.m.