Select uses a genetic algorithm to optimize model selection for generalized linear models.
Relying on the functions fitness_serial, selection, crossover and mutation, it generates new populations of chromosomes corresponding to models, selecting on AIC, until the algorithm is assumed to have converged based on a user-specified tolerance or the maximum number of iterations is reached.
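To make the chromosome-to-model correspondence concrete, here is a minimal sketch in which each chromosome is a 0/1 vector marking which covariates enter the GLM, and its fitness is the AIC of that fit. The helper name chromosome_aic and the random initial population are assumptions made for illustration; the internals of fitness_serial are not shown on this page and may differ.

```r
# Built-in mtcars data: mpg is the response, the other columns are covariates
y <- mtcars$mpg
x <- as.matrix(mtcars[, -1])

# Hypothetical helper (not part of the package): AIC of the GLM that uses
# only the covariates switched on (== 1) in the chromosome
chromosome_aic <- function(chrom, y, x, family = gaussian()) {
  if (sum(chrom) == 0) {
    # No covariate selected: fit the intercept-only model
    fit <- glm(y ~ 1, family = family)
  } else {
    fit <- glm(y ~ x[, chrom == 1, drop = FALSE], family = family)
  }
  AIC(fit)
}

set.seed(1)
P <- 6  # population size: number of chromosomes

# Random initial population, one chromosome per row
population <- matrix(rbinom(P * ncol(x), 1, 0.5), nrow = P)

# One AIC value per chromosome; lower means a fitter model
aic <- apply(population, 1, chromosome_aic, y = y, x = x)
```

Selection, crossover and mutation would then act on the rows of population, using aic to decide which chromosomes reproduce.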
y |
a vector or matrix of responses |
x |
a matrix of covariates |
family |
a description of the error distribution to be used in the glm fitting. Default is to use gaussian. |
k |
size of each disjoint subset (must be smaller than P and divide it evenly, i.e. P mod k = 0) |
P |
population size (number of chromosomes to evaluate) |
mutation |
an optional value specifying the rate at which mutation occurs in the new population. |
ncores |
an optional value indicating the number of cores to use in parallelization. Should be numeric. |
fitness_function |
optional function to evaluate model fitness. Must be of type closure. Default is to use AIC. Option "rank" uses the relative rank of models based on AIC. Option "weight" uses the absolute value of AIC to determine the probability of reproduction in the preceding generation. Use "weight" with caution: it can become stuck at a local minimum, because a single model with very low AIC will have a large probability of reproduction. |
fitness |
character in 'rank' or 'weight' used in |
selection |
character in 'fitness' or 'tournament' to select either standard selection |
tol |
Optional value indicating relative convergence tolerance. Should be of class numeric. |
maxIter |
Optional value indicating the maximum number of iterations. Default is 100. |
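The difference between the "rank" and "weight" options can be illustrated with a small sketch. The AIC values below are made up. The rank formula is the common 2r/(P(P+1)) scheme; the weight formula uses Akaike weights as a stand-in, which is an assumption, since this page does not state how the package converts AIC values into probabilities.

```r
# AIC values for five hypothetical candidate models (made-up numbers)
aic <- c(210.4, 205.1, 180.0, 207.9, 212.3)
P <- length(aic)

# Rank-based: the best (lowest AIC) model gets rank P, the worst gets rank 1,
# and the selection probability is proportional to the rank
r <- rank(-aic)
prob_rank <- 2 * r / (P * (P + 1))

# Weight-based (assumed form): Akaike weights computed from AIC differences
delta <- aic - min(aic)
prob_weight <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))

print(round(prob_rank, 3))
print(round(prob_weight, 3))
```

Under the rank scheme the best model is only modestly favoured, whereas under the weight scheme the third model takes almost all of the probability mass, which is the local-minimum risk described for "weight".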
Burnham, K. P. and D. R. Anderson. 2002. Model Selection and Multimodel Inference. Springer-Verlag, New York
Givens, G. H. and J. A. Hoeting. 2013. Combinatorial Optimization. Chapter 3 of Computational Statistics.
# GA test with mtcars
rm(list=ls());gc()
set.seed(42L)
# generate data from mtcars
y <- as.matrix(mtcars$mpg)
x <- as.matrix(mtcars[2:11])
## Not run: select(y = y, x = x, k = 2, family = "gaussian", P=10, maxIter = 10)
# GA test with simrel data
library(simrel)
N = 500 # number of observations
p = 100 # number of covariates
q = floor(p/4) # number of relevant predictors
m = q # number of relevant components
ix = sample(x = 1:p,size = m) # location of relevant components
data = simrel(n=N, p=p, m=m, q=q, relpos=ix, gamma=0.2, R2=0.75)
fitness = "rank" # character in 'rank' or 'weight'
family = "gaussian"
y = data$Y
x = data$X
## Not run: GAoptim = GA::select(y = y,x = x,family = family,k = 20,P = 100,ncores = 0,fitness = "rank",selection = "tournament")