gareg_subset: Genetic-Algorithm Best Subset Selection (GA / GAISL)

View source: R/gareg_subset.R

gareg_subset    R Documentation

Genetic-Algorithm Best Subset Selection (GA / GAISL)

Description

Runs a GA-based search over variable subsets using a user-specified objective (default: subsetBIC) and returns a compact "gareg" S4 result with method = "subset". The engine can be ga (single population) or gaisl (islands), selected via gaMethod.

Usage

gareg_subset(
  y,
  X,
  ObjFunc = NULL,
  gaMethod = "ga",
  gacontrol = NULL,
  monitoring = FALSE,
  seed = NULL,
  ...
)

Arguments

y

Numeric response vector (length n).

X

Numeric matrix of candidate predictors (n rows by p columns).

ObjFunc

Objective function or its name. Defaults to subsetBIC. The objective must accept as its first argument a binary chromosome (0/1 mask of length p) and may accept additional arguments passed via .... By convention, subsetBIC returns negative BIC, so the GA maximizes fitness.

gaMethod

GA backend to call: "ga" or "gaisl" (functions from package GA), or a GA-compatible function with the same interface as ga.

gacontrol

Optional named list of GA engine controls (e.g., popSize, maxiter, run, pcrossover, pmutation, elitism, seed, parallel, keepBest, monitor, ...). These are passed to the GA engine, not to the objective.

monitoring

Logical; if TRUE, prints a short message and (if not supplied in gacontrol) sets monitor = GA::gaMonitor for live progress.

seed

Optional RNG seed (convenience alias for gacontrol$seed).

...

Additional arguments forwarded to ObjFunc (not to the GA engine). For subsetBIC these typically include family, weights, offset, and control.

Details

The fitness passed to GA is ObjFunc itself. Because the engine expects a function with signature f(chrom, ...), your ObjFunc must interpret chrom as a 0/1 mask over the columns of X. The function then computes a score (e.g., negative BIC) using y, X, and any extra arguments supplied via ....
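
As a minimal sketch of this contract, the objective below treats chrom as a 0/1 mask, fits an ordinary least-squares model on the selected columns, and returns negative BIC. The name negBIC_lm and the assumption that y and X reach the objective as named arguments are illustrative only; they are not taken from the GAReg source.

```r
# Hypothetical objective (not part of GAReg): first argument is the 0/1 chromosome.
# Assumption: y and X are supplied to the objective as named arguments.
negBIC_lm <- function(chrom, y, X) {
  idx <- which(chrom == 1)                 # columns switched on by the mask
  if (length(idx) == 0L) {
    fit <- stats::lm(y ~ 1)                # empty subset: intercept-only model
  } else {
    fit <- stats::lm(y ~ X[, idx, drop = FALSE])
  }
  -stats::BIC(fit)                         # larger is better for a maximizing GA
}

# Simulated data in the style of the Examples section
set.seed(1)
n <- 100
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 1 + X[, 1] - 0.7 * X[, 4] + rnorm(n, sd = 0.5)

# Score the data-generating subset and the empty subset
score_true  <- negBIC_lm(c(1, 0, 0, 1, 0, 0), y = y, X = X)
score_empty <- negBIC_lm(rep(0, p), y = y, X = X)
```

Under these assumptions the mask that matches the simulated signal should score higher than the empty mask, which is the ordering a maximizing engine relies on.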

With the default subsetBIC, the returned value is -BIC, so we set max = TRUE in the GA call to maximize fitness. If you switch to an objective that returns a quantity to minimize, either negate it in your objective or change the engine setting to max = FALSE.
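
The first option, negating inside the objective, can be sketched with a small wrapper. Both aic_lm and negate are hypothetical helpers written for this illustration, and the y/X argument names are again an assumption.

```r
# Hypothetical minimize-type criterion: AIC of an OLS fit on the masked columns.
aic_lm <- function(chrom, y, X) {
  idx <- which(chrom == 1)
  fit <- if (length(idx) == 0L) {
    stats::lm(y ~ 1)
  } else {
    stats::lm(y ~ X[, idx, drop = FALSE])
  }
  stats::AIC(fit)                          # smaller is better
}

# Hypothetical wrapper: flips the sign so a maximizing engine can use the criterion.
negate <- function(f) function(chrom, ...) -f(chrom, ...)
neg_aic_lm <- negate(aic_lm)

set.seed(1)
n <- 100
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 1 + X[, 1] - 0.7 * X[, 4] + rnorm(n, sd = 0.5)

mask <- c(1, 0, 0, 1, 0, 0)
raw  <- aic_lm(mask, y = y, X = X)         # quantity to minimize
flip <- neg_aic_lm(mask, y = y, X = X)     # quantity to maximize
```

Wrapping keeps the original criterion reusable elsewhere while giving the GA a fitness in which larger values are better.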

Engine controls belong in gacontrol; objective-specific options belong in .... This separation prevents accidental name collisions between GA engine parameters and objective arguments.
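
The sketch below illustrates how such a separation might be wired; it is not the GAReg implementation. toy_engine, toy_subset, and pen_rss are hypothetical stand-ins, and the "engine" is a plain random search rather than a genetic algorithm.

```r
# Hypothetical engine stand-in: scores random 0/1 masks and keeps the best one.
toy_engine <- function(fitness, nBits, popSize = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  pop <- matrix(rbinom(popSize * nBits, 1, 0.5), popSize, nBits)
  scores <- apply(pop, 1, fitness)
  list(best = pop[which.max(scores), ], bestScore = max(scores), popSize = popSize)
}

# Hypothetical wrapper: gacontrol feeds the engine, ... feeds the objective.
toy_subset <- function(y, X, ObjFunc, gacontrol = NULL, ...) {
  obj_args <- list(...)
  fitness <- function(chrom) {
    do.call(ObjFunc, c(list(chrom, y = y, X = X), obj_args))
  }
  do.call(toy_engine, c(list(fitness = fitness, nBits = ncol(X)), gacontrol))
}

# Hypothetical objective with its own tuning argument, penalty.
pen_rss <- function(chrom, y, X, penalty = 1) {
  idx <- which(chrom == 1)
  fit <- if (length(idx) == 0L) {
    stats::lm(y ~ 1)
  } else {
    stats::lm(y ~ X[, idx, drop = FALSE])
  }
  -sum(stats::residuals(fit)^2) - penalty * length(idx)
}

set.seed(1)
n <- 100
p <- 6
X <- matrix(rnorm(n * p), n, p)
y <- 1 + X[, 1] - 0.7 * X[, 4] + rnorm(n, sd = 0.5)

res <- toy_subset(y, X, pen_rss,
  gacontrol = list(popSize = 20, seed = 1),  # engine controls
  penalty = 2                                # objective argument via ...
)
```

Because the two argument sets travel through different channels, an objective argument can never be mistaken for an engine control that happens to share its name.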

Value

An object of S4 class "gareg" (with method = "subset") containing:

  • call – the matched call.

  • N – number of observations.

  • objFunc – the objective function used.

  • gaMethod – "ga" or "gaisl".

  • gaFit – the GA fit object returned by GA (if your class allows it).

  • featureNames – column names of X (or empty).

  • bestFitness – best fitness value (the fitnessValue slot of the GA fit object).

  • bestChrom – c(m, idx): number of selected variables and their indices.

  • bestnumbsol – m, number of selected variables.

  • bestsol – vector of selected column indices in X.
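
As an illustration of the c(m, idx) layout described for bestChrom, the following sketch unpacks such a vector into selected indices, column names, and a 0/1 mask. The values of bestChrom and featureNames are made up for the example and are not output from gareg_subset.

```r
# Hypothetical values laid out as c(m, idx): 2 variables selected, columns 1 and 4.
bestChrom    <- c(2, 1, 4)
featureNames <- paste0("x", 1:12)          # assumed column names of X

m   <- bestChrom[1]                        # number of selected variables
idx <- bestChrom[seq_len(m) + 1]           # their column indices

selected <- featureNames[idx]              # names of the selected predictors
mask <- integer(length(featureNames))      # rebuild the 0/1 mask over all columns
mask[idx] <- 1L
```

Here m plays the role of bestnumbsol and idx the role of bestsol.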

See Also

subsetBIC, ga, gaisl

Examples


if (requireNamespace("GA", quietly = TRUE)) {
  set.seed(1)
  n <- 100
  p <- 12
  X <- matrix(rnorm(n * p), n, p)
  y <- 1 + X[, 1] - 0.7 * X[, 4] + rnorm(n, sd = 0.5)

  # Default: subsetBIC (Gaussian – negative BIC), engine = GA::ga
  fit1 <- gareg_subset(y, X,
    gaMethod = "ga",
    gacontrol = list(popSize = 60, maxiter = 80, run = 40, parallel = FALSE)
  )
  summary(fit1)

  # Island model: GA::gaisl
  fit2 <- gareg_subset(y, X,
    gaMethod = "gaisl",
    gacontrol = list(popSize = 40, maxiter = 60, numIslands = 4, parallel = FALSE)
  )
  summary(fit2)

  # Logistic objective (subsetBIC handles GLM via ...):
  ybin <- rbinom(n, 1, plogis(0.3 + X[, 1] - 0.5 * X[, 2]))
  fit3 <- gareg_subset(ybin, X,
    gaMethod = "ga",
    family = stats::binomial(), # <- passed to subsetBIC via ...
    gacontrol = list(popSize = 60, maxiter = 80, parallel = FALSE)
  )
  summary(fit3)
}



GAReg documentation built on March 29, 2026, 5:08 p.m.