gagam: Genetic Algorithm for Generalized Additive Models

Description

Implements the genetic algorithm for simultaneous variable selection and structure discovery in generalized additive models. For a given dependent variable and a set of explanatory variables, the genetic algorithm determines which regressors should be included linearly, which nonparametrically, and which should be excluded from the regression equation. The aim is to minimize the Bayesian Information Criterion value of the model.
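The three-way choice described above (exclude, include linearly, include nonparametrically) can be illustrated for a single regressor by fitting each candidate structure and comparing BIC values. This is only a minimal sketch of the selection criterion, not the package's search procedure: it assumes the mgcv package is installed, and the simulated data and object names (dat, fits, bic, best) are made up for illustration.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
n <- 300
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.25)   # simulated response with a nonlinear effect of x
dat <- data.frame(y = y, x = x)

# Three candidate structures for the single regressor x
fits <- list(
  excluded      = gam(y ~ 1, data = dat, method = "REML"),
  linear        = gam(y ~ x, data = dat, method = "REML"),
  nonparametric = gam(y ~ s(x, k = 10, bs = "cr"), data = dat, method = "REML")
)

# BIC of each candidate; the structure with the smallest value is preferred
bic <- sapply(fits, BIC)
best <- names(which.min(bic))
print(bic)
print(best)
```

With many regressors the number of candidate structures grows as 3^p, which is why an exhaustive comparison like this one is replaced by a genetic algorithm search.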

Usage

gagam(
  y,
  x,
  pop_size = 500,
  Kvar = 15,
  Kint = 0,
  no_gen = 100,
  p_m = 0.05,
  p_int = 0.1,
  p_nonpar = 0.1,
  p_int_nonpar = 0.1,
  multicore = TRUE,
  cores = NULL,
  k = 10,
  bs = "cr",
  family = gaussian(),
  method = "REML",
  optimizer = c("outer", "newton"),
  reduc = NULL,
  always_par = NULL
)

Arguments

y

Vector, matrix, data frame, or factor containing observations of the dependent variable.

x

Matrix or data frame containing all considered explanatory variables. If the columns have names, those are used as variable names in the final output.

pop_size

Size of the population (needs to be a multiple of 500). Default is 500.

Kvar

Maximum number of variables allowed in the final model. Default is 15.

Kint

Maximum number of interactions allowed in the final model. Default is 0.

no_gen

Number of generations the genetic algorithm runs. Default is 100.

p_m

Mutation rate for variables. Default is 0.05.

p_int

Mutation rate for interactions of variables. Default is 0.1.

p_nonpar

Mutation rate for the linear/nonparametric indicators for variables. Default is 0.1.

p_int_nonpar

Mutation rate for the linear/nonparametric indicators for interactions. Default is 0.1.

multicore

Whether to use multiple cores in computation. Strongly recommended but may not work on Windows. Default is TRUE.

cores

Number of cores to use with multicore. Default (NULL) uses all cores.

k

Basis dimension for nonparametric terms estimated using splines. Default is 10.

bs

Spline basis for nonparametric terms. Specified as a two-letter character string. Default is the cubic regression spline, bs="cr". See smooth.terms for an overview of what is available.

family

Specifies the family for the gam (see family and family.mgcv). Default is gaussian().

method

Specifies the metric for smoothing parameter selection (see gam). Default is "REML".

optimizer

Specifies the numerical optimization algorithm for the gam (see gam). Default is c("outer","newton").

reduc

Additional variable elimination methods applied at the end of the genetic algorithm run. The available methods are 1, 2, and 3, and several can be combined, e.g. reduc=c(1) or reduc=c(1,3). See the GAGAM paper for an explanation of the methods. Default is NULL.

always_par

Vector of the column numbers (in x) of variables that are always estimated parametrically. Intended for noncontinuous predictors. Default is NULL.
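As a rough illustration of how the always_par and cores arguments might be prepared, the sketch below flags columns with few distinct values as noncontinuous and picks a core count. The data frame x_df, its column names, and the threshold of five distinct values are hypothetical choices for this example, not rules taken from the package.

```r
# Hypothetical predictors: two continuous, two coded as a small set of integers
set.seed(42)
n <- 100
x_df <- data.frame(
  income      = rnorm(n),
  age         = runif(n, 18, 80),
  region_code = sample(1:3, n, replace = TRUE),
  smoker      = sample(0:1, n, replace = TRUE)
)

# Assumption: a column with at most 5 distinct values is treated as noncontinuous
is_noncontinuous <- vapply(
  x_df,
  function(col) length(unique(col)) <= 5,
  logical(1)
)
always_par <- unname(which(is_noncontinuous))  # column numbers in x_df

# Leave one core free; detectCores() can return NA on some systems
detected <- parallel::detectCores()
n_cores <- if (is.na(detected)) 1L else max(1L, detected - 1L)

print(always_par)
print(n_cores)

## Not run:
# fit <- gagam(y, x_df, always_par = always_par, cores = n_cores)
## End(Not run)
```

Passing an explicit cores value instead of the NULL default keeps the run from occupying every core on a shared machine.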

Value

A list containing the gam object of the fitted best model, a vector of indexes or names of the variables included linearly, and a vector of indexes or names of the variables included nonparametrically. If Kint is greater than 0, the list also contains the corresponding vectors for interactions.

References

Cus, Mark. 2020. "Simultaneous Variable Selection And Structure Discovery In Generalized Additive Models". https://github.com/markcus1/gagam/blob/master/GAGAMpaper.pdf.

Examples

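N <- 500
set.seed(123)
# Ten standard normal regressors; only the first four enter the response
xdat <- matrix(rnorm(N*10,0,1),nrow=N,ncol=10)
# Columns 1-3 enter linearly, column 4 quadratically, plus an intercept of 4 and noise
ydat <- 4*xdat[,1]+5*xdat[,2]+6*xdat[,3]+(xdat[,4])^2 + 4 + rnorm(N,0,0.25)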

## Not run: 
example_gagam <- gagam(ydat,xdat,Kvar = 6,no_gen = 50)

## End(Not run)
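For comparison, the data-generating structure of this example can be fitted directly with mgcv. The sketch below does not call gagam; it only shows what a model with three linear terms and one smooth term looks like when the gam-related defaults from the Usage section (k, bs, family, method, optimizer) are written out in a gam() call. That gagam forwards these arguments to mgcv::gam in exactly this way is an assumption based on the cross-references in the argument descriptions, and the names dat, true_fit, and linear_fit are illustrative.

```r
library(mgcv)  # provides gam() and s()

N <- 500
set.seed(123)
xdat <- matrix(rnorm(N * 10, 0, 1), nrow = N, ncol = 10)
ydat <- 4 * xdat[, 1] + 5 * xdat[, 2] + 6 * xdat[, 3] + (xdat[, 4])^2 + 4 +
  rnorm(N, 0, 0.25)

# An unnamed matrix becomes columns X1, ..., X10 in the data frame
dat <- data.frame(y = ydat, xdat)

# Structure used to simulate the data: X1-X3 linear, X4 smooth, X5-X10 excluded
true_fit <- gam(
  y ~ X1 + X2 + X3 + s(X4, k = 10, bs = "cr"),
  data = dat,
  family = gaussian(),
  method = "REML",
  optimizer = c("outer", "newton")
)

# Misspecified alternative: X4 forced to enter linearly
linear_fit <- gam(y ~ X1 + X2 + X3 + X4, data = dat,
                  family = gaussian(), method = "REML")

bic_true <- BIC(true_fit)
bic_linear <- BIC(linear_fit)
print(c(true_structure = bic_true, all_linear = bic_linear))
```

A model returned by gagam can be judged against a benchmark like this one when the true structure is known, as it is for simulated data.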

markcus1/gagam documentation built on April 16, 2020, 11:50 a.m.