APES: Perform APES model selection
In kevinwang09/APES: Approximated Exhaustive Search for GLM

apes	R Documentation

Perform APES model selection

Description

Approximated exhaustive selection will be performed on the input GLM model.

Usage

apes(
  model,
  k = NULL,
  estimator = "leaps",
  fitted_values = NULL,
  time_limit = 10L,
  really_big = FALSE,
  verbose = FALSE,
  n_boot = 0,
  workers = 1L
)

Arguments

`model`	A "full" model with all variables of interest fitted to it. Accepts either a "glm" class or "coxph" class object.
`k`	Model size to explore. Default to NULL, which searches through all variables in the model. Alternatively, user can input a vector. If estimator is: "leaps", then models up to size max(k) will be explored. "mio", then models with specified values will be explored.
`estimator`	Either "leaps" (default) or "mio", which correspond to optimisation algorithms available in the leaps and bestsubset package, respectively.
`fitted_values`	Default to NULL, and fitted values are extracted from the fitted model itself. However, users can specify a vector of fitted values to improve the performance of APES as per Wang et. al. 2019.
`time_limit`	The time limit for the maximum time allocated to each model size model when the "mio" estimator was selected. It will not affect the computational speed if "leaps" is selected as the estimator.
`really_big`	If set to FALSE (by default), then it will prevent variable selection greater than 30 variables. We recommend only setting this argument to TRUE if the number of variables exceed 30 while simultaneous setting the "estimator" argument to "mio".
`verbose`	Whether to print off messages during computations.
`n_boot`	Number of bootstrap runs, default to 0, which doesn't perform any sampling.
`workers`	Number of cores used for parallel processing for the bootstrap

Details

This is the main function of the APES package. Note that the feature dimensions equals to the number of columns if all columns are numeric. If factor/categorical variables are present in the input "glm" object, then the feature dimension also includes all the levels of factors. See vignette on birth weight.

Value

Either an object of class "apes" in case "n_boot" is set to zero or an object of class "boot_apes" in case "n_boot" is set to a positive integer.

Examples

set.seed(10)
n = 100
p = 10
beta = c(5, -5, rep(0, p-2))
x = matrix(rnorm(n*p), ncol = p)
colnames(x) = paste0("X", 1:p)

## Logistic regression example
y = rbinom(n = n, size = 1, prob = expit(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "binomial")
apes_result = apes(model = model)
apes_result
class(apes_result)
names(apes_result)

## Poisson regression example
y = rpois(n = n, lambda = exp(x %*% beta))
data = data.frame(y, x)
model = glm(y ~ ., data = data, family = "poisson")
apes(model = model)

## Bootstrap examples
apes(model = model, n_boot = 2)
apes(model = model, n_boot = 2, workers = 1)
## apes(model = model, n_boot = 2, workers = 2)

## Cox regression example
hx = exp(x %*% beta)
time = rexp(n, hx)
censor = rbinom(n = n, prob = 0.3, size = 1) # censoring indicator
data = data.frame(x, time = time, censor = censor)
model = survival::coxph(survival::Surv(time, censor) ~ ., data = data)
apes(model = model)

kevinwang09/APES documentation built on Nov. 10, 2023, 12:09 p.m.