stabsel: Stability selection.

stabsel R Documentation

Description

Performs stability selection based on gradient boosting.

Usage

stabsel(formula, data, family = "gaussian",
  q, maxit, B = 100, thr = .9, fraction = 0.5, seed = NULL, ...)

## Plot selection frequencies.
## S3 method for class 'stabsel'
plot(x, show = NULL,
  pal = function(n) gray.colors(n, start = 0.9, end = 0.3), ...)

Arguments

formula

A formula or extended formula.

data

A data.frame.

family

A bamlss.family object.

q

An integer specifying how many terms to select in each boosting run.

maxit

An integer specifying the maximum number of boosting iterations. See opt_boost. Choose either q or maxit as the hyperparameter for regularization.

B

An integer. The boosting is run B times.

thr

Cut-off threshold of relative frequencies (between 0 and 1) for selection.

fraction

Numeric between 0 and 1. The fraction of data to be used in each boosting run.

seed

A seed to be set before the stability selection.

x

An object of class stabsel.

show

Number of terms to be shown.

pal

Color palette for different model terms.

...

Not used yet in stabsel.

Details

stabsel performs stability selection based on gradient boosting (opt_boost). The boosting algorithm is run B times, each time on a randomly drawn fraction of the data. Each boosting run stops either when q terms have been selected or when maxit iterations have been performed, so either q or maxit can be used to tune the regularization of the boosting. After all boosting runs, the relative selection frequencies are evaluated. Terms with a relative selection frequency larger than thr are suggested for a final regression model.
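
The subsample, select, and count loop described above can be sketched in base R. This is only an illustration under stated assumptions: the function select_terms is hypothetical and ranks columns by absolute correlation as a stand-in for a boosting run, so it does not reproduce what stabsel or opt_boost do internally. The names B, q, thr and fraction mirror the arguments above.

```r
## Toy stability selection in base R. The selector below (top-q absolute
## correlations) is a stand-in for a boosting run, NOT what stabsel() does.
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("x", 1:8)))
y <- 2 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n)

B <- 50          # number of subsampling runs
q <- 3           # terms selected per run
thr <- 0.9       # cut-off for the relative selection frequency
fraction <- 0.5  # share of rows used in each run

## Hypothetical selector: names of the q columns most correlated with y.
select_terms <- function(X, y, q) {
  score <- abs(cor(X, y))[, 1]
  names(sort(score, decreasing = TRUE))[seq_len(q)]
}

## Count how often each term is selected across the B subsamples.
counts <- setNames(numeric(ncol(X)), colnames(X))
for (b in seq_len(B)) {
  idx <- sample(n, size = floor(fraction * n))
  sel <- select_terms(X[idx, , drop = FALSE], y[idx], q)
  counts[sel] <- counts[sel] + 1
}

## Relative selection frequencies and the terms exceeding the threshold.
freq <- counts / B
stable <- names(freq)[freq > thr]
print(sort(freq, decreasing = TRUE))
print(stable)
```

Because each run must pick exactly q terms, noise terms also get selected from time to time; the threshold thr is what separates terms that are picked in nearly every run from those picked only occasionally.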

If neither q nor maxit has been specified, q will be set to the square root of the number of columns in data.
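
As a small worked example of this default, the snippet below computes the square root of the column count for a data frame with 12 columns. Rounding up with ceiling() is an assumption made here to obtain an integer; the text above does not say how a non-integer value is handled.

```r
## Default q as described: square root of the number of columns in data.
## Rounding up with ceiling() is an assumption; the text does not state it.
d <- data.frame(matrix(rnorm(20 * 12), nrow = 20))
q_raw <- sqrt(ncol(d))        # about 3.46 for 12 columns
q_default <- ceiling(q_raw)   # assumed integer value
print(c(columns = ncol(d), q_raw = q_raw, q_default = q_default))
```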

Gradient boosting does not depend on random numbers. Thus, the individual boosting runs differ only in the subset of data which is used.
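
Since the subsets are the only random component, fixing the seed should make a whole stability selection reproducible. The sketch below illustrates the idea in base R with a hypothetical helper, draw_subsets; how stabsel draws its subsets internally is not described here, so the sampling scheme is an assumption.

```r
## Hypothetical helper: draw B row subsets of size fraction * n.
## Setting the seed first makes the drawn subsets reproducible.
draw_subsets <- function(n, B, fraction, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(B), function(b) sort(sample(n, size = floor(fraction * n))))
}

s1 <- draw_subsets(100, B = 5, fraction = 0.5, seed = 123)
s2 <- draw_subsets(100, B = 5, fraction = 0.5, seed = 123)
s3 <- draw_subsets(100, B = 5, fraction = 0.5, seed = 456)

print(identical(s1, s2))  # same seed, same subsets
print(identical(s1, s3))  # different seed, different subsets
```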

Value

An object of class stabsel.

Author(s)

Thorsten Simon

Examples

## Not run: 
## Simulate some data.
set.seed(111)
d <- GAMart()
n <- nrow(d)

## Add some noise variables.
for(i in 4:9)
  d[[paste0("x",i)]] <- rnorm(n)

f <- paste0("~ ", paste("s(x", 1:9, ")", collapse = "+", sep = ""))
f <- paste(f, "+ te(lon,lat)")
f <- as.formula(f)
f <- list(update(f, num ~ .), f)

## Run stability selection.
sel <- stabsel(f, data = d, q = 6, B = 10)
plot(sel)

## Estimate selected model.
nf <- formula(sel)
b <- bamlss(nf, data = d)
plot(b)

## End(Not run)

bamlss documentation built on Oct. 11, 2024, 5:07 p.m.