stepwise: Stepwise selection of regressors
In config-i1/greybox: Toolbox for Model Building and Forecasting

stepwise

R Documentation

Description

The function selects the variables that give the linear regression with the lowest value of the chosen information criterion. The selection is done stepwise (forward), based on partial correlations. It is intended as a simpler and faster alternative to the step() function from the ‘stats’ package.

Usage

stepwise(data, ic = c("AICc", "AIC", "BIC", "BICc"), silent = TRUE,
  df = NULL, formula = NULL, subset = NULL, method = c("pearson",
  "kendall", "spearman"), distribution = c("dnorm", "dlaplace", "ds",
  "dgnorm", "dlogis", "dt", "dalaplace", "dlnorm", "dllaplace", "dls",
  "dlgnorm", "dbcnorm", "dinvgauss", "dgamma", "dexp", "dfnorm", "drectnorm",
  "dpois", "dnbinom", "dbeta", "dlogitnorm", "plogis", "pnorm"),
  occurrence = c("none", "plogis", "pnorm"), ...)

Arguments

`data`	Data frame containing the dependent variable in the first column and the explanatory variables in the remaining columns.
`ic`	Information criterion to use.
`silent`	If `silent=FALSE`, the function prints its progress. With `silent=TRUE` (the default), nothing is printed.
`df`	Number of degrees of freedom to add (should be used if `stepwise()` is applied to residuals).
`formula`	If provided, then the selection will be done from the listed variables in the formula after all the necessary transformations.
`subset`	An optional vector specifying a subset of observations to be used in the fitting process.
`method`	Method used to calculate correlations. The default is Pearson's correlation, which should be applicable to a wide range of data in different scales.
`distribution`	Distribution to pass to `alm()`. See alm for details.
`occurrence`	Distribution to use for the occurrence part of the model. See alm for details.
`...`	This is temporary and is needed in order to capture "silent" parameter if it is provided.
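
The three `method` values match the correlation types accepted by base R's cor(). The following minimal sketch does not call stepwise() itself; it uses made-up data (the names `x`, `y` and `cors` are purely illustrative) to show how the three coefficients can differ when a relationship is monotone but nonlinear:

```r
# Illustrative only: compare the correlation types on simulated data
set.seed(3)
x <- rnorm(200)
y <- exp(2 * x) + rnorm(200, 0, 0.05)   # monotone but strongly nonlinear

# One coefficient per method, named after the method
cors <- sapply(c("pearson", "kendall", "spearman"),
               function(m) cor(x, y, method = m))
print(round(cors, 3))
```

Rank-based coefficients (Kendall, Spearman) react to the ordering of the values rather than their magnitude, so they may be worth considering when the regressors enter nonlinearly.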

Details

The algorithm uses alm() to fit different models and cor() to select the next regressor in the sequence.
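
The idea can be sketched in base R. This is not the greybox implementation, only one plausible reading of the description above, under these assumptions: lm() stands in for alm(), AICc is computed by hand, the next candidate is the one most correlated with the current residuals, and the search stops when the criterion no longer improves. The names `aicc` and `forward_sketch` are hypothetical.

```r
# Illustrative forward selection; NOT the greybox implementation.
# Assumptions: lm() replaces alm(), AICc is computed manually.
aicc <- function(model) {
  k <- attr(logLik(model), "df")   # estimated parameters, including the variance
  n <- nobs(model)
  AIC(model) + 2 * k * (k + 1) / (n - k - 1)
}

forward_sketch <- function(data, method = "pearson") {
  data <- as.data.frame(data)
  response <- colnames(data)[1]      # first column is the response
  candidates <- colnames(data)[-1]   # the rest are potential regressors
  selected <- character(0)

  # Start from the intercept-only model
  best_model <- lm(reformulate("1", response = response), data = data)
  best_ic <- aicc(best_model)

  while (length(candidates) > 0) {
    # Correlate the current residuals with every remaining candidate
    correlations <- cor(residuals(best_model),
                        data[, candidates, drop = FALSE], method = method)
    next_var <- candidates[which.max(abs(correlations))]

    new_model <- lm(reformulate(c(selected, next_var), response = response),
                    data = data)
    new_ic <- aicc(new_model)

    # Stop as soon as the criterion no longer improves
    if (new_ic >= best_ic) break

    selected <- c(selected, next_var)
    candidates <- setdiff(candidates, next_var)
    best_model <- new_model
    best_ic <- new_ic
  }
  best_model
}

# Toy data in the same spirit as the examples in this topic
set.seed(41)
x <- cbind(rnorm(100, 10, 3), rnorm(100, 50, 5))
toy <- cbind(100 + 0.5 * x[, 1] - 0.75 * x[, 2] + rnorm(100, 0, 3),
             x, rnorm(100, 300, 10))
colnames(toy) <- c("y", "x1", "x2", "Noise")
sketch_model <- forward_sketch(toy)
print(formula(sketch_model))
```

Each step fits only one new model, chosen by correlation, instead of refitting every candidate. This is presumably where the speed advantage over an exhaustive step-by-step search would come from.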

Some details and examples of application are also given in the vignette "Greybox": vignette("greybox","greybox")

Value

The function returns the final selected model, an object of class "alm". See alm for details of the output.

Author(s)

Ivan Svetunkov, ivan@svetunkov.com

References

Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag New York. DOI: [10.1007/b97636](http://dx.doi.org/10.1007/b97636).
McQuarrie, A. D. (1999). A small-sample correction for the Schwarz SIC model selection criterion. Statistics & Probability Letters, 44(1), 79–86. DOI: [10.1016/S0167-7152(98)00294-6](https://doi.org/10.1016/S0167-7152(98)00294-6).

Examples


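### Simple example
# Two relevant regressors
xreg <- cbind(rnorm(100,10,3),rnorm(100,50,5))
# Response in the first column, then the regressors, then an irrelevant variable
xreg <- cbind(100+0.5*xreg[,1]-0.75*xreg[,2]+rnorm(100,0,3),xreg,rnorm(100,300,10))
colnames(xreg) <- c("y","x1","x2","Noise")
stepwise(xreg)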

### Mixture distribution of Log Normal and Cumulative Logit
# Set the response to zero wherever it is below 70, which creates an occurrence part
xreg[,1] <- xreg[,1] * round(exp(xreg[,1]-70) / (1 + exp(xreg[,1]-70)),0)
colnames(xreg) <- c("y","x1","x2","Noise")
# The occurrence model is itself selected with stepwise()
ourModel <- stepwise(xreg, distribution="dlnorm",
                     occurrence=stepwise(xreg, distribution="plogis"))
summary(ourModel)

### Fat regression example
# More candidate regressors (201) than observations (100)
xreg <- matrix(rnorm(20000,10,3),100,200)
xreg <- cbind(100+0.5*xreg[,1]-0.75*xreg[,2]+rnorm(100,0,3),xreg,rnorm(100,300,10))
colnames(xreg) <- c("y",paste0("x",c(1:200)),"Noise")
ourModel <- stepwise(xreg,ic="AICc")
# Plot the information criterion values stored in the model and label each point
plot(ourModel$ICs,type="l",ylim=range(min(ourModel$ICs),max(ourModel$ICs)+5))
points(ourModel$ICs)
text(seq_along(ourModel$ICs)+0.1,ourModel$ICs+5,names(ourModel$ICs))

config-i1/greybox documentation built on June 15, 2025, 5:10 a.m.