stepwise: Stepwise regression

View source: R/stepwise.R

stepwiseR Documentation

Stepwise regression

Description

This function runs a stepwise regression, selecting and/or excluding variables based on the significance (p-value) of the statistical tests implemented in the add1 and drop1 functions of R.

Usage

stepwise(data, sp.col, var.cols, id.col = NULL, family = binomial(link="logit"), 
direction = "both", test.in = "Rao", test.out = "LRT", p.in = 0.05, p.out = 0.1, 
trace = 1, simplif = TRUE, preds = FALSE, Favourability = FALSE, Wald = FALSE)

Arguments

data

a data frame (or an object that can be coerced with 'as.data.frame') containing your target and predictor variables.

sp.col

name or index number of the column of 'data' that contains the response variable.

var.cols

names or index numbers of the columns of 'data' that contain the predictor variables.

id.col

(optional) name or index number of column containing the row identifiers (if defined, it will be included in the output 'predictions' data frame).

family

argument to be passed to glm indicating the error distribution (and optionally the link function) to be used in the model. The default is binomial distribution with logit link (i.e. logistic regression, for binary response variables), and it is the only one that has been tested so far. If you try other options, please carefully check your results and let me know if you find a bug.

direction

the mode of stepwise search. Can be either "forward", "backward", or "both" (the default).

test.in

argument to pass to add1 specifying the statistical test whose 'p.in' a variable must pass to enter the model. Can be "Rao" (the default), "LRT", "Chisq" or "F".

test.out

argument to pass to drop1 specifying the statistical test whose 'p.out' a variable must exceed to be expelled from the model (if it does not simultaneously pass the 'test.in' when direction="both"). Can be "LRT" (the default), "Rao", "Chisq" or "F".

p.in

threshold p-value for a variable to enter the model. Defaults to 0.05.

p.out

threshold p-value for a variable to leave the model. Defaults to 0.1.

trace

if positive, information is printed to the console at each step. The default is 1, for naming each variable that was added or removed. With trace=2, the summary of the model at each step is also printed.

simplif

logical, whether to return a simpler output containing only the model object (the default), or a list with, additionally, a data frame with the variable included or excluded at each step.

preds

logical, whether to return also the predictions given by the model at each step. This argument is ignored if simplif=TRUE.

Favourability

logical, whether to convert the predictions (if preds=TRUE) with the Fav function. This argument is ignored if simplif=TRUE.

Wald

logical, whether to print the Wald test statistics using summaryWald, rather than the z test normally returned by summary. Used only if trace > 1. Requires the aod package. The default is FALSE.

Details

Stepwise variable selection is a way of selecting a subset of significant variables to get a simple and easily interpretable model. It is more computationally efficient than best subset selection. This function uses the R functions add1 for selecting and drop1 for excluding variables. The default parameters mimic the "Forward Selection (Conditional)" stepwise procedure implemented in the IBM SPSS software. This is a widely used (e.g. Munoz et al. 2005, Olivero et al. 2017, 2020, Garcia-Carrasco et al. 2021) but also widely criticized method for variable selection (e.g. Harrell 2001; Whittingham et al. 2006; Flom & Cassell, 2007; Smith 2018), though its AIC-based counterpart (implemented in the step R function) is also not without flaws (e.g. Murtaugh 2014; Coelho et al. 2019).

Value

If simplif=TRUE (the default), this function returns the model object obtained after the variable selection procedure. If simplif=FALSE, it returns a list with the following components:

model

the model object obtained after the variable selection procedure.

steps

a data frame where each row shows the variable included or excluded at each step.

predictions

(if preds=TRUE) a data frame where each column contains the predictions of the model obtained at each step. These predictions are probabilities by default, or favourabilities if Favourability=TRUE.

Author(s)

A. Marcia Barbosa

References

Coelho M.T.P., Diniz-Filho J.A. & Rangel T.F. (2019) A parsimonious view of the parsimony principle in ecology and evolution. Ecography, 42:968-976

Flom P.L. & Cassell D.L. (2007) Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use. NESUG 2007

Garcia-Carrasco J.M., Munoz A.R., Olivero J., Segura M. & Real R. (2021) Predicting the spatio-temporal spread of West Nile virus in Europe. PLoS Neglected Tropical Diseases 15(1):e0009022

Harrell F.E. (2001) Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Springer-Verlag, New York

Munoz, A.R., Real R., Barbosa A.M. & Vargas J.M. (2005) Modelling the distribution of Bonelli's Eagle in Spain: Implications for conservation planning. Diversity and Distributions 11: 477-486

Murtaugh P.A. (2014) In defense of P values. Ecology, 95:611-617

Olivero J., Fa J.E., Real R., Marquez A.L., Farfan M.A., Vargas J.M, Gaveau D., Salim M.A., Park D., Suter J., King S., Leendertz S.A., Sheil D. & Nasi R. (2017) Recent loss of closed forests is associated with Ebola virus disease outbreaks. Scientific Reports 7: 14291

Olivero J., Fa J.E., Farfan M.A., Marquez A.L., Real R., Juste F.J., Leendertz S.A. & Nasi R. (2020) Human activities link fruit bat presence to Ebola virus disease outbreaks. Mammal Review 50:1-10

Smith G. (2018) Step away from stepwise. Journal of Big Data 32 (https://doi.org/10.1186/s40537-018-0143-6)

Whittingham M.J., Stephens P.A., Bradbury R.B. & Freckleton R.P. (2006) Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology, 75:1182-1189

See Also

step, stepByStep, modelTrim

Examples

data(rotif.env)

stepwise(data = rotif.env, sp.col = 18, var.cols = 5:17)

fuzzySim documentation built on Oct. 31, 2022, 1:07 a.m.