stepGAIC: Choose a model by GAIC in a Stepwise Algorithm
In gamlss: Generalized Additive Models for Location Scale and Shape

View source: R/DropAddStepGAIC-Parallel.R

stepGAIC

R Documentation

Choose a model by GAIC in a Stepwise Algorithm

Description

The function stepGAIC() performs stepwise model selection using a Generalized Akaike Information Criterion (GAIC). It is based on the function stepAIC() given in the library MASS of Venables and Ripley (2002). The function has been changed recently to allow parallel computation. The parallel computations are similar to the ones performed in the function boot() of the boot package. Note that since version 4.3-5 of gamlss the stepGAIC() do not have the option of using the function stepGAIC.CH through the argument additive.

Note that stepGAIC() is relying to the dropterm() and addterm() methods applied to gamlss objects. drop1() and add1() are equivalent methods to the dropterm() and addterm() respectively but with different default arguments (see the examples).

The function stepGAIC.VR() is the old version of stepGAIC() with no parallel computations.

The function stepGAIC.CH is based on the S function step.gam() (see Chambers and Hastie (1991)) and it is more suited for model with smoothing additive terms when the degrees of freedom for smoothing are fixed in advance. This is something which rarely used these days, as most of the smoothing functions allow the calculations of the smoothing parameter, see for example the additive function pb()).

The functions stepGAIC.VR() and stepGAIC.CH() have been adapted to work with gamlss objects and the main difference is the scope argument, see below.

While the functions stepGAIC() is used to build models for individual parameters of the distribution of the response variable, the functions stepGAICAll.A() and stepGAICAll.A() are building models for all the parameters.

The functions stepGAICAll.A() and stepGAICAll.B() are based on the stepGAIC() function but use different strategies for selecting a appropriate final model.

stepGAICAll.A() has the following strategy:

Strategy A:

i) build a model for mu using a forward approach.

ii) given the model for mu build a model for sigma (forward)

iii) given the models for mu and sigma build a model for nu (forward)

iv) given the models for mu, sigma and nu build a model for tau (forward)

v) given the models for mu, sigma, nu and tau check whether the terms for nu are needed using backward elimination.

vi) given the models for mu, sigma, nu and tau check whether the terms for sigma are needed (backward).

vii) given the models for mu, sigma, nu and tau check whether the terms for mu are needed (backward).

Note for this strategy to work the scope argument should be set appropriately.

stepGAICAll.B() uses the same procedure as the function stepGAIC() but each term in the scope is fitted to all the parameters of the distribution, rather than the one specified by the argument what of stepGAIC(). The stepGAICAll.B() relies on the add1All() and drop1All() functions for the selection of variables.

Usage


stepGAIC(object, scope, direction = c("both", "backward", "forward"), 
          trace = TRUE, keep = NULL, steps = 1000, scale = 0, 
          what = c("mu", "sigma", "nu", "tau"), parameter= NULL, k = 2, 
          parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, 
          ...)

stepGAIC.VR(object, scope, direction = c("both", "backward", "forward"), 
         trace = TRUE, keep = NULL, steps = 1000, scale = 0, 
         what = c("mu", "sigma", "nu", "tau"), parameter= NULL, k = 2, 
         ...)

stepGAIC.CH(object, scope = gamlss.scope(model.frame(object)), 
            direction = c("both", "backward", "forward"), trace = TRUE, 
            keep = NULL, steps = 1000, what = c("mu", "sigma", "nu", "tau"),
            parameter= NULL, k = 2, ...)

stepGAICAll.A(object, scope = NULL, sigma.scope = NULL, nu.scope = NULL, 
              tau.scope = NULL, mu.try = TRUE, sigma.try = TRUE, 
              nu.try = TRUE, tau.try = TRUE,  direction = NULL,  
              parallel = c("no", "multicore", "snow"), ncpus = 1L, 
              cl = NULL,  ...)

stepGAICAll.B(object, scope, direction = c("both", "backward", "forward"), 
              trace = T, keep = NULL, steps = 1000, scale = 0, k = 2, 
              parallel = c("no", "multicore", "snow"), ncpus = 1L, 
              cl = NULL, ...) 
               
drop1All(object, scope, test = c("Chisq", "none"), k = 2, sorted = FALSE, 
              trace = FALSE, parallel = c("no", "multicore", "snow"), 
              ncpus = 1L, cl = NULL, ...)
              
add1All(object, scope, test = c("Chisq", "none"), k = 2, sorted = FALSE, 
              trace = FALSE, parallel = c("no", "multicore", "snow"), 
              ncpus = 1L, cl = NULL, ...)

Arguments

`object`	an gamlss object. This is used as the initial model in the stepwise search.
`scope`	defines the range of models examined in the stepwise search. For the function `stepAIC()` this should be either a single formula, or a list containing components `upper` and `lower`, both formulae. See the details for how to specify the formulae and how they are used. For the function `stepGAIC` the scope defines the range of models examined in the step-wise search. It is a list of formulas, with each formula corresponding to a term in the model. A 1 in the formula allows the additional option of leaving the term out of the model entirely. +
`direction`	the mode of stepwise search, can be one of `both`, `backward`, or `forward`, with a default of `both`. If the `scope` argument is missing the default for `direction` is `backward`

`trace`	if positive, information is printed during the running of `stepAIC`. Larger values may give more information on the fitting process.
`keep`	a filter function whose input is a fitted model object and the associated 'AIC' statistic, and whose output is arbitrary. Typically 'keep' will select a subset of the components of the object and return them. The default is not to keep anything.
`steps`	the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early.
`scale`	scale is nor used in gamlss
`what`	which distribution parameter is required, default `what="mu"`
`parameter`	equivalent to `what`
`k`	the multiple of the number of degrees of freedom used for the penalty. Only 'k = 2' gives the genuine AIC: 'k = log(n)' is sometimes referred to as BIC or SBC.
`parallel`	The type of parallel operation to be used (if any). If missing, the default is "no".
`ncpus`	integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs.
`cl`	An optional parallel or snow cluster for use if `parallel = "snow"`. If not supplied, a cluster on the local machine is created for the duration of the call.
`sigma.scope`	scope for `sigma` if different to `scope` in `stepGAICAll.A()`
`nu.scope`	scope for `nu` if different to `scope` in `stepGAICAll.A()`
`tau.scope`	scope for `tau` if different to `scope` in `stepGAICAll.A()`
`mu.try`	The default value is is TRUE, set to FALSE if no model for `mu` is needed
`sigma.try`	The default value is TRUE, set to FALSE if no model for `sigma` is needed
`nu.try`	The default value is TRUE, set to FALSE if no model for `nu` is needed
`tau.try`	The default value is TRUE, set to FALSE if no model for `tau` is needed
`test`	whether to print the chi-square test or not
`sorted`	whether to sort the results
`...`	any additional arguments to 'extractAIC'. (None are currently used.)

Details

The set of models searched is determined by the scope argument.

For the function stepGAIC.VR() the right-hand-side of its lower component is always included in the model, and right-hand-side of the model is included in the upper component. If scope is a single formula, it specifies the upper component, and the lower model is empty. If scope is missing, the initial model is used as the upper model.

Models specified by scope can be templates to update object as used by update.formula.

For the function stepGAIC.CH() each of the formulas in scope specifies a "regimen" of candidate forms in which the particular term may enter the model. For example, a term formula might be

~ x1 + log(x1) + cs(x1, df=3)

This means that x1 could either appear linearly, linearly in its logarithm, or as a smooth function estimated non-parametrically. Every term in the model is described by such a term formula, and the final model is built up by selecting a component from each formula.

The function gamlss.scope similar to the S gam.scope() in Chambers and Hastie (1991) can be used to create automatically term formulae from specified data or model frames.

The supplied model object is used as the starting model, and hence there is the requirement that one term from each of the term formulas of the parameters be present in the formula of the distribution parameter. This also implies that any terms in formula of the distribution parameter not contained in any of the term formulas will be forced to be present in every model considered.

When the smoother used in gamlss modelling belongs to the new generation of smoothers allowing the determination of the smoothing parameters automatically (i.e. pb(), cy()) then the function stepGAIC.VR() can be used for model selection (see example below).

Value

the stepwise-selected model is returned, with up to two additional components. There is an '"anova"' component corresponding to the steps taken in the search, as well as a '"keep"' component if the 'keep=' argument was supplied in the call. The '"Resid. Dev"' column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood

The function stepGAICAll.A() returns with a component "anovaAll" containing all the different anova tables used in the process.

Author(s)

Mikis Stasinopoulos based on functions in MASS library and in Statistical Models in S

References

Chambers, J. M. and Hastie, T. J. (1991). Statistical Models in S, Chapman and Hall, London.

Rigby, R. A. and Stasinopoulos D. M. (2005). Generalized additive models for location, scale and shape,(with discussion), Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos D. M. Rigby R.A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Examples

## Not run: 
data(usair)
# -----------------------------------------------------------------------------
#  null model 
mod0<-gamlss(y~1, data=usair, family=GA)
#  all the explanatotory variables x1:x6 fitted linearly 
mod1<-gamlss(y~., data=usair, family=GA)
#-------------------------------------------------------------------------------
# droping terms 
dropterm(mod1)
# with chi-square information
drop1(mod1)
# for parallel computations use something like 
nC <- detectCores()
drop1(mod1, parallel="snow",  ncpus=nC)
drop1(mod1, parallel="multicore",  ncpus=nC)
#------------------------------------------------------------------------------
# adding terms
addterm(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
                  collapse="+"),sep="")))
# with chi-square information
add1(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
                  collapse="+"),sep="")))
# for parallel computations 
nC <- detectCores()
add1(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
                  collapse="+"),sep="")), parallel="snow",  ncpus=nC)
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# stepGAIC 
# find the best subset for the mu
mod2 <- stepGAIC(mod1)
mod2$anova
#--------------------------------------------------------------
# for parallel computations 
mod21 <- stepGAIC(mod1, , parallel="snow",  ncpus=nC)
#--------------------------------------------------------------
# find the best subset for sigma
mod3<-stepGAIC(mod2, what="sigma", scope=~x1+x2+x3+x4+x5+x6)
mod3$anova
#--------------------------------------------------------------
# find the best model using pb() smoother 
#only three variables are used here for simplicity
mod20<-stepGAIC(mod0, scope=list(lower=~1, upper=~pb(x1)+pb(x2)+pb(x5)))
edf(mod20)
# note that x1 and x2 enter linearly
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# the stepGAIC.CH function (no parallel here)
# creating a scope from the usair model frame 
gs<-gamlss.scope(model.frame(y~x1+x2+x3+x4+x5+x6, data=usair))
gs 
mod5<-stepGAIC.CH(mod0,gs)
mod5$anova
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------
# now stepGAICAll.A    
mod7<-stepGAICAll.A(mod0, scope=list(lower=~1,upper=~x1+x2+x3+x4+x5+x6)) 
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------
# now  stepGAICAll.B
drop1All(mod1, parallel="snow",  ncpus=nC)
add1All(mod0, scope=as.formula(paste("~", paste(names(usair[-1]),
                  collapse="+"))), parallel="snow",  ncpus=nC)
mod8<-stepGAICAll.B(mod0, scope=list(lower=~1,upper=~x1+x2+x3+x4+x5+x6))
#-----------------------------------------------------------------------------=-
#------------------------------------------------------------------------------

## End(Not run)

gamlss documentation built on May 29, 2024, 6:08 a.m.