fitDist: Fitting Different Parametric 'gamlss.family' Distributions.
In gamlss: Generalized Additive Models for Location Scale and Shape

fitDist

R Documentation

Fitting Different Parametric `gamlss.family` Distributions.

Description

The function fitDist() uses the function gamlssML() to fit all relevant parametric gamlss.family distributions (specified by the argument type) to a single data vector (with no explanatory variables). The final marginal distribution is the one that minimises the generalised Akaike information criterion (GAIC) with penalty k. The default is k=2, i.e. the AIC.

The function fitDistPred() uses the function gamlssMLpred() to fit all relevant (marginal) parametric gamlss.family distributions to a single data vector (as in fitDist()), but the final model is the one with the minimum prediction global deviance. The user has to specify the training and validation/test samples (through the arguments rand or newdata).

The function chooseDist() uses the function update.gamlss() to fit all relevant parametric (conditional) gamlss.family distributions to a given fitted gamlss model. The output of the function is a matrix whose rows are the different distributions (from the argument type) and whose columns are the different GAICs (one for each value of k). The default values of k are 2 for the AIC, 3.84 for the Chi-square criterion, and log(n) for the BIC. Unlike fitDist(), the function does not return a final model. The function getOrder() can be used to rank the distributions according to a chosen column of the resulting table (matrix). The final model can be refitted using update(); see the examples.
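
The penalties mentioned above enter the criterion as GAIC(k) = global deviance + k × df, where the global deviance is minus twice the maximised log-likelihood and df is the number of fitted parameters. The following base-R sketch (not the package's own code) computes the criterion by hand for a normal model; the helper `gaic()` and the simulated data are illustrative assumptions. For a fitted gamlss object, the package function `GAIC()` would normally be used instead.

```r
# Hand-computed GAIC for a normal model fitted by maximum likelihood.
# gaic() is a hypothetical helper written for this illustration only.
set.seed(123)
y <- rnorm(50, mean = 10, sd = 2)
n <- length(y)

# ML estimates of the two parameters (note the divisor n, not n - 1)
mu_hat    <- mean(y)
sigma_hat <- sqrt(mean((y - mu_hat)^2))

# Global deviance = -2 * maximised log-likelihood
global_deviance <- -2 * sum(dnorm(y, mean = mu_hat, sd = sigma_hat, log = TRUE))
df <- 2  # mu and sigma

gaic <- function(k) global_deviance + k * df

# The three default penalties used by chooseDist()
c(AIC = gaic(2), ChiSq = gaic(3.84), BIC = gaic(log(n)))
```

A larger k penalises each extra parameter more heavily, so the BIC column tends to favour distributions with fewer parameters than the AIC column does.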

Usage

fitDist(y, k = 2, 
    type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"), 
    try.gamlss = FALSE, extra = NULL, data = NULL, trace = FALSE, ...)

fitDistPred(y, 
    type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"), 
    try.gamlss = FALSE, extra = NULL, data = NULL, rand = NULL,
    newdata = NULL, trace = FALSE, ...)    
      
chooseDist(object, k = c(2, 3.84, round(log(length(object$y)), 2)), type = 
    c("realAll", "realline", "realplus", "real0to1", "counts", "binom", "extra"), 
    extra = NULL, trace = FALSE, 
    parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, ...)

chooseDistPred(object, type = c("realAll", "realline", "realplus", 
     "real0to1", "counts", "binom", "extra"), extra = NULL, 
     trace = FALSE, parallel = c("no", "multicore", "snow"), 
     ncpus = 1L, cl = NULL, newdata = NULL, rand = NULL, ...)
      
getOrder(obj, column = 1)

Arguments

`y`	the data vector
`object`, `obj`	a GAMLSS fitted model
`k`	the penalty for the GAIC, with default `k=2`, the standard AIC. In the function `chooseDist()`, `k` can be a vector, e.g. `k=c(2, 4, 6)`, so that more than one GAIC is saved.
`type`	the type of distributions to be tried; see Details
`try.gamlss`	this applies to the functions `fitDist()` and `fitDistPred()`. If `gamlssML()` fails to fit a model, `gamlss()` is tried instead. This slows things down for big data.
`extra`	extra distributions to be tried that are not in the `type` list. Note that the function `chooseDist()` allows fitting only the 'extra' distributions: set `extra`, e.g. `extra=c("GA", "IG", "GG")`, and set `type="extra"`.
`data`	the data frame where `y` can be found, only for functions `fitDist()` and `fitDistPred()`
`rand`	For `fitDistPred()` a factor with values 1 (for fitting) and 2 (for predicting).
`newdata`	The prediction data set (validation or test).
`trace`	whether to print during fitting. Note that when `parallel` is "multicore" or "snow", `trace` does not produce any output.
`parallel`	The type of parallel operation to be used (if any). If missing, the default is "no".
`ncpus`	integer: number of processes to be used in parallel operation; typically one would choose this to be the number of available CPUs.
`cl`	This is useful for snow clusters, i.e. `parallel = "snow"`, when the clusters are created in advance. If not supplied, a cluster on the local machine is created for the duration of the call.
`column`	which column of the output matrix to be ordered according to best GAIC
`...`	for extra arguments to be passed to `gamlssML()` or `gamlss()`
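
As an illustration of the `rand` argument, the sketch below splits a single simulated sample into fitting and predicting parts instead of supplying `newdata`. The data, the split proportions and the object names are assumptions made for this example; `rand` is built as a factor with values 1 and 2, following its description above.

```r
library(gamlss)

set.seed(42)
y <- rt(300, df = 3)   # simulated heavy-tailed data

# Hypothetical split: about two thirds for fitting (1), the rest for predicting (2)
rand <- factor(sample(2, length(y), replace = TRUE, prob = c(2/3, 1/3)))
table(rand)

# Candidates are fitted on the observations with rand == 1 and compared
# on the observations with rand == 2
p2 <- fitDistPred(y, type = "realline", rand = rand)
p2$family
head(p2$fits)
```

Because the split is random, the selected distribution can change from one split to another; fixing the seed keeps the result reproducible.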

Details

The following are the possible values of the type argument:

realAll: All the gamlss.family continuous distributions defined on the real line (i.e. realline) and on the positive real line (i.e. realplus).
realline: The gamlss.family continuous distributions on the real line: "NO", "GU", "RG", "LO", "NET", "TF", "TF2", "PE", "PE2", "SN1", "SN2", "exGAUS", "SHASH", "SHASHo", "SHASHo2", "EGB2", "JSU", "JSUo", "SEP1", "SEP2", "SEP3", "SEP4", "ST1", "ST2", "ST3", "ST4", "ST5", "SST", "GT".
realplus: The gamlss.family continuous distributions on the positive real line: "EXP", "GA", "IG", "LOGNO", "LOGNO2", "WEI", "WEI2", "WEI3", "IGAMMA", "PARETO2", "PARETO2o", "GP", "BCCG", "BCCGo", "exGAUS", "GG", "GIG", "LNO", "BCTo", "BCT", "BCPEo", "BCPE", "GB2".
real0to1: The gamlss.family continuous distributions from 0 to 1: "BE", "BEo", "BEINF0", "BEINF1", "BEOI", "BEZI", "BEINF", "GB1".
counts: The gamlss.family distributions for counts: "PO", "GEOM", "GEOMo", "LG", "YULE", "ZIPF", "WARING", "GPO", "DPO", "BNB", "NBF", "NBI", "NBII", "PIG", "ZIP", "ZIP2", "ZAP", "ZALG", "DEL", "ZAZIPF", "SI", "SICHEL", "ZANBI", "ZAPIG", "ZINBI", "ZIPIG", "ZINBF", "ZABNB", "ZASICHEL", "ZINBF", "ZIBNB", "ZISICHEL".
binom: The gamlss.family distributions for binomial type data: "BI", "BB", "DB", "ZIBI", "ZIBB", "ZABI", "ZABB".

The function fitDist() uses the function gamlssML() to fit the different models, the function fitDistPred() uses gamlssMLpred(), and the function chooseDist() uses update.gamlss().
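
To compare only a hand-picked set of distributions rather than a whole `type` group, `extra` can be combined with `type="extra"` in chooseDist(). The sketch below assumes the `abdom` data shipped with gamlss and an arbitrary choice of three positive-line distributions; the object names `m0` and `t2` are hypothetical.

```r
library(gamlss)
data(abdom)

# Starting model: normal response with smooth terms for mu and sigma
m0 <- gamlss(y ~ pb(x), sigma.fo = ~ pb(x), family = NO,
             data = abdom, trace = FALSE)

# Refit the model with only the three listed distributions
t2 <- chooseDist(m0, type = "extra", extra = c("GA", "IG", "GG"))
t2               # one row per distribution, one column per penalty k

# Rank the candidates by the first column (k = 2, i.e. AIC)
getOrder(t2, 1)
```

Keeping the candidate set small shortens the run considerably, since each candidate requires a full refit of the model.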

Value

For the functions fitDist() and fitDistPred(), a gamlssML object is returned (the one which minimises the GAIC or VDEV, respectively) with two extra components:

`fits`	the fitted distributions, ordered according to the GAIC
`failed`	the distributions for which the `gamlssML()` (or `gamlss()`) fits have failed

For the function chooseDist(), a matrix is returned whose rows are the different distributions and whose columns are the GAICs for the penalties set by k.

Author(s)

Mikis Stasinopoulos, Bob Rigby, Vlasis Voudouris and Majid Djennad.

References

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion), Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos, D. M. and Rigby, R. A. (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

Examples

y <- rt(100, df=1)
m1<-fitDist(y, type="realline")
m1$fits
m1$failed
# an example of using  extra
## Not run: 
#---------------------------------------  
# Example of using the argument extra  
library(gamlss.tr)
data(tensile)
gen.trun(par=1,family="GA", type="right")
gen.trun(par=1,"LOGNO", type="right")
gen.trun(par=c(0,1),"TF", type="both")
ma <- fitDist(str, type="real0to1", trace=TRUE,
       extra=c("GAtr", "LOGNOtr", "TFtr"), 
       data=tensile) 
ma$fits
ma$failed
#-------------------------------------
# selecting model using the prediction global deviance
# Using fitDistPred
# creating training data
y <- rt(1000, df=2)
m1 <- fitDist(y, type="realline")
m1$fits
m1$failed
# create validation data
yn <- rt(1000, df=2)
# choose distribution which fits the new data best
p1 <- fitDistPred(y, type="realline", newdata=yn)
p1$fits
p1$failed
#---------------------------------------
# using the function chooseDist()
# fitting normal distribution model
data(abdom)
m1 <- gamlss(y~pb(x), sigma.fo=~pb(x), family=NO, data=abdom)
# choose a distribution on the real line 
# and save the GAIC for k=c(2, 3.84, log(n)), i.e. AIC, Chi-square and BIC.
t1 <- chooseDist(m1, type="realline", parallel="snow", ncpus=4)
# the GAIC's
t1
# the distributions which failed are with NA's 
# ordering according to  BIC
getOrder(t1,3)
fm<-update(m1, family=names(getOrder(t1,3)[1]))

## End(Not run)

gamlss documentation built on Aug. 21, 2025, 5:56 p.m.