fitDist: Fitting Different Parametric 'gamlss.family' Distributions.


fitDist    R Documentation

Fitting Different Parametric gamlss.family Distributions.

Description

The function fitDist() uses gamlssML() to fit all relevant parametric gamlss.family distributions (specified by the argument type) to a single data vector with no explanatory variables. The final marginal distribution is the one that minimises the generalised Akaike information criterion (GAIC) with penalty k. The default is k=2, i.e. the AIC.
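The selection rule can be illustrated without gamlss. The following is a minimal base R sketch, not the internals of fitDist(): it assumes GAIC(k) = -2 * log-likelihood + k * df, uses closed-form maximum likelihood estimates for three candidate distributions, and ranks them. The helper gaic() and the toy data are hypothetical.

```r
set.seed(1)
y <- rgamma(200, shape = 2, rate = 1)   # positive toy data (illustrative only)

# Maximised log-likelihoods from closed-form ML estimates
ll_exp   <- sum(dexp(y, rate = 1 / mean(y), log = TRUE))
sd_ml    <- sqrt(mean((y - mean(y))^2))
ll_norm  <- sum(dnorm(y, mean = mean(y), sd = sd_ml, log = TRUE))
ly       <- log(y)
sdlog_ml <- sqrt(mean((ly - mean(ly))^2))
ll_lnorm <- sum(dlnorm(y, meanlog = mean(ly), sdlog = sdlog_ml, log = TRUE))

# Hypothetical helper: global deviance plus penalty k per fitted parameter
gaic <- function(loglik, df, k = 2) -2 * loglik + k * df

# Smallest GAIC first, labelled with the matching gamlss.family names
fits <- sort(c(EXP   = gaic(ll_exp, df = 1),
               NO    = gaic(ll_norm, df = 2),
               LOGNO = gaic(ll_lnorm, df = 2)))
fits
names(fits)[1]   # the candidate that would be selected under k = 2
```

Raising k penalises distributions with more parameters more heavily, so the ranking can change with the penalty.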

The function fitDistPred() uses gamlssMLpred() to fit all relevant (marginal) parametric gamlss.family distributions to a single data vector, as fitDist() does, but the final model is the one with the minimum prediction global deviance. The user has to specify the training and validation/test samples.
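One way to specify the two samples is through the rand argument. The sketch below builds such a factor in base R; the 60/40 split and the seed are arbitrary assumptions, and the fitDistPred() call is only attempted if gamlss is installed.

```r
set.seed(123)
y <- rt(300, df = 2)

# 1 = observation used for fitting, 2 = observation used for prediction
rand <- factor(sample(2, length(y), replace = TRUE, prob = c(0.6, 0.4)))
table(rand)

if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  # assumption: rand is applied to y as described under Arguments
  p_rand <- fitDistPred(y, type = "realline", rand = rand)
  print(p_rand$fits)
}
```

Alternatively, the validation sample can be supplied directly through newdata, as in the Examples section.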

The function chooseDist() uses update.gamlss() to fit all relevant parametric (conditional) gamlss.family distributions to a given fitted gamlss model. The output is a matrix whose rows are the different distributions (from the argument type) and whose columns are the GAIC values for the different penalties k. The default values of k are 2 (AIC), 3.84 (Chi-square) and log(n) (BIC). Unlike fitDist(), the function does not return a final model. The function getOrder() can be used to rank the distributions by a chosen column of the resulting matrix. The final model can be refitted using update(); see the examples.
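The default penalty vector can be reproduced in base R. In this sketch n is a hypothetical sample size standing in for length(object$y); the value 3.84 appears to correspond to the 95% quantile of a Chi-square distribution with one degree of freedom, which the last line computes for comparison.

```r
n <- 610  # hypothetical sample size, standing in for length(object$y)

# Same expression as the default of the k argument in chooseDist()
k_default <- c(2, 3.84, round(log(n), 2))
names(k_default) <- c("AIC", "Chi-square", "BIC")
k_default

# 95% point of a Chi-square distribution with 1 degree of freedom
qchisq(0.95, df = 1)
```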

Usage

fitDist(y, k = 2, 
    type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"), 
    try.gamlss = FALSE, extra = NULL, data = NULL, trace = FALSE, ...)

fitDistPred(y, 
    type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"), 
    try.gamlss = FALSE, extra = NULL, data = NULL, rand = NULL,
    newdata = NULL, trace = FALSE, ...)    
      
chooseDist(object, k = c(2, 3.84, round(log(length(object$y)), 2)), type = 
    c("realAll", "realline", "realplus", "real0to1", "counts", "binom","extra"), 
    extra = NULL, trace = FALSE, 
    parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, ...)

chooseDistPred(object, type = c("realAll", "realline", "realplus", 
     "real0to1", "counts", "binom", "extra"), extra = NULL, 
     trace = FALSE, parallel = c("no", "multicore", "snow"), 
     ncpus = 1L, cl = NULL, newdata = NULL, rand = NULL, ...)
      
getOrder(obj, column = 1)      

Arguments

y

the data vector

object, obj

a GAMLSS fitted model

k

the penalty for the GAIC; the default k=2 gives the standard AIC. For chooseDist(), k can be a vector, e.g. k=c(2, 4, 6), so that more than one GAIC is saved.

type

the type of distributions to be tried; see Details

try.gamlss

applies to fitDist() and fitDistPred() only. If gamlssML() fails to fit a model, gamlss() is tried instead. This slows things down for large data sets.

extra

extra distributions to be tried that are not in the type list. Note that chooseDist() allows fitting only the ‘extra’ distributions: set extra, e.g. extra=c("GA", "IG", "GG"), and set type to "extra", i.e. type="extra".

data

the data frame where y can be found, only for functions fitDist() and fitDistPred()

rand

For fitDistPred() a factor with values 1 (for fitting) and 2 (for predicting).

newdata

The prediction data set (validation or test).

trace

whether to print during fitting. Note that when parallel is "multicore" or "snow", trace does not produce any output.

parallel

The type of parallel operation to be used (if any). If missing, the default is "no".

ncpus

integer: the number of processes to be used in parallel operation; typically one would choose this to be the number of available CPUs.

cl

This is useful for snow clusters, i.e. parallel = "snow", when the clusters are created in advance. If not supplied, a cluster on the local machine is created for the duration of the call.

column

which column of the output matrix is used to order the distributions, best GAIC first

...

extra arguments to be passed to gamlssML() or gamlss()

Details

The following are the possible values of the type argument:

  • realAll: All the gamlss.family continuous distributions defined on the real line (i.e. realline) and on the positive real line (i.e. realplus).

  • realline: The gamlss.family continuous distributions : "NO", "GU", "RG" ,"LO", "NET", "TF", "TF2", "PE","PE2", "SN1", "SN2", "exGAUS", "SHASH", "SHASHo","SHASHo2", "EGB2", "JSU", "JSUo", "SEP1", "SEP2", "SEP3", "SEP4", "ST1", "ST2", "ST3", "ST4", "ST5", "SST", "GT".

  • realplus: The gamlss.family continuous distributions in the positive real line: "EXP", "GA","IG","LOGNO", "LOGNO2","WEI", "WEI2", "WEI3", "IGAMMA","PARETO2", "PARETO2o", "GP", "BCCG", "BCCGo", "exGAUS", "GG", "GIG", "LNO","BCTo", "BCT", "BCPEo", "BCPE", "GB2".

  • real0to1: The gamlss.family continuous distributions from 0 to 1: "BE", "BEo", "BEINF0", "BEINF1", "BEOI", "BEZI", "BEINF", "GB1".

  • counts: The gamlss.family distributions for counts: "PO", "GEOM", "GEOMo","LG", "YULE", "ZIPF", "WARING", "GPO", "DPO", "BNB", "NBF","NBI", "NBII", "PIG", "ZIP","ZIP2", "ZAP", "ZALG", "DEL", "ZAZIPF", "SI", "SICHEL","ZANBI", "ZAPIG", "ZINBI", "ZIPIG", "ZINBF", "ZABNB", "ZASICHEL", "ZINBF", "ZIBNB", "ZISICHEL".

  • binom: The gamlss.family distributions for binomial type data: "BI", "BB", "DB", "ZIBI", "ZIBB", "ZABI", "ZABB".

The function fitDist() uses gamlssML() to fit the different models, fitDistPred() uses gamlssMLpred(), and chooseDist() uses update.gamlss().
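The lists above can be bypassed by combining type="extra" with the extra argument in chooseDist(). The following is a minimal sketch, assuming the gamlss package and its abdom data are available; the choice of candidate distributions and the object names m0 and t_extra are arbitrary.

```r
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  data(abdom)

  # initial model with a normal response distribution
  m0 <- gamlss(y ~ pb(x), family = NO, data = abdom,
               control = gamlss.control(trace = FALSE))

  # fit only the three named distributions, not a whole type list
  t_extra <- chooseDist(m0, type = "extra", extra = c("GA", "IG", "GG"))
  print(t_extra)
}
```

This keeps the search short when only a handful of candidate distributions are of interest.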

Value

For the functions fitDist() and fitDistPred() a gamlssML object is returned (the one which minimises the GAIC or the VDEV, respectively) with two extra components:

fits

an ordered list according to the GAIC of the fitted distribution

failed

the distributions for which the gamlssML() (or gamlss()) fits have failed

For the function chooseDist() a matrix is returned, with rows the different distributions and columns the different GAIC's set by k.
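To make the shape of this matrix concrete, here is a base R sketch of ranking one of its columns. The matrix tab and its values are invented for illustration, the column labels are an assumption about how chooseDist() names its columns, and sort() only approximates what getOrder() does.

```r
# Invented GAIC table: rows are distributions, columns are penalties k.
# The NA row mimics a distribution whose fit failed.
tab <- matrix(c(4950.2, 4938.7, NA, 4941.0,
                4958.9, 4951.2, NA, 4953.5),
              ncol = 2,
              dimnames = list(c("NO", "TF", "SEP1", "LO"), c("2", "6.41")))

# Rank the distributions by the second column; sort() drops NA by default
ord <- sort(tab[, 2])
ord
best <- names(ord)[1]   # name of the best-ranked distribution in that column
best
```

The name in best is what would be passed as family to update() when refitting the final model.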

Author(s)

Mikis Stasinopoulos, Bob Rigby, Vlasis Voudouris and Majid Djennad.

References

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Appl. Statist., 54, part 3, pp 507-554.

Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019) Distributions for modeling location, scale, and shape: Using GAMLSS in R, Chapman and Hall/CRC. An older version can be found in https://www.gamlss.com/.

Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.

Stasinopoulos D. M., Rigby R.A., Heller G., Voudouris V., and De Bastiani F., (2017) Flexible Regression and Smoothing: Using GAMLSS in R, Chapman and Hall/CRC.

(see also https://www.gamlss.com/).

See Also

gamlss, gamlssML

Examples

y <- rt(100, df=1)
m1<-fitDist(y, type="realline")
m1$fits
m1$failed
## Not run: 
#---------------------------------------  
# Example of using the argument extra  
library(gamlss.tr)
data(tensile)
gen.trun(par=1,family="GA", type="right")
gen.trun(par=1,"LOGNO", type="right")
gen.trun(par=c(0,1),"TF", type="both")
ma <- fitDist(str, type="real0to1", trace=TRUE,
              extra=c("GAtr", "LOGNOtr", "TFtr"),
              data=tensile)
ma$fits
ma$failed
#-------------------------------------
# selecting model using the prediction global deviance
# Using fitDistPred
# creating training data
y <- rt(1000, df=2)
m1 <- fitDist(y, type="realline")
m1$fits
m1$failed
# create validation data
yn <- rt(1000, df=2)
# choose distribution which fits the new data best
p1 <- fitDistPred(y, type="realline", newdata=yn)
p1$fits
p1$failed
#---------------------------------------
# using the function chooseDist()
# fitting normal distribution model
m1 <- gamlss(y~pb(x), sigma.fo=~pb(x), family=NO, data=abdom)
# choose a distribution on the real line 
# and save the GAIC for the default values of k, i.e. AIC, Chi-square and BIC
t1 <- chooseDist(m1, type="realline", parallel="snow", ncpus=4)
# the GAIC's
t1
# the distributions which failed are with NA's 
# ordering according to  BIC
getOrder(t1,3)
fm<-update(m1, family=names(getOrder(t1,3)[1]))

## End(Not run)

gamlss documentation built on May 29, 2024, 6:08 a.m.