fitDist | R Documentation |
gamlss.family Distributions.
The function fitDist() uses the function gamlssML() to fit all relevant parametric gamlss.family distributions (specified by the argument type) to a single data vector (with no explanatory variables). The final marginal distribution is the one selected by the generalised Akaike information criterion (GAIC) with penalty k. The default is k=2, i.e. AIC.
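To make the selection rule concrete, here is a minimal base-R sketch that computes GAIC = -2 * logLik + k * df for two candidate distributions fitted by maximum likelihood and keeps the one with the smallest value. It does not use gamlss at all: the helper gaic(), the simulated data and the two candidates are illustrative assumptions, not a description of how fitDist() is implemented internally.

```r
# Minimal sketch (base R only): choose between two candidate
# distributions by GAIC = -2 * logLik + k * df.
# gaic() is a hypothetical helper, not part of gamlss.
set.seed(1)
y <- rgamma(200, shape = 2, rate = 1)   # positive, right-skewed data

gaic <- function(loglik, df, k = 2) -2 * loglik + k * df

# Candidate 1: exponential; the ML estimate of the rate is 1 / mean(y)
ll_exp <- sum(dexp(y, rate = 1 / mean(y), log = TRUE))

# Candidate 2: log-normal; the ML estimates are the mean and the
# (divide-by-n) standard deviation of log(y)
mu  <- mean(log(y))
sig <- sqrt(mean((log(y) - mu)^2))
ll_lno <- sum(dlnorm(y, meanlog = mu, sdlog = sig, log = TRUE))

# Labels are illustrative only
fits <- sort(c(EXP = gaic(ll_exp, df = 1), LOGNO = gaic(ll_lno, df = 2)))
fits            # ordered from best (smallest GAIC) to worst
names(fits)[1]  # label of the selected candidate
```

Passing k = log(length(y)) to gaic() instead of the default k = 2 gives a BIC-type comparison of the same candidates.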
The function fitDistPred() uses the function gamlssMLpred() to fit all relevant (marginal) parametric gamlss.family distributions to a single data vector (similar to fitDist()), but the final model is the one with the minimum prediction global deviance. The user has to specify the training and validation/test samples.
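The idea of selecting by prediction deviance can be sketched in base R: each candidate is fitted on the training sample only, and the deviance (-2 times the log-likelihood) is then evaluated on the validation sample. The two candidates, the simulated data and all variable names below are assumptions chosen for illustration; this is not the code used by fitDistPred().

```r
# Minimal sketch (base R only): select a distribution by the
# deviance evaluated on held-out data rather than by GAIC.
set.seed(2)
y_train <- rt(500, df = 3)   # training sample
y_valid <- rt(500, df = 3)   # validation sample

# Candidate 1: normal, parameters estimated on the training data only
mu <- mean(y_train)
sg <- sqrt(mean((y_train - mu)^2))
vdev_no <- -2 * sum(dnorm(y_valid, mean = mu, sd = sg, log = TRUE))

# Candidate 2: logistic, fitted by numerical ML on the training data
# (the scale is optimised on the log scale to keep it positive)
negll <- function(p) {
  -sum(dlogis(y_train, location = p[1], scale = exp(p[2]), log = TRUE))
}
opt <- optim(c(median(y_train), 0), negll)
vdev_lo <- -2 * sum(dlogis(y_valid, location = opt$par[1],
                           scale = exp(opt$par[2]), log = TRUE))

# Smallest validation deviance first (labels are illustrative only)
vdev <- sort(c(NO = vdev_no, LO = vdev_lo))
vdev
```

Because the validation sample plays no part in estimation, no penalty term is added here; the held-out deviance is compared directly.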
The function chooseDist() uses the function update.gamlss() to fit all relevant parametric (conditional) gamlss.family distributions to a given fitted gamlss model. The output of the function is a matrix whose rows are the different distributions (from the argument type) and whose columns are the GAIC values for the different penalties k. The default values for k are 2 (AIC), 3.84 (Chi-square) and log(n) (BIC). Unlike fitDist(), the function does not return a final model. The function getOrder() can be used to rank the distributions according to a chosen column of the resulting table (matrix).
The final model can be refitted using update(); see the examples.
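The shape of such a table can be illustrated without gamlss. The sketch below refits one regression model under two base-R glm families, stores the GAIC for three penalties in a matrix (rows are families, columns are penalties), and then ranks the rows by one column. The data, the choice of families and the use of glm() are assumptions made purely for illustration; chooseDist() and getOrder() themselves work on gamlss objects.

```r
# Minimal sketch (base R only): a GAIC table with one row per
# candidate family and one column per penalty k, then ranked.
set.seed(3)
x <- runif(150)
y <- rgamma(150, shape = 5, rate = 5 / exp(1 + x))  # positive response
d <- data.frame(x = x, y = y)

k <- c(2, 3.84, round(log(nrow(d)), 2))   # AIC, Chi-square, BIC
families <- list(gaussian = gaussian(),
                 Gamma    = Gamma(link = "log"))

# AIC(fit, k = ...) returns -2 * logLik + k * df for each penalty
tab <- t(sapply(families, function(f) {
  fit <- glm(y ~ x, family = f, data = d)
  sapply(k, function(ki) AIC(fit, k = ki))
}))
colnames(tab) <- as.character(k)
tab

# Rank the rows by the third column (the BIC-type penalty)
sort(tab[, 3])
```

Ranking by a different column can change the order when candidates differ in their number of parameters, which is why the penalty column has to be chosen explicitly.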
fitDist(y, k = 2,
type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"),
try.gamlss = FALSE, extra = NULL, data = NULL, trace = FALSE, ...)
fitDistPred(y,
type = c("realAll", "realline", "realplus", "real0to1", "counts", "binom"),
try.gamlss = FALSE, extra = NULL, data = NULL, rand = NULL,
newdata = NULL, trace = FALSE, ...)
chooseDist(object, k = c(2, 3.84, round(log(length(object$y)), 2)), type =
c("realAll", "realline", "realplus", "real0to1", "counts", "binom","extra"),
extra = NULL, trace = FALSE,
parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL, ...)
chooseDistPred(object, type = c("realAll", "realline", "realplus",
"real0to1", "counts", "binom", "extra"), extra = NULL,
trace = FALSE, parallel = c("no", "multicore", "snow"),
ncpus = 1L, cl = NULL, newdata = NULL, rand = NULL, ...)
getOrder(obj, column = 1)
y |
the data vector |
object , obj |
a GAMLSS fitted model |
k |
the penalty for the GAIC with default values |
type |
the type of distribution to be tried see details |
try.gamlss |
this applies to functions |
extra |
whether extra distributions should be tried, which are not in the |
data |
the data frame where |
rand |
For |
newdata |
The prediction data set (validation or test). |
trace |
whether to print during fitting. Note that when |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no". |
ncpus |
integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs. |
cl |
This is useful for snow clusters, i.e. |
column |
which column of the output matrix to be ordered according to best GAIC |
... |
for extra arguments to be passed to gamlssML() or to gamlss() |
The following are the available values of the type argument:
realAll: All the gamlss.family continuous distributions defined on the real line (i.e. realline) and on the positive real line (i.e. realplus).
realline: The gamlss.family continuous distributions on the real line: "NO", "GU", "RG", "LO", "NET", "TF", "TF2", "PE", "PE2", "SN1", "SN2", "exGAUS", "SHASH", "SHASHo", "SHASHo2", "EGB2", "JSU", "JSUo", "SEP1", "SEP2", "SEP3", "SEP4", "ST1", "ST2", "ST3", "ST4", "ST5", "SST", "GT".
realplus: The gamlss.family continuous distributions on the positive real line: "EXP", "GA", "IG", "LOGNO", "LOGNO2", "WEI", "WEI2", "WEI3", "IGAMMA", "PARETO2", "PARETO2o", "GP", "BCCG", "BCCGo", "exGAUS", "GG", "GIG", "LNO", "BCTo", "BCT", "BCPEo", "BCPE", "GB2".
real0to1: The gamlss.family continuous distributions from 0 to 1: "BE", "BEo", "BEINF0", "BEINF1", "BEOI", "BEZI", "BEINF", "GB1".
counts: The gamlss.family distributions for counts: "PO", "GEOM", "GEOMo", "LG", "YULE", "ZIPF", "WARING", "GPO", "DPO", "BNB", "NBF", "NBI", "NBII", "PIG", "ZIP", "ZIP2", "ZAP", "ZALG", "DEL", "ZAZIPF", "SI", "SICHEL", "ZANBI", "ZAPIG", "ZINBI", "ZIPIG", "ZINBF", "ZABNB", "ZASICHEL", "ZINBF", "ZIBNB", "ZISICHEL".
binom: The gamlss.family distributions for binomial type data: "BI", "BB", "DB", "ZIBI", "ZIBB", "ZABI", "ZABB".
The function fitDist() uses the function gamlssML() to fit the different models, the function fitDistPred() uses gamlssMLpred(), and the function chooseDist() uses update.gamlss().
For the functions fitDist() and fitDistPred() a gamlssML object is returned (the one which minimised the GAIC or the validation deviance, VDEV, respectively) with two extra components:
fits |
an ordered list according to the GAIC of the fitted distribution |
failed |
the distributions where the |
For the function chooseDist() a matrix is returned, whose rows are the different distributions and whose columns are the GAIC values for the penalties set by k.
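As a hedged usage sketch of the returned object, the code below calls fitDist() with a BIC-type penalty (k = log(n)) and inspects the extra components described above. It runs only if the gamlss package is installed. The simulated data are an assumption, and the family component is assumed to be present on the returned fit as on other gamlss fits.

```r
# Minimal usage sketch; runs only if the gamlss package is installed.
if (requireNamespace("gamlss", quietly = TRUE)) {
  library(gamlss)
  set.seed(4)
  y <- rgamma(200, shape = 2, rate = 1)   # simulated positive data

  # BIC-type selection: penalty k = log(n) instead of the default k = 2
  m <- fitDist(y, k = log(length(y)), type = "realplus")

  m$family   # family of the selected distribution (assumed component)
  m$fits     # GAIC values of the fitted candidates, ordered
  m$failed   # candidates that could not be fitted
}
```

Fitting every candidate of a type can take a noticeable amount of time on large vectors, so a smaller type set is a reasonable starting point.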
Mikis Stasinopoulos, Bob Rigby, Vlasis Voudouris and Majid Djennad.
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Appl. Statist., 54, part 3, pp 507-554.
Rigby, R. A., Stasinopoulos, D. M., Heller, G. Z., and De Bastiani, F. (2019). Distributions for modeling location, scale, and shape: Using GAMLSS in R. Chapman and Hall/CRC. An older version can be found at https://www.gamlss.com/.
Stasinopoulos, D. M. and Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, Vol. 23, Issue 7, Dec 2007, https://www.jstatsoft.org/v23/i07/.
Stasinopoulos, D. M., Rigby, R. A., Heller, G., Voudouris, V., and De Bastiani, F. (2017). Flexible Regression and Smoothing: Using GAMLSS in R. Chapman and Hall/CRC.
(see also https://www.gamlss.com/).
gamlss, gamlssML
y <- rt(100, df=1)
m1<-fitDist(y, type="realline")
m1$fits
m1$failed
# an example of using extra
## Not run:
#---------------------------------------
# Example of using the argument extra
library(gamlss.tr)
data(tensile)
gen.trun(par=1,family="GA", type="right")
gen.trun(par=1,"LOGNO", type="right")
gen.trun(par=c(0,1),"TF", type="both")
ma <- fitDist(str, type="real0to1", trace=TRUE,
extra=c("GAtr", "LOGNOtr", "TFtr"),
data=tensile)
ma$fits
ma$failed
#-------------------------------------
# selecting model using the prediction global deviance
# Using fitDistPred
# creating training data
y <- rt(1000, df=2)
m1 <- fitDist(y, type="realline")
m1$fits
m1$failed
# create validation data
yn <- rt(1000, df=2)
# choose distribution which fits the new data best
p1 <- fitDistPred(y, type="realline", newdata=yn)
p1$fits
p1$failed
#---------------------------------------
# using the function chooseDist()
# fitting normal distribution model
m1 <- gamlss(y~pb(x), sigma.fo=~pb(x), family=NO, data=abdom)
# choose a distribution on the real line
# and save GAIC for k = 2, 3.84 and log(n), i.e. AIC, Chi-square and BIC
t1 <- chooseDist(m1, type="realline", parallel="snow", ncpus=4)
# the GAIC's
t1
# the distributions which failed are with NA's
# ordering according to BIC
getOrder(t1,3)
fm<-update(m1, family=names(getOrder(t1,3)[1]))
## End(Not run)