fitdist  R Documentation 
Fit of univariate distributions to noncensored data by maximum likelihood (mle),
moment matching (mme), quantile matching (qme) or
maximizing goodnessoffit estimation (mge).
The latter is also known as minimizing distance estimation.
Generic methods are print
, plot
,
summary
, quantile
, logLik
, vcov
and coef
.
fitdist(data, distr, method = c("mle", "mme", "qme", "mge", "mse"),
start=NULL, fix.arg=NULL, discrete, keepdata = TRUE, keepdata.nb=100, ...)
## S3 method for class 'fitdist'
print(x, ...)
## S3 method for class 'fitdist'
plot(x, breaks="default", ...)
## S3 method for class 'fitdist'
summary(object, ...)
## S3 method for class 'fitdist'
logLik(object, ...)
## S3 method for class 'fitdist'
vcov(object, ...)
## S3 method for class 'fitdist'
coef(object, ...)
data 
A numeric vector. 
distr 
A character string 
method 
A character string coding for the fitting method:

start 
A named list giving the initial values of parameters of the named distribution
or a function of data computing initial values and returning a named list.
This argument may be omitted (default) for some distributions for which reasonable
starting values are computed (see the 'details' section of 
fix.arg 
An optional named list giving the values of fixed parameters of the named distribution
or a function of data computing (fixed) parameter values and returning a named list.
Parameters with fixed value are thus NOT estimated by this maximum likelihood procedure.
The use of this argument is not possible if 
keepdata 
a logical. If 
keepdata.nb 
When 
discrete 
If TRUE, the distribution is considered as discrete.
If 
x 
An object of class 
object 
An object of class 
breaks 
If 
... 
Further arguments to be passed to generic functions, or to one of the functions

It is assumed that the distr
argument specifies the distribution by the
probability density function, the cumulative distribution function and
the quantile function (d, p, q).
The four possible fitting methods are described below:
method="mle"
Maximum likelihood estimation consists in maximizing the loglikelihood.
A numerical optimization is carried out in mledist
via optim
to find the best values (see mledist
for details).
method="mme"
Moment matching estimation consists in equalizing theoretical and empirical moments.
Estimated values of the distribution parameters are computed by a closedform
formula for the following distributions : "norm"
, "lnorm"
,
"pois"
, "exp"
, "gamma"
,
"nbinom"
, "geom"
, "beta"
, "unif"
and "logis"
.
Otherwise the theoretical and the empirical moments are matched numerically,
by minimization of the sum of squared differences between observed and theoretical moments.
In this last case, further arguments are needed in the call to fitdist
: order
and memp
(see mmedist
for details).
method = "qme"
Quantile matching estimation consists in equalizing theoretical and empirical quantile.
A numerical optimization is carried out in qmedist
via optim
to minimize of the sum of squared differences between observed and theoretical quantiles.
The use of this method requires an additional argument probs
,
defined as the numeric vector of the probabilities for which the quantile(s) is(are) to be matched
(see qmedist
for details).
method = "mge"
Maximum goodnessoffit estimation consists in maximizing a goodnessoffit statistics.
A numerical optimization is carried out in mgedist
via optim
to minimize the goodnessoffit distance. The use of this method
requires an additional argument gof
coding for the goodnessoffit distance chosen.
One can use the classical Cramervon Mises distance ("CvM"
), the classical
KolmogorovSmirnov distance ("KS"
), the classical AndersonDarling distance ("AD"
)
which gives more weight to the tails of the distribution,
or one of the variants of this last distance proposed by Luceno (2006)
(see mgedist
for more details). This method is not suitable for discrete distributions.
method = "mse"
Maximum goodnessoffit estimation consists in maximizing the average log spacing.
A numerical optimization is carried out in msedist
via optim
.
By default, direct optimization of the loglikelihood (or other criteria depending
of the chosen method) is performed using optim
,
with the "NelderMead" method for distributions characterized by more than one parameter
and the "BFGS" method for distributions characterized by only one parameter.
The optimization algorithm used in optim
can be chosen or another optimization function
can be specified using ... argument (see mledist
for details).
start
may be omitted (i.e. NULL
) for some classic distributions
(see the 'details' section of mledist
).
Note that when errors are raised by optim
, it's a good idea to start by adding traces during
the optimization process by adding control=list(trace=1, REPORT=1)
in ... argument.
Once the parameter(s) is(are) estimated, fitdist
computes the loglikelihood
for every estimation method and for maximum likelihood estimation the standard errors of
the estimates calculated from the Hessian at the solution found by optim
or by the usersupplied function passed to mledist.
By default (keepdata = TRUE
), the object returned by fitdist
contains
the data vector given in input.
When dealing with large datasets, we can remove the original dataset from the output by
setting keepdata = FALSE
. In such a case, only keepdata.nb
points (at most)
are kept by random subsampling keepdata.nb
2 points from the dataset and
adding the minimum and the maximum. If combined with bootdist
, and use with
nonparametric bootstrap be aware that bootstrap is performed on the subset
randomly selected in fitdist
. Currently, the graphical comparisons of multiple fits
is not available in this framework.
Weighted version of the estimation process is available for method = "mle", "mme", "qme"
by using weights=...
. See the corresponding man page for details.
Weighted maximum GOF estimation (when method = "mge"
) is not allowed.
It is not yet possible to take into account weighths in functions plotdist
,
plot.fitdist
, cdfcomp
, denscomp
, ppcomp
,
qqcomp
, gofstat
and descdist
(developments planned in the future).
NB: if your data values are particularly small or large, a scaling may be needed
before the optimization process. See example (14) in this man page and
examples (14,15) in the test file of the package.
Please also take a look at the Rmpfr
package available on CRAN for numerical
accuracy issues.
fitdist
returns an object of class "fitdist"
, a list with the following components:
estimate 
the parameter estimates. 
method 
the character string coding for the fitting method :

sd 
the estimated standard errors, 
cor 
the estimated correlation matrix, 
vcov 
the estimated variancecovariance matrix, 
loglik 
the loglikelihood. 
aic 
the Akaike information criterion. 
bic 
the the socalled BIC or SBC (Schwarz Bayesian criterion). 
n 
the length of the data set. 
data 
the data set. 
distname 
the name of the distribution. 
fix.arg 
the named list giving the values of parameters of the named distribution
that must be kept fixed rather than estimated by maximum likelihood or 
fix.arg.fun 
the function used to set the value of 
dots 
the list of further arguments passed in ... to be used in 
convergence 
an integer code for the convergence of

discrete 
the input argument or the automatic definition by the function to be passed
to functions 
weights 
the vector of weigths used in the estimation process or 
Generic functions:
print
The print of a "fitdist"
object shows few traces about the fitting method and
the fitted distribution.
summary
The summary provides the parameter estimates of the fitted distribution, the loglikelihood, AIC and BIC statistics and when the maximum likelihood is used, the standard errors of the parameter estimates and the correlation matrix between parameter estimates.
plot
The plot of an object of class "fitdist" returned by fitdist
uses the function
plotdist
. An object of class "fitdist" or a list of objects of class
"fitdist" corresponding to various fits using the same data set may also be plotted
using a cdf plot (function cdfcomp
),
a density plot(function denscomp
),
a density QQ plot (function qqcomp
),
or a PP plot (function ppcomp
).
logLik
Extracts the estimated loglikelihood from the "fitdist"
object.
vcov
Extracts the estimated varcovariance matrix from the "fitdist"
object
(only available When method = "mle"
).
coef
Extracts the fitted coefficients from the "fitdist"
object.
MarieLaure DelignetteMuller and Christophe Dutang.
Cullen AC and Frey HC (1999), Probabilistic techniques in exposure assessment. Plenum Press, USA, pp. 81155.
Venables WN and Ripley BD (2002), Modern applied statistics with S. Springer, New York, pp. 435446.
Vose D (2000), Risk analysis, a quantitative guide. John Wiley & Sons Ltd, Chischester, England, pp. 99143.
DelignetteMuller ML and Dutang C (2015), fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, 64(4), 134.
See fitdistrplus
for an overview of the package.
See mledist
, mmedist
, qmedist
,
mgedist
, msedist
for details on parameter estimation.
See gofstat
for goodnessoffit statistics.
See plotdist
, graphcomp
, CIcdfplot
for graphs
(with or without uncertainty and/or multiple fits).
See llplot
for (log)likelihood plots in the
neighborhood of the fitted value.
See bootdist
for bootstrap procedures
and fitdistcens
for censoreddata fitting methods.
See optim
for base R optimization procedures.
See quantile.fitdist
, another generic function, which calculates
quantiles from the fitted distribution.
See quantile
for base R quantile computation.
# (1) fit of a gamma distribution by maximum likelihood estimation
#
data(groundbeef)
serving < groundbeef$serving
fitg < fitdist(serving, "gamma")
summary(fitg)
plot(fitg)
plot(fitg, demp = TRUE)
plot(fitg, histo = FALSE, demp = TRUE)
cdfcomp(fitg, addlegend=FALSE)
denscomp(fitg, addlegend=FALSE)
ppcomp(fitg, addlegend=FALSE)
qqcomp(fitg, addlegend=FALSE)
# (2) use the moment matching estimation (using a closed formula)
#
fitgmme < fitdist(serving, "gamma", method="mme")
summary(fitgmme)
# (3) Comparison of various fits
#
fitW < fitdist(serving, "weibull")
fitg < fitdist(serving, "gamma")
fitln < fitdist(serving, "lnorm")
summary(fitW)
summary(fitg)
summary(fitln)
cdfcomp(list(fitW, fitg, fitln), legendtext=c("Weibull", "gamma", "lognormal"))
denscomp(list(fitW, fitg, fitln), legendtext=c("Weibull", "gamma", "lognormal"))
qqcomp(list(fitW, fitg, fitln), legendtext=c("Weibull", "gamma", "lognormal"))
ppcomp(list(fitW, fitg, fitln), legendtext=c("Weibull", "gamma", "lognormal"))
gofstat(list(fitW, fitg, fitln), fitnames=c("Weibull", "gamma", "lognormal"))
# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view
# dedicated to probability distributions
#
dgumbel < function(x, a, b) 1/b*exp((ax)/b)*exp(exp((ax)/b))
pgumbel < function(q, a, b) exp(exp((aq)/b))
qgumbel < function(p, a, b) ab*log(log(p))
fitgumbel < fitdist(serving, "gumbel", start=list(a=10, b=10))
summary(fitgumbel)
plot(fitgumbel)
# (5) fit discrete distributions (Poisson and negative binomial)
#
data(toxocara)
number < toxocara$number
fitp < fitdist(number,"pois")
summary(fitp)
plot(fitp)
fitnb < fitdist(number,"nbinom")
summary(fitnb)
plot(fitnb)
cdfcomp(list(fitp,fitnb))
gofstat(list(fitp,fitnb))
# (6) how to change the optimisation method?
#
data(groundbeef)
serving < groundbeef$serving
fitdist(serving, "gamma", optim.method="NelderMead")
fitdist(serving, "gamma", optim.method="BFGS")
fitdist(serving, "gamma", optim.method="SANN")
# (7) custom optimization function
#
#create the sample
set.seed(1234)
mysample < rexp(100, 5)
mystart < list(rate=8)
res1 < fitdist(mysample, dexp, start= mystart, optim.method="NelderMead")
#show the result
summary(res1)
#the warning tell us to use optimise, because the NelderMead is not adequate.
#to meet the standard 'fn' argument and specific name arguments, we wrap optimize,
myoptimize < function(fn, par, ...)
{
res < optimize(f=fn, ..., maximum=FALSE)
#assume the optimization function minimize
standardres < c(res, convergence=0, value=res$objective,
par=res$minimum, hessian=NA)
return(standardres)
}
#call fitdist with a 'custom' optimization function
res2 < fitdist(mysample, "exp", start=mystart, custom.optim=myoptimize,
interval=c(0, 100))
#show the result
summary(res2)
# (8) custom optimization function  another example with the genetic algorithm
#
#set a sample
fit1 < fitdist(serving, "gamma")
summary(fit1)
#wrap genoud function rgenoud package
mygenoud < function(fn, par, ...)
{
require(rgenoud)
res < genoud(fn, starting.values=par, ...)
standardres < c(res, convergence=0)
return(standardres)
}
#call fitdist with a 'custom' optimization function
fit2 < fitdist(serving, "gamma", custom.optim=mygenoud, nvars=2,
Domains=cbind(c(0, 0), c(10, 10)), boundary.enforcement=1,
print.level=1, hessian=TRUE)
summary(fit2)
# (9) estimation of the standard deviation of a gamma distribution
# by maximum likelihood with the shape fixed at 4 using the argument fix.arg
#
data(groundbeef)
serving < groundbeef$serving
f1c < fitdist(serving,"gamma",start=list(rate=0.1),fix.arg=list(shape=4))
summary(f1c)
plot(f1c)
# (10) fit of a Weibull distribution to serving size data
# by maximum likelihood estimation
# or by quantile matching estimation (in this example
# matching first and third quartiles)
#
data(groundbeef)
serving < groundbeef$serving
fWmle < fitdist(serving, "weibull")
summary(fWmle)
plot(fWmle)
gofstat(fWmle)
fWqme < fitdist(serving, "weibull", method="qme", probs=c(0.25, 0.75))
summary(fWqme)
plot(fWqme)
gofstat(fWqme)
# (11) Fit of a Pareto distribution by numerical moment matching estimation
#
require(actuar)
#simulate a sample
x4 < rpareto(1000, 6, 2)
#empirical raw moment
memp < function(x, order) mean(x^order)
#fit
fP < fitdist(x4, "pareto", method="mme", order=c(1, 2), memp="memp",
start=list(shape=10, scale=10), lower=1, upper=Inf)
summary(fP)
plot(fP)
# (12) Fit of a Weibull distribution to serving size data by maximum
# goodnessoffit estimation using all the distances available
#
data(groundbeef)
serving < groundbeef$serving
(f1 < fitdist(serving, "weibull", method="mge", gof="CvM"))
(f2 < fitdist(serving, "weibull", method="mge", gof="KS"))
(f3 < fitdist(serving, "weibull", method="mge", gof="AD"))
(f4 < fitdist(serving, "weibull", method="mge", gof="ADR"))
(f5 < fitdist(serving, "weibull", method="mge", gof="ADL"))
(f6 < fitdist(serving, "weibull", method="mge", gof="AD2R"))
(f7 < fitdist(serving, "weibull", method="mge", gof="AD2L"))
(f8 < fitdist(serving, "weibull", method="mge", gof="AD2"))
cdfcomp(list(f1, f2, f3, f4, f5, f6, f7, f8))
cdfcomp(list(f1, f2, f3, f4, f5, f6, f7, f8),
xlogscale=TRUE, xlim=c(8, 250), verticals=TRUE)
denscomp(list(f1, f2, f3, f4, f5, f6, f7, f8))
# (13) Fit of a uniform distribution using maximum likelihood
# (a closed formula is used in this special case where the loglikelihood is not defined),
# or maximum goodnessoffit with Cramervon Mises or KolmogorovSmirnov distance
#
set.seed(1234)
u < runif(50, min=5, max=10)
fumle < fitdist(u, "unif", method="mle")
summary(fumle)
plot(fumle)
gofstat(fumle)
fuCvM < fitdist(u, "unif", method="mge", gof="CvM")
summary(fuCvM)
plot(fuCvM)
gofstat(fuCvM)
fuKS < fitdist(u, "unif", method="mge", gof="KS")
summary(fuKS)
plot(fuKS)
gofstat(fuKS)
# (14) scaling problem
# the simulated dataset (below) has particularly small values, hence without scaling (10^0),
# the optimization raises an error. The for loop shows how scaling by 10^i
# for i=1,...,6 makes the fitting procedure work correctly.
set.seed(1234)
x2 < rnorm(100, 1e4, 2e4)
for(i in 0:6)
cat(i, try(fitdist(x2*10^i, "cauchy", method="mle")$estimate, silent=TRUE), "\n")
# (15) Fit of a normal distribution on acute toxicity values of endosulfan in log10 for
# nonarthropod invertebrates, using maximum likelihood estimation
# to estimate what is called a species sensitivity distribution
# (SSD) in ecotoxicology, followed by estimation of the 5 percent quantile value of
# the fitted distribution (which is called the 5 percent hazardous concentration, HC5,
# in ecotoxicology) and estimation of other quantiles.
#
data(endosulfan)
ATV < subset(endosulfan, group == "NonArthroInvert")$ATV
log10ATV < log10(subset(endosulfan, group == "NonArthroInvert")$ATV)
fln < fitdist(log10ATV, "norm")
quantile(fln, probs = 0.05)
quantile(fln, probs = c(0.05, 0.1, 0.2))
# (16) Fit of a triangular distribution using Cramervon Mises or
# KolmogorovSmirnov distance
#
set.seed(1234)
require(mc2d)
t < rtriang(100, min=5, mode=6, max=10)
fCvM < fitdist(t, "triang", method="mge", start = list(min=4, mode=6,max=9), gof="CvM")
fKS < fitdist(t, "triang", method="mge", start = list(min=4, mode=6,max=9), gof="KS")
cdfcomp(list(fCvM,fKS))
# (17) fit a non classical discrete distribution (the zero inflated Poisson distribution)
#
require(gamlss.dist)
set.seed(1234)
x < rZIP(n = 30, mu = 5, sigma = 0.2)
plotdist(x, discrete = TRUE)
fitzip < fitdist(x, "ZIP", start = list(mu = 4, sigma = 0.15), discrete = TRUE,
optim.method = "LBFGSB", lower = c(0, 0), upper = c(Inf, 1))
summary(fitzip)
plot(fitzip)
fitp < fitdist(x, "pois")
cdfcomp(list(fitzip, fitp))
gofstat(list(fitzip, fitp))
# (18) examples with distributions in actuar (predefined starting values)
#
require(actuar)
x < c(2.3,0.1,2.7,2.2,0.4,2.6,0.2,1.,7.3,3.2,0.8,1.2,33.7,14.,
21.4,7.7,1.,1.9,0.7,12.6,3.2,7.3,4.9,4000.,2.5,6.7,3.,63.,
6.,1.6,10.1,1.2,1.5,1.2,30.,3.2,3.5,1.2,0.2,1.9,0.7,17.,
2.8,4.8,1.3,3.7,0.2,1.8,2.6,5.9,2.6,6.3,1.4,0.8)
#log logistic
ft_llogis < fitdist(x,'llogis')
x < c(0.3837053, 0.8576858, 0.3552237, 0.6226119, 0.4783756, 0.3139799, 0.4051403,
0.4537631, 0.4711057, 0.5647414, 0.6479617, 0.7134207, 0.5259464, 0.5949068,
0.3509200, 0.3783077, 0.5226465, 1.0241043, 0.4384580, 1.3341520)
#inverse weibull
ft_iw < fitdist(x,'invweibull')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.