fitCDF | R Documentation |
Usually, the parameters of a cumulative distribution function (*CDF*) are estimated using the corresponding probability density function (*PDF*). Different optimization algorithms can be used to accomplish this task, and different algorithms can yield different parameter estimates. Hence, why not try to fit the CDF directly?
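The idea of fitting the CDF directly can be sketched with base R alone. The snippet below is only an illustration under stated assumptions: it uses nls (Gauss-Newton) from the stats package instead of the Levenberg-Marquardt routine that fitCDF relies on, and the names dat and fit are hypothetical. It regresses the empirical CDF of a sample on the Normal CDF.

```r
# Minimal sketch (not the fitCDF implementation): fit a Normal CDF
# directly to the empirical CDF by nonlinear least squares.
set.seed(123)
x <- rnorm(2000, mean = 0.5, sd = 1)

# Empirical CDF evaluated at the sorted observations
xs <- sort(x)
dat <- data.frame(x = xs, p = ecdf(x)(xs))

# Regress the empirical probabilities on pnorm(); the sample mean and
# standard deviation serve as starting values.
fit <- nls(p ~ pnorm(x, mean = mu, sd = sigma),
  data = dat,
  start = list(mu = mean(x), sigma = sd(x))
)

coef(fit)
```

The estimates returned by coef(fit) should be close to the simulation values (mean 0.5, sd 1), although they are obtained without evaluating the PDF at all.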
fitCDF(varobj, ...)
## S4 method for signature 'numeric'
fitCDF(
  varobj,
  distNames,
  plot = FALSE,
  plot.num = 1L,
  distf = NULL,
  start = NULL,
  loss.fun = c("linear", "huber", "smooth", "cauchy", "arctg"),
  min.val = NULL,
  only.info = FALSE,
  maxiter = 1024,
  maxfev = 1e+05,
  ptol = 1e-12,
  nls.model = FALSE,
  algorithm = c("default", "plinear", "port"),
  xlabel = "x",
  mar = c(4, 4, 3, 1),
  mgp = c(2.5, 0.6, 0),
  las = 1,
  cex.main = 1,
  cex.text = 0.8,
  cex.point = 0.5,
  verbose = TRUE,
  ...
)
## S4 method for signature 'list_OR_matrix_OR_dataframe'
fitCDF(
  varobj,
  distNames,
  plot = FALSE,
  plot.num = 1L,
  distf = NULL,
  start = NULL,
  loss.fun = c("linear", "huber", "smooth", "cauchy", "arctg"),
  min.val = NULL,
  only.info = FALSE,
  maxiter = 1024,
  maxfev = 1e+05,
  ptol = 1e-12,
  nls.model = FALSE,
  algorithm = c("default", "plinear", "port"),
  xlabel = "x",
  mar = c(4, 4, 3, 1),
  mgp = c(2.5, 0.6, 0),
  las = 1,
  cex.main = 1,
  cex.text = 0.8,
  cex.point = 0.5,
  num.cores = 1L,
  tasks = 0L,
  verbose = TRUE,
  ...
)
varobj |
A vector, a named list, a matrix or a data.frame containing the observations of the variable for which the CDF parameters will be estimated. When the argument is a matrix or a data.frame, the columns must be named and carry the variables of interest. |
... |
(Optional) Further graphical parameters (see
|
distNames |
A vector of distribution names (e.g. "Normal") or numbers (e.g. c(1:10, 15)) selecting from the distributions listed in the Details section below. If 'distNames' is not one of the currently named distributions, then it can be any arbitrary character string, but the argument 'distf' must be given (see below). |
plot |
Logical. Default is FALSE. Whether to produce the plots for the best fitted CDF. |
plot.num |
The number of distributions to be plotted. |
distf |
A character string naming a cumulative distribution function (CDF) present in the R session environment, for example, gamma or norm, from which the density, distribution function, quantile function and random generation function are obtained internally (e.g., dnorm, pnorm, qnorm and rnorm, respectively). If the function is not present in the environment, an error is returned. It must be given only if 'distNames' is not one of the currently named distributions (see Details below). Default is NULL. |
start |
A named numerical vector giving the parameters to be optimized with initial values or a list of numerical vectors (only when varobj is a list, a matrix or a data.frame). This can be omitted for some of the named distributions (see Details). This argument will be used if provided for only one distribution. The default parameter values are:
|
loss.fun |
Loss function(s) used in the regression (see
(Loss function)). After
|
min.val |
A number denoting the lower bound of the domain where CDF is defined. For example, for Weibull and GGamma min.val = 0. |
only.info |
Logical. Default is FALSE. If TRUE, only information about the parameter estimation is returned. |
maxiter, maxfev, ptol |
Parameters to control various aspects of the
Levenberg-Marquardt algorithm through function
|
nls.model |
Logical. Whether to return the best fitted model as an
object from nlsModel class. Default is FALSE. If TRUE, then
the estimated parameters are used for a new fitting with |
algorithm |
Only if nls.model = TRUE. The same as for
|
xlabel |
(Optional) Label for variable varobj. Default is xlabel = "x". |
mar, mgp, las, cex.main |
(Optional) Graphical parameters (see
|
cex.text, cex.point |
Numerical value to scale text and points. |
verbose |
Logical. If TRUE, prints the function log to stdout. |
num.cores, tasks |
Parameters for parallel computation using package
|
The nonlinear fit (NLF) problem for CDFs is addressed with the Levenberg-Marquardt algorithm implemented in function nls.lm from package *minpack.lm*. Stein's rho for the adjusted R squared (rho) is applied as an estimator of the average cross-validation predictive power [1]. This function is inspired by a script for the function fitDistr from the package propagate [2]. Some parts or script ideas from function fitDistr are used, but here we estimate the CDF and not the PDF, as is the case in fitDistr. More informative results are given now. The studentized residuals are provided as well. The list (so far) of possible CDFs is:
Normal (Wikipedia)
Log-normal (Wikipedia). This function is set to fit log(1+x). Users can transform their variable by themselves and then try the fitting to the Normal distribution.
Half-normal (Wikipedia). An alternative parametrization is used, based on a scaled precision (inverse of the variance), to avoid issues if \sigma is near zero; it is obtained by setting \theta=sqrt(\pi)/(\sigma*sqrt(2)).
Generalized Normal (Wikipedia)
T-Generalized Normal [3].
Laplace (Wikipedia)
Gamma (Wikipedia)
3P Gamma [4].
Generalized 4P Gamma [4] (Wikipedia)
Generalized 3P Gamma [4].
Weibull (Wikipedia)
3P Weibull (Wikipedia)
Beta (Wikipedia)
3P Beta (Wikipedia)
4P Beta (Wikipedia)
Beta-Weibull (ReliaWiki)
Generalized Beta (Wikipedia)
Rayleigh (Wikipedia)
Exponential (Wikipedia)
2P Exponential (Wikipedia)
Geometric (Wikipedia)
Log-Gamma (Mathematica)
Log-Gamma 3P (Mathematica)
The shape_scale function is an internal function that can be retrieved by typing: usefr:::shape_scale.
If the parallel computation fails, please try function fitCDF2.
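The Stein's rho mentioned above can be illustrated with a short function. This is a sketch under stated assumptions: it implements Stein's formula for the adjusted R squared as commonly presented (e.g. in Stevens [1]), with n observations and k estimated parameters; the function name stein_rho is hypothetical and the code is not taken from the package, which may compute the statistic differently.

```r
# Sketch of Stein's formula for the adjusted R squared:
#   rho = 1 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) *
#             ((n + 1) / n) * (1 - R^2)
# 'stein_rho' is a hypothetical helper, not a function of the package.
stein_rho <- function(r2, n, k) {
  stopifnot(n > k + 2, r2 <= 1)
  shrink <- ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) *
    ((n + 1) / n)
  1 - shrink * (1 - r2)
}

# Example: R^2 = 0.95 from a two-parameter model fitted to 100 points
rho <- stein_rho(r2 = 0.95, n = 100, k = 2)
rho
```

Because the shrinkage factor is always greater than one, rho is smaller than the plain R squared, and the gap grows as the number of parameters approaches the sample size.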
After returning the plots, a list with the following values is provided:
aic: Akaike information criterion
fit: list of results of fitted distribution, with parameter values
bestfit: the best fitted distribution according to AIC
fitted: fitted values from the best fit
rstudent: studentized residuals
residuals: residuals
After cdf = fitCDF( varobj, ...), attributes( cdf$bestfit ) shows the list of objects carried by cdf$bestfit:
names: "par" "hessian" "fvec" "info" "message" "diag" "niter" "rsstrace" "deviance"
class: "nls.lm"
And fitting details can be retrieved with summary(cdf$bestfit)
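How a best fit can be chosen by AIC among several candidate CDFs is sketched below with base R only. Everything here is an assumption made for illustration: the least-squares form of the AIC, n * log(RSS / n) + 2 * (k + 1), the use of optim (Nelder-Mead) rather than nls.lm, and the names candidates, aic_ls and fits are hypothetical and do not reproduce the internals of fitCDF.

```r
# Sketch: fit two candidate CDFs to the empirical CDF and select by AIC.
set.seed(1)
x <- rgamma(5000, shape = 2, rate = 1)
xs <- sort(x)
p <- ecdf(x)(xs)
n <- length(xs)

# Each candidate has a CDF of a parameter vector 'th' and starting values.
# Positive parameters are optimized on the log scale.
candidates <- list(
  Gamma = list(
    cdf = function(q, th) pgamma(q, shape = exp(th[1]), rate = exp(th[2])),
    start = log(c(mean(x)^2 / var(x), mean(x) / var(x)))
  ),
  Normal = list(
    cdf = function(q, th) pnorm(q, mean = th[1], sd = exp(th[2])),
    start = c(mean(x), log(sd(x)))
  )
)

# Least-squares AIC (up to an additive constant); k + 1 counts the
# residual variance as an extra parameter.
aic_ls <- function(rss, n, k) n * log(rss / n) + 2 * (k + 1)

fits <- lapply(candidates, function(cand) {
  obj <- function(th) sum((p - cand$cdf(xs, th))^2)
  opt <- optim(cand$start, obj)
  list(par = opt$par, rss = opt$value,
    aic = aic_ls(opt$value, n, length(opt$par)))
})

aic <- sapply(fits, function(f) f$aic)
best <- names(which.min(aic))
best
```

Since the residuals of an empirical CDF fit are not independent, the AIC computed this way is best read as a heuristic ranking of the candidates rather than as an exact likelihood-based criterion.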
Robersy Sanchez (https://genomaths.com).
[1] Stevens JP. Applied Multivariate Statistics for the Social Sciences. Fifth Edition. Routledge Academic; 2009.
[2] Andrej-Nikolai Spiess (2014). propagate: Propagation of Uncertainty. R package version 1.0-4. http://CRAN.R-project.org/package=propagate
[3] Abramowitz, M. and Stegun, I. A. (1972) Handbook of Mathematical Functions. New York: Dover. Chapter 6: Gamma and Related Functions.
[4] Walck, C. Hand-book on Statistical Distributions for Experimentalists (p. 73). Particle Physics Group, Fysikum, University of Stockholm (e-mail: walck@physto.se).
fitCDF2, fitdistr and fitMixDist; for goodness-of-fit: mcgoftest.
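## Load the package providing fitCDF (usefr, as referenced in the Details)
library(usefr)

## Fit a Normal CDF to a simulated sample
set.seed(1230)
x1 <- rnorm(10000, mean = 0.5, sd = 1)
cdfp <- fitCDF(x1, distNames = "Normal", plot = FALSE)
summary(cdfp$bestfit)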
## Add some cosmetics to the plots
cdfp <- fitCDF(x1,
  distNames = "Normal", xlabel = "My Nice Variable Label",
  plot = TRUE, font.lab = 3, font = 2, font.axis = 2, family = "serif",
  cex.lab = 1.3, cex.axis = 1.3
)
## Fitting a Weibull distribution with 3 parameters
x1 <- rweibull3p(1000, shape = 0.5, scale = 1, mu = 0.1)
cdfp <- fitCDF(x1,
  distNames = "3P Weibull",
  xlabel = "My Nice Variable Label",
  plot = TRUE, font.lab = 3, font = 2, font.axis = 2, family = "serif",
  cex.lab = 1.3, cex.axis = 1.3, cex.main = 1.1,
  mgp = c(2.5, 1, 0)
)