nrgkw        R Documentation
An industrial-strength implementation of maximum likelihood estimation (MLE) for the parameters of any distribution in the Generalized Kumaraswamy (GKw) family. This function incorporates multiple advanced numerical techniques including trust region methods, eigenvalue-based regularization, adaptive scaling, and sophisticated line search to ensure robust convergence even for challenging datasets or extreme parameter values.
nrgkw(
start = NULL,
data = as.numeric(c()),
family = "gkw",
tol = 1e-06,
max_iter = 100L,
verbose = FALSE,
optimization_method = "trust-region",
enforce_bounds = TRUE,
min_param_val = 1e-05,
max_param_val = 1e+05,
adaptive_scaling = TRUE,
use_stochastic_perturbation = TRUE,
get_num_hess = FALSE,
multi_start_attempts = 3L,
eigenvalue_hessian_reg = TRUE,
max_backtrack = 20L,
initial_trust_radius = 1
)
start: A numeric vector of initial parameter values. If NULL, automatic initialization based on the characteristics of the data is used. The length and order must correspond to the parameters of the selected family.
data: A numeric vector containing the observed data. All values must be strictly between 0 and 1.
family: A character string specifying the distribution family: one of "gkw" (default), "bkw", "kkw", "ekw", "mc", "kw", or "beta".
tol: Convergence tolerance. The algorithm stops when the Euclidean norm of the gradient falls below this value, or when the relative changes in the parameters or in the negative log-likelihood across consecutive iterations fall below this threshold. Default: 1e-6.
max_iter: Maximum number of iterations allowed. Default: 100.
verbose: Logical; if TRUE, prints detailed progress information during optimization. Default: FALSE.
optimization_method: Character string specifying the optimization method: "trust-region" (default), "newton-raphson", or "hybrid" (starts with trust-region and switches to newton-raphson near convergence).
enforce_bounds: Logical; if TRUE, parameter values are kept within [min_param_val, max_param_val] throughout the optimization. Default: TRUE.
min_param_val: Minimum allowed value for parameters constrained to be strictly positive. Default: 1e-5.
max_param_val: Maximum allowed value for all parameters. Default: 1e5.
adaptive_scaling: Logical; if TRUE, parameters are rescaled automatically to improve numerical conditioning (see Details). Default: TRUE.
use_stochastic_perturbation: Logical; if TRUE, applies a controlled stochastic perturbation when optimization stalls (see Details). Default: TRUE.
get_num_hess: Logical; if TRUE, a numerical approximation of the Hessian at the solution is computed and returned in addition to the analytical Hessian. Default: FALSE.
multi_start_attempts: Integer specifying the number of different starting points to try if the initial optimization fails to converge. Default: 3.
eigenvalue_hessian_reg: Logical; if TRUE, applies eigenvalue-based regularization to the Hessian to ensure positive definiteness (see Details). Default: TRUE.
max_backtrack: Integer specifying the maximum number of backtracking steps in the line search. Default: 20.
initial_trust_radius: Initial radius for the trust region method. Default: 1.0.
This enhanced algorithm provides robust parameter estimation for the Generalized Kumaraswamy distribution and its subfamilies. The function implements several advanced numerical optimization techniques to maximize the likelihood function reliably even in difficult cases.
The Generalized Kumaraswamy (GKw) distribution, introduced by Carrasco, Ferrari, and Cordeiro (2010), is a flexible five-parameter continuous distribution defined on the standard unit interval (0,1). Its probability density function is given by:
f(x; \alpha, \beta, \gamma, \delta, \lambda) = \frac{\lambda\alpha\beta x^{\alpha-1}}{B(\gamma, \delta+1)}(1-x^{\alpha})^{\beta-1}[1-(1-x^{\alpha})^{\beta}]^{\gamma\lambda-1}\{1-[1-(1-x^{\alpha})^{\beta}]^{\lambda}\}^{\delta}
where \alpha, \beta, \gamma, \lambda > 0 and \delta \geq 0 are the model parameters, and B(\gamma, \delta+1) is the beta function.
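As an illustration of the formula, the log-density can be evaluated directly in R. The sketch below uses the hypothetical helper name dgkw_sketch; where the package provides its own density functions, those should be preferred.
# Minimal sketch: GKw log-density computed directly from the formula above
dgkw_sketch <- function(x, alpha, beta, gamma, delta, lambda, log = FALSE) {
  stopifnot(all(x > 0 & x < 1))
  t1 <- 1 - x^alpha                 # (1 - x^alpha)
  t2 <- 1 - t1^beta                 # 1 - (1 - x^alpha)^beta
  logf <- log(lambda) + log(alpha) + log(beta) - lbeta(gamma, delta + 1) +
    (alpha - 1) * log(x) + (beta - 1) * log(t1) +
    (gamma * lambda - 1) * log(t2) + delta * log(1 - t2^lambda)
  if (log) logf else exp(logf)
}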
The GKw distribution encompasses several important special cases:
GKw (5 parameters): \alpha, \beta, \gamma, \delta, \lambda
BKw (4 parameters): \alpha, \beta, \gamma, \delta (with \lambda = 1)
KKw (4 parameters): \alpha, \beta, \delta, \lambda (with \gamma = 1)
EKw (3 parameters): \alpha, \beta, \lambda (with \gamma = 1, \delta = 0)
Mc (3 parameters): \gamma, \delta, \lambda (with \alpha = 1, \beta = 1)
Kw (2 parameters): \alpha, \beta (with \gamma = 1, \delta = 0, \lambda = 1)
Beta (2 parameters): \gamma, \delta (with \alpha = 1, \beta = 1, \lambda = 1)
The trust region approach restricts parameter updates to a region where the quadratic approximation of the objective function is trusted to be accurate. This algorithm implements the Levenberg-Marquardt variant, which solves the subproblem:
\min_p m_k(p) = -\nabla \ell(\theta_k)^T p + \frac{1}{2}p^T H_k p
\text{subject to } \|p\| \leq \Delta_k
where \nabla \ell(\theta_k) is the gradient, H_k is the Hessian, and \Delta_k is the trust region radius at iteration k.
The Levenberg-Marquardt approach adds a regularization parameter \lambda to the Hessian, solving:
(H_k + \lambda I)p = -\nabla \ell(\theta_k)
The parameter \lambda controls the step size and direction:
When \lambda is large, the step approaches a scaled steepest descent direction.
When \lambda is small, the step approaches the Newton direction.
The algorithm dynamically adjusts \lambda based on the agreement between the quadratic model and the actual function:
\rho_k = \frac{f(\theta_k) - f(\theta_k + p_k)}{m_k(0) - m_k(p_k)}
The trust region radius is updated according to:
If \rho_k < 0.25, reduce the radius: \Delta_{k+1} = 0.5\Delta_k
If \rho_k > 0.75 and \|p_k\| \approx \Delta_k, increase the radius: \Delta_{k+1} = 2\Delta_k
Otherwise, leave the radius unchanged: \Delta_{k+1} = \Delta_k
The step is accepted if \rho_k > \eta (typically \eta = 0.1).
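To make these rules concrete, the following is a rough R sketch of a single Levenberg-Marquardt trust-region step for a generic negative log-likelihood. The names nll, grad, and hess are placeholders for user-supplied functions; nrgkw's internal implementation differs in detail.
# Sketch of one damped (Levenberg-Marquardt) trust-region step
lm_tr_step <- function(theta, nll, grad, hess, Delta, lambda = 1e-3, eta = 0.1) {
  g <- grad(theta)
  H <- hess(theta)
  # Increase the damping until the solve succeeds and the step fits in the region
  repeat {
    p <- tryCatch(solve(H + lambda * diag(length(theta)), -g),
                  error = function(e) NULL)
    if (!is.null(p) && sqrt(sum(p^2)) <= Delta) break
    lambda <- lambda * 10
  }
  pred <- -(sum(g * p) + 0.5 * sum(p * (H %*% p)))   # model decrease m(0) - m(p)
  rho  <- (nll(theta) - nll(theta + p)) / pred       # agreement ratio
  if (rho < 0.25) {
    Delta <- 0.5 * Delta                             # poor agreement: shrink radius
  } else if (rho > 0.75 && sqrt(sum(p^2)) >= 0.9 * Delta) {
    Delta <- 2 * Delta                               # good agreement at the boundary: grow radius
  }
  list(theta = if (rho > eta) theta + p else theta, Delta = Delta, lambda = lambda)
}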
For the Newton-Raphson method to converge, the Hessian matrix must be positive definite. This algorithm uses eigendecomposition to create a positive definite approximation that preserves the Hessian's structure:
H = Q\Lambda Q^T
where Q contains the eigenvectors and \Lambda is a diagonal matrix of eigenvalues.
The regularized Hessian is constructed by:
\tilde{H} = Q\tilde{\Lambda}Q^T
where \tilde{\Lambda} contains modified eigenvalues:
\tilde{\lambda}_i = \max(\lambda_i, \epsilon)
with \epsilon being a small positive threshold (typically 10^{-6}).
This approach is superior to diagonal loading (H + \lambda I) as it:
Preserves the eigenvector structure of the original Hessian
Minimally modifies the eigenvalues needed to ensure positive definiteness
Better maintains the directional information in the Hessian
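A minimal R sketch of this eigenvalue-based repair, assuming a threshold eps = 1e-6:
# Regularize a Hessian by flooring its eigenvalues
regularize_hessian <- function(H, eps = 1e-6) {
  ed  <- eigen((H + t(H)) / 2, symmetric = TRUE)   # symmetrize, then decompose
  lam <- pmax(ed$values, eps)                       # floor the eigenvalues at eps
  ed$vectors %*% diag(lam, length(lam)) %*% t(ed$vectors)
}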
When parameters have widely different magnitudes, optimization can become numerically unstable. The adaptive scaling system transforms the parameter space to improve conditioning:
\theta_i^{scaled} = s_i \theta_i
where the scaling factors s_i are determined by:
For large parameters (|\theta_i| > 100): s_i = 100/|\theta_i|
For small parameters (0 < |\theta_i| < 0.01): s_i = 0.01/|\theta_i|
Otherwise: s_i = 1
The optimization is performed in the scaled space, with appropriate transformations for the gradient and Hessian:
\nabla \ell(\theta^{scaled})_i = \frac{1}{s_i}\nabla \ell(\theta)_i
H(\theta^{scaled})_{ij} = \frac{1}{s_i s_j}H(\theta)_{ij}
The final results are back-transformed to the original parameter space before being returned.
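The scaling rules translate directly into R. The sketch below uses the hypothetical helper scaling_factors and follows the thresholds given above:
# Compute per-parameter scaling factors
scaling_factors <- function(theta) {
  s <- rep(1, length(theta))
  big   <- abs(theta) > 100
  small <- abs(theta) > 0 & abs(theta) < 0.01
  s[big]   <- 100  / abs(theta)[big]
  s[small] <- 0.01 / abs(theta)[small]
  s
}
# In the scaled space theta_scaled = s * theta, gradient and Hessian become
#   grad_scaled <- grad / s
#   hess_scaled <- hess / outer(s, s)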
The line search procedure ensures sufficient decrease in the objective function when taking a step in the search direction. The implementation uses the Wolfe conditions, which include both:
Sufficient decrease (Armijo) condition:
f(\theta_k + \alpha p_k) \leq f(\theta_k) + c_1 \alpha \nabla f(\theta_k)^T p_k
Curvature condition:
|\nabla f(\theta_k + \alpha p_k)^T p_k| \leq c_2 |\nabla f(\theta_k)^T p_k|
where 0 < c_1 < c_2 < 1, typically c_1 = 10^{-4} and c_2 = 0.9.
The step length \alpha is determined using polynomial interpolation:
First iteration: quadratic interpolation
Subsequent iterations: cubic interpolation using function and derivative values
The cubic polynomial model has the form:
a\alpha^3 + b\alpha^2 + c\alpha + d
The algorithm computes coefficients from values at two points, then finds the minimizer of this polynomial subject to bounds to ensure convergence.
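For illustration, the sketch below implements a simplified backtracking search that checks only the Armijo condition and uses the quadratic-interpolation step; the full scheme described above additionally enforces the curvature condition and switches to cubic interpolation.
# Backtracking line search with quadratic interpolation (assumes p is a descent direction)
line_search <- function(f, theta, p, g, c1 = 1e-4, max_backtrack = 20) {
  f0 <- f(theta)
  slope <- sum(g * p)                          # directional derivative; assumed negative
  a <- 1
  for (i in seq_len(max_backtrack)) {
    fa <- f(theta + a * p)
    if (fa <= f0 + c1 * a * slope) return(a)   # Armijo (sufficient decrease) satisfied
    # minimizer of the quadratic through f0, slope, and (a, fa)
    a_new <- -slope * a^2 / (2 * (fa - f0 - slope * a))
    a <- min(max(a_new, 0.1 * a), 0.5 * a)     # keep the new trial step bounded
  }
  a
}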
When analytical derivatives are unreliable, the algorithm uses numerical differentiation with adaptive step sizes based on parameter magnitudes:
h_i = \max(h_{min}, \min(h_{base}, h_{base} \cdot |\theta_i|))
where h_{min} is a minimum step size (typically 10^{-8}), h_{base} is a base step size (typically 10^{-5}), and \theta_i is the parameter value.
For computing diagonal Hessian elements, the central difference formula is used:
\frac{\partial^2 f}{\partial \theta_i^2} \approx \frac{f(\theta + h_i e_i) - 2f(\theta) + f(\theta - h_i e_i)}{h_i^2}
For mixed partial derivatives:
\frac{\partial^2 f}{\partial \theta_i \partial \theta_j} \approx \frac{f(\theta + h_i e_i + h_j e_j) - f(\theta + h_i e_i - h_j e_j) - f(\theta - h_i e_i + h_j e_j) + f(\theta - h_i e_i - h_j e_j)}{4h_i h_j}
The algorithm validates numerical differentiation by comparing with existing gradients and adaptively adjusts step sizes when discrepancies are detected.
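A self-contained R sketch of the central-difference Hessian with the adaptive step rule above (assuming h_min = 1e-8 and h_base = 1e-5):
# Numerical Hessian via central differences with adaptive step sizes
num_hessian <- function(f, theta, h_min = 1e-8, h_base = 1e-5) {
  k <- length(theta)
  h <- pmax(h_min, pmin(h_base, h_base * abs(theta)))   # adaptive step sizes
  H <- matrix(0, k, k)
  f0 <- f(theta)
  for (i in seq_len(k)) {
    ei <- replace(numeric(k), i, h[i])
    H[i, i] <- (f(theta + ei) - 2 * f0 + f(theta - ei)) / h[i]^2
    for (j in seq_len(k)) {
      if (j <= i) next
      ej <- replace(numeric(k), j, h[j])
      H[i, j] <- H[j, i] <-
        (f(theta + ei + ej) - f(theta + ei - ej) -
         f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h[i] * h[j])
    }
  }
  H
}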
To escape flat regions or local minima, the algorithm implements controlled stochastic perturbation when progress stalls (detected by monitoring gradient norm changes):
\theta_i^{new} = \theta_i + \Delta\theta_i
where the perturbation \Delta\theta_i combines:
A directed component opposite to the gradient: -\text{sign}(\nabla \ell_i) \cdot 0.05 \cdot |\theta_i|
A random noise component: U(-0.05|\theta_i|, 0.05|\theta_i|)
The perturbation is applied when:
The relative change in gradient norm is below a threshold for several consecutive iterations
The algorithm appears to be stuck in a non-optimal region
The perturbation is accepted only if it improves the objective function value.
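Sketched in R, the perturbation step might look as follows (stall detection is omitted; perturb_if_better is a hypothetical helper name):
# Directed-plus-random perturbation, kept only if it lowers the negative log-likelihood
perturb_if_better <- function(theta, g, nll) {
  directed <- -sign(g) * 0.05 * abs(theta)                          # move against the gradient
  noise    <- runif(length(theta), -0.05 * abs(theta), 0.05 * abs(theta))
  cand     <- theta + directed + noise
  if (nll(cand) < nll(theta)) cand else theta                       # accept only on improvement
}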
For particularly challenging optimization landscapes, the algorithm can employ multiple starting points:
Initial values are generated using moment-based estimation and domain knowledge about each distribution family
Each initial point is randomly perturbed to explore different regions of the parameter space
Independent optimization runs are performed from each starting point
The best result (based on likelihood value and convergence status) is returned
This approach increases the probability of finding the global optimum or a high-quality local optimum, particularly for complex models with many parameters.
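The same idea can be reproduced manually around nrgkw(). The loop below is a rough sketch that assumes a data vector y with values in (0, 1) and a two-parameter Kw start ordered as (alpha, beta):
# Manual multi-start loop; nrgkw's multi_start_attempts argument does this internally
best <- NULL
for (i in 1:5) {
  st  <- c(2, 5) * exp(runif(2, -0.3, 0.3))   # arbitrary baseline start, randomly perturbed
  fit <- nrgkw(st, y, family = "kw")
  if (is.null(best) || fit$loglik > best$loglik) best <- fit
}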
Intelligent starting values are critical for convergence in complex models. The algorithm uses data-driven initialization based on:
Method of moments estimators for beta parameters:
\alpha + \beta = \frac{\bar{x}(1-\bar{x})}{s^2} - 1
\alpha = (\alpha + \beta)\bar{x}
Transformations to Kumaraswamy parameters:
a_{Kw} = \sqrt{\alpha_{Beta}}
b_{Kw} = \sqrt{\beta_{Beta}}
Adjustments based on data skewness (detected via mean relative to 0.5)
Corrections based on data dispersion (range relative to (0,1) interval)
The transformations between beta and Kumaraswamy parameters leverage the similarities between these distributions while accounting for their parametric differences.
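A compact R sketch of this moment-based initialization (skewness and dispersion adjustments omitted; init_from_moments is a hypothetical helper name):
# Method-of-moments beta estimates mapped to a rough Kumaraswamy start
init_from_moments <- function(x) {
  m  <- mean(x); v <- var(x)
  ab <- m * (1 - m) / v - 1            # alpha + beta by the method of moments
  a_beta <- ab * m                      # beta-distribution alpha
  b_beta <- ab - a_beta                 # beta-distribution beta
  c(a = sqrt(a_beta), b = sqrt(b_beta))
}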
The algorithm can dynamically switch between trust region and Newton-Raphson methods based on optimization progress:
Early iterations: trust region method for stability and global convergence properties
Later iterations (when close to optimum): Newton-Raphson with line search for quadratic convergence rate
The switching criteria are based on iteration count and gradient norm, with additional logic to handle cases where one method consistently fails.
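Requesting the hybrid strategy only requires setting the corresponding argument, as in this illustrative call (y denotes a data vector with values in (0, 1)):
fit_hybrid <- nrgkw(NULL, y, family = "ekw", optimization_method = "hybrid")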
A list object of class gkw_fit containing the following components:
parameters: A named numeric vector with the estimated parameters.
loglik: The maximized value of the log-likelihood function.
iterations: Number of iterations performed.
converged: Logical flag indicating whether the algorithm converged successfully.
param_history: A matrix where rows represent iterations and columns represent parameter values.
loglik_history: A vector of log-likelihood values at each iteration.
gradient: The gradient vector at the final parameter estimates.
hessian: The analytical Hessian matrix at the final parameter estimates.
std_errors: A named numeric vector of approximate standard errors for the parameters.
aic: Akaike Information Criterion.
bic: Bayesian Information Criterion.
aicc: AIC with small-sample correction.
n: The sample size.
status: A character string indicating the termination status.
z_values: A named numeric vector of Z-statistics for parameter significance testing.
p_values: A named numeric vector of two-sided p-values corresponding to the Z-statistics.
param_names: A character vector of the parameter names.
family: The distribution family used.
optimization_method: The optimization method used.
numeric_hessian: The numerically approximated Hessian at the solution (if requested).
condition_number: The condition number of the final Hessian, a measure of parameter identifiability.
scaling_factors: The scaling factors used for the parameters (if adaptive scaling was enabled).
Although this implementation is highly robust, fitting complex distributions can still be challenging. For best results:
Try multiple starting values if results seem suboptimal
Examine diagnostic information carefully, especially condition numbers and standard errors
Be cautious of parameter estimates at or very near boundaries
Consider model simplification if convergence is consistently problematic
For the full GKw model with 5 parameters, convergence may be sensitive to starting values
High condition numbers (>1e6) may indicate parameter redundancy or weak identifiability
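For example, a quick post-fit check along these lines can flag identifiability problems (illustrative only; y denotes a data vector with values in (0, 1)):
fit <- nrgkw(NULL, y, family = "kw")
if (fit$condition_number > 1e6) warning("possible weak identifiability")
print(cbind(estimate = fit$parameters, std_error = fit$std_errors))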
Enhanced by Lopes, J. E.
Carrasco, J. M. F., Ferrari, S. L. P., & Cordeiro, G. M. (2010). A new generalized Kumaraswamy distribution. arXiv preprint arXiv:1004.0911.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.
Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). Trust Region Methods. MPS-SIAM Series on Optimization.
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.
# Generate sample data from a Beta(2,5) distribution for testing
set.seed(123)
sample_data <- rbeta_(200, 2, 5)
# Automatic initialization (recommended for beginners)
result_auto <- nrgkw(NULL, sample_data, family = "beta", verbose = FALSE)
print(result_auto$parameters)
print(result_auto$loglik)
# Compare different optimization methods
methods <- c("trust-region", "newton-raphson", "hybrid")
results <- list()
for (method in methods) {
results[[method]] <- nrgkw(NULL, sample_data, family = "beta",
optimization_method = method)
cat(sprintf("Method: %s, AIC: %.4f\n", method, results[[method]]$aic))
}
# Fit the full GKw model with diagnostic information
gkw_result <- nrgkw(NULL, sample_data, family = "gkw",
verbose = FALSE, get_num_hess = TRUE)
# Examine parameter identifiability through condition number
cat(sprintf("Condition number: %.2e\n", gkw_result$condition_number))
print(gkw_result)
# Compare with simpler models using information criteria
cat("Information criteria comparison:\n")
cat(sprintf("GKw: AIC=%.4f, BIC=%.4f\n", gkw_result$aic, gkw_result$bic))
cat(sprintf("Beta: AIC=%.4f, BIC=%.4f\n",
results[["trust-region"]]$aic, results[["trust-region"]]$bic))