nrgkw: Enhanced Newton-Raphson Optimization for GKw Family Distributions

View source: R/RcppExports.R

nrgkw    R Documentation

Enhanced Newton-Raphson Optimization for GKw Family Distributions

Description

An industrial-strength implementation of maximum likelihood estimation (MLE) for the parameters of any distribution in the Generalized Kumaraswamy (GKw) family. This function incorporates multiple advanced numerical techniques including trust region methods, eigenvalue-based regularization, adaptive scaling, and sophisticated line search to ensure robust convergence even for challenging datasets or extreme parameter values.

Usage

nrgkw(
  start = NULL,
  data = as.numeric(c()),
  family = "gkw",
  tol = 1e-06,
  max_iter = 100L,
  verbose = FALSE,
  optimization_method = "trust-region",
  enforce_bounds = TRUE,
  min_param_val = 1e-05,
  max_param_val = 1e+05,
  adaptive_scaling = TRUE,
  use_stochastic_perturbation = TRUE,
  get_num_hess = FALSE,
  multi_start_attempts = 3L,
  eigenvalue_hessian_reg = TRUE,
  max_backtrack = 20L,
  initial_trust_radius = 1
)

Arguments

start

A numeric vector containing initial values for the parameters. If NULL, automatic initialization is used based on the dataset characteristics. The length and order must correspond to the selected family (e.g., c(alpha, beta, gamma, delta, lambda) for "gkw"; c(alpha, beta) for "kw"; c(gamma, delta) for "beta").

data

A numeric vector containing the observed data. All values must be strictly between 0 and 1.

family

A character string specifying the distribution family. One of "gkw", "bkw", "kkw", "ekw", "mc", "kw", or "beta". Default: "gkw".

tol

Convergence tolerance. The algorithm stops when the Euclidean norm of the gradient is below this value, or if relative changes in parameters or the negative log-likelihood are below this threshold across consecutive iterations. Default: 1e-6.

max_iter

Maximum number of iterations allowed. Default: 100.

verbose

Logical; if TRUE, prints detailed progress information during optimization, including iteration number, negative log-likelihood, gradient norm, and step adjustment details. Default: FALSE.

optimization_method

Character string specifying the optimization method: "trust-region" (default), "newton-raphson", or "hybrid" (starts with trust-region, switches to newton-raphson near convergence).

enforce_bounds

Logical; if TRUE, parameter values are constrained to stay within [min_param_val, max_param_val] (and \delta \ge 0) during optimization. Default: TRUE.

min_param_val

Minimum allowed value for parameters constrained to be strictly positive (\alpha, \beta, \gamma, \lambda). Default: 1e-5.

max_param_val

Maximum allowed value for all parameters. Default: 1e5.

adaptive_scaling

Logical; if TRUE, parameters are automatically scaled to improve numerical stability. Default: TRUE.

use_stochastic_perturbation

Logical; if TRUE, applies random perturbations when optimization stalls. Default: TRUE.

get_num_hess

Logical; if TRUE, computes and returns a numerical approximation of the Hessian at the solution. Default: FALSE.

multi_start_attempts

Integer specifying the number of different starting points to try if initial optimization fails to converge. Default: 3.

eigenvalue_hessian_reg

Logical; if TRUE, uses eigenvalue-based regularization for the Hessian matrix. Default: TRUE.

max_backtrack

Integer specifying the maximum number of backtracking steps in line search. Default: 20.

initial_trust_radius

Initial radius for trust region method. Default: 1.0.

Details

This enhanced algorithm provides robust parameter estimation for the Generalized Kumaraswamy distribution and its subfamilies. The function implements several advanced numerical optimization techniques to maximize the likelihood function reliably even in difficult cases.

The GKw Family of Distributions

The Generalized Kumaraswamy (GKw) distribution, introduced by Carrasco, Ferrari, and Cordeiro (2010), is a flexible five-parameter continuous distribution defined on the standard unit interval (0,1). Its probability density function is given by:

f(x; \alpha, \beta, \gamma, \delta, \lambda) = \frac{\lambda\alpha\beta x^{\alpha-1}}{B(\gamma, \delta+1)}(1-x^{\alpha})^{\beta-1}[1-(1-x^{\alpha})^{\beta}]^{\gamma\lambda-1}\{1-[1-(1-x^{\alpha})^{\beta}]^{\lambda}\}^{\delta}

where \alpha, \beta, \gamma, \lambda > 0 and \delta \geq 0 are the model parameters, and B(\gamma, \delta+1) is the beta function.

The GKw distribution encompasses several important special cases:

  • GKw (5 parameters): \alpha, \beta, \gamma, \delta, \lambda

  • BKw (4 parameters): \alpha, \beta, \gamma, \delta (with \lambda = 1)

  • KKw (4 parameters): \alpha, \beta, \delta, \lambda (with \gamma = 1)

  • EKw (3 parameters): \alpha, \beta, \lambda (with \gamma = 1, \delta = 0)

  • Mc (3 parameters): \gamma, \delta, \lambda (with \alpha = 1, \beta = 1)

  • Kw (2 parameters): \alpha, \beta (with \gamma = 1, \delta = 0, \lambda = 1)

  • Beta (2 parameters): \gamma, \delta (with \alpha = 1, \beta = 1, \lambda = 1)
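For concreteness, the density above can be transcribed directly into R. The sketch below is illustrative only (the package evaluates the likelihood in compiled C++ code) and assumes 0 < x < 1 with admissible parameter values:

# Hedged sketch: direct R transcription of the GKw density above
dgkw_sketch <- function(x, alpha, beta, gamma, delta, lambda) {
  u <- 1 - x^alpha                    # (1 - x^alpha)
  v <- 1 - u^beta                     # 1 - (1 - x^alpha)^beta
  w <- 1 - v^lambda                   # 1 - [1 - (1 - x^alpha)^beta]^lambda
  lambda * alpha * beta * x^(alpha - 1) * u^(beta - 1) *
    v^(gamma * lambda - 1) * w^delta / base::beta(gamma, delta + 1)
}

# Setting gamma = 1, delta = 0, lambda = 1 recovers the Kumaraswamy density;
# setting alpha = beta = lambda = 1 recovers the Beta(gamma, delta + 1) density.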

Trust Region Method with Levenberg-Marquardt Algorithm

The trust region approach restricts parameter updates to a region where a quadratic approximation of the objective function is trusted to be accurate. Writing f(\theta) = -\ell(\theta) for the negative log-likelihood being minimized, this algorithm implements the Levenberg-Marquardt variant, which solves the subproblem:

\min_p m_k(p) = \nabla f(\theta_k)^T p + \frac{1}{2}p^T H_k p

\text{subject to } \|p\| \leq \Delta_k

where \nabla f(\theta_k) is the gradient, H_k is the Hessian of f, and \Delta_k is the trust region radius at iteration k.

The Levenberg-Marquardt approach adds a regularization parameter \lambda to the Hessian, solving:

(H_k + \lambda I)p = -\nabla f(\theta_k)

The parameter \lambda controls the step size and direction:

  • When \lambda is large, the step approaches a scaled steepest descent direction.

  • When \lambda is small, the step approaches the Newton direction.

The algorithm dynamically adjusts \lambda based on the agreement between the quadratic model and the actual function:

\rho_k = \frac{f(\theta_k) - f(\theta_k + p_k)}{m_k(0) - m_k(p_k)}

The trust region radius is updated according to:

  • If \rho_k < 0.25, reduce the radius: \Delta_{k+1} = 0.5\Delta_k

  • If \rho_k > 0.75 and \|p_k\| \approx \Delta_k, increase the radius: \Delta_{k+1} = 2\Delta_k

  • Otherwise, leave the radius unchanged: \Delta_{k+1} = \Delta_k

The step is accepted if \rho_k > \eta (typically \eta = 0.1).
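As an illustration of these rules (not the package's internal C++ routine), a single damped update can be sketched in R as follows, assuming hypothetical helpers nll(), grad(), and hess() that return the negative log-likelihood, its gradient, and its Hessian:

# Hedged sketch of one Levenberg-Marquardt / trust-region update
lm_step <- function(theta, Delta, damp, eta = 0.1) {
  g <- grad(theta)                                  # gradient of f (hypothetical helper)
  H <- hess(theta)                                  # Hessian of f (hypothetical helper)
  p <- solve(H + damp * diag(length(theta)), -g)    # damped Newton step
  if (sqrt(sum(p^2)) > Delta) {                     # keep the step inside the radius
    p <- p * Delta / sqrt(sum(p^2))
  }
  pred <- -(sum(g * p) + 0.5 * sum(p * (H %*% p)))  # predicted decrease m_k(0) - m_k(p)
  rho  <- (nll(theta) - nll(theta + p)) / pred      # agreement ratio rho_k
  if (rho < 0.25) {
    Delta <- 0.5 * Delta                            # poor agreement: shrink the radius
  } else if (rho > 0.75 && sqrt(sum(p^2)) >= 0.99 * Delta) {
    Delta <- 2 * Delta                              # strong agreement at the boundary: grow
  }
  if (rho > eta) theta <- theta + p                 # accept the step only if rho_k > eta
  list(theta = theta, Delta = Delta)
}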

Eigenvalue-Based Hessian Regularization

For the Newton-Raphson step to be a reliable descent direction, the Hessian matrix must be positive definite. When it is not, this algorithm uses eigendecomposition to create a positive definite approximation that preserves the Hessian's structure:

H = Q\Lambda Q^T

where Q contains the eigenvectors and \Lambda is a diagonal matrix of eigenvalues.

The regularized Hessian is constructed by:

\tilde{H} = Q\tilde{\Lambda}Q^T

where \tilde{\Lambda} contains modified eigenvalues:

\tilde{\lambda}_i = \max(\lambda_i, \epsilon)

with \epsilon being a small positive threshold (typically 10^{-6}).

This approach is superior to diagonal loading (H + \lambda I) as it:

  • Preserves the eigenvector structure of the original Hessian

  • Modifies the eigenvalues only as much as needed to ensure positive definiteness

  • Better maintains the directional information in the Hessian
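A minimal R sketch of this eigenvalue clipping (illustrative; the package performs the equivalent operation in C++):

# Clip the eigenvalues of a (near-)symmetric Hessian from below at eps
regularize_hessian <- function(H, eps = 1e-6) {
  eig <- eigen((H + t(H)) / 2, symmetric = TRUE)   # symmetrize, then decompose
  lam <- pmax(eig$values, eps)                     # lambda_i <- max(lambda_i, eps)
  eig$vectors %*% diag(lam, length(lam)) %*% t(eig$vectors)
}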

Parameter Scaling for Numerical Stability

When parameters have widely different magnitudes, optimization can become numerically unstable. The adaptive scaling system transforms the parameter space to improve conditioning:

\theta_i^{scaled} = s_i \theta_i

where scaling factors s_i are determined by:

  • For large parameters (|\theta_i| > 100): s_i = 100/|\theta_i|

  • For small parameters (0 < |\theta_i| < 0.01): s_i = 0.01/|\theta_i|

  • Otherwise: s_i = 1

The optimization is performed in the scaled space, with appropriate transformations for the gradient and Hessian:

\nabla \ell(\theta^{scaled})_i = \frac{1}{s_i}\nabla \ell(\theta)_i

H(\theta^{scaled})_{ij} = \frac{1}{s_i s_j}H(\theta)_{ij}

The final results are back-transformed to the original parameter space before being returned.
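An illustrative R sketch of this scaling rule and the corresponding transformations; theta, g, and H are assumed to be the parameter vector, gradient, and Hessian in the original space:

# Compute scaling factors as described above
scaling_factors <- function(theta) {
  s <- rep(1, length(theta))
  big   <- abs(theta) > 100
  small <- abs(theta) < 0.01 & abs(theta) > 0
  s[big]   <- 100  / abs(theta[big])
  s[small] <- 0.01 / abs(theta[small])
  s
}

# s            <- scaling_factors(theta)
# theta_scaled <- s * theta            # optimize in the scaled space
# g_scaled     <- g / s                # gradient transformation
# H_scaled     <- H / outer(s, s)      # Hessian transformation
# theta        <- theta_scaled / s     # back-transform the final estimates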

Line Search with Wolfe Conditions

The line search procedure ensures sufficient decrease in the objective function when taking a step in the search direction. The implementation uses Wolfe conditions which include both:

  1. Sufficient decrease (Armijo) condition:

    f(\theta_k + \alpha p_k) \leq f(\theta_k) + c_1 \alpha \nabla f(\theta_k)^T p_k

  2. Curvature condition:

    |\nabla f(\theta_k + \alpha p_k)^T p_k| \leq c_2 |\nabla f(\theta_k)^T p_k|

where 0 < c_1 < c_2 < 1, typically c_1 = 10^{-4} and c_2 = 0.9.

The step length \alpha is determined using polynomial interpolation:

  • First iteration: quadratic interpolation

  • Subsequent iterations: cubic interpolation using function and derivative values

The cubic polynomial model has the form:

a\alpha^3 + b\alpha^2 + c\alpha + d

The coefficients are computed from function and derivative values at two trial points; the minimizer of this polynomial, safeguarded by bounds on the admissible step length, is taken as the next trial value of \alpha.
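For illustration, the two Wolfe conditions can be checked as follows (hedged sketch, again assuming hypothetical nll() and grad() helpers for the objective and its gradient):

# Check the Wolfe conditions for a step length alpha along direction p
wolfe_ok <- function(theta, p, alpha, c1 = 1e-4, c2 = 0.9) {
  f0 <- nll(theta)
  g0 <- sum(grad(theta) * p)                                    # directional derivative at 0
  armijo    <- nll(theta + alpha * p) <= f0 + c1 * alpha * g0   # sufficient decrease
  curvature <- abs(sum(grad(theta + alpha * p) * p)) <= c2 * abs(g0)
  armijo && curvature
}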

Adaptive Numerical Differentiation

When analytical derivatives are unreliable, the algorithm uses numerical differentiation with adaptive step sizes based on parameter magnitudes:

h_i = \max(h_{min}, \min(h_{base}, h_{base} \cdot |\theta_i|))

where h_{min} is a minimum step size (typically 10^{-8}), h_{base} is a base step size (typically 10^{-5}), and \theta_i is the parameter value.

For computing diagonal Hessian elements, the central difference formula is used:

\frac{\partial^2 f}{\partial \theta_i^2} \approx \frac{f(\theta + h_i e_i) - 2f(\theta) + f(\theta - h_i e_i)}{h_i^2}

For mixed partial derivatives:

\frac{\partial^2 f}{\partial \theta_i \partial \theta_j} \approx \frac{f(\theta + h_i e_i + h_j e_j) - f(\theta + h_i e_i - h_j e_j) - f(\theta - h_i e_i + h_j e_j) + f(\theta - h_i e_i - h_j e_j)}{4h_i h_j}

The algorithm validates numerical differentiation by comparing with existing gradients and adaptively adjusts step sizes when discrepancies are detected.
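A hedged R sketch of the adaptive central-difference scheme for the diagonal Hessian elements (f denotes the objective function; the package's internal implementation may differ in detail):

# Adaptive step sizes and central-difference second derivatives (diagonal only)
num_hess_diag <- function(f, theta, h_base = 1e-5, h_min = 1e-8) {
  h <- pmax(h_min, pmin(h_base, h_base * abs(theta)))   # h_i from the formula above
  vapply(seq_along(theta), function(i) {
    e <- replace(numeric(length(theta)), i, h[i])       # h_i * e_i
    (f(theta + e) - 2 * f(theta) + f(theta - e)) / h[i]^2
  }, numeric(1))
}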

Stochastic Perturbation

To escape flat regions or local minima, the algorithm implements controlled stochastic perturbation when progress stalls (detected by monitoring gradient norm changes):

\theta_i^{new} = \theta_i + \Delta\theta_i

where the perturbation \Delta\theta_i combines:

  • A directed component opposite to the gradient: -\text{sign}(\nabla \ell_i) \cdot 0.05 \cdot |\theta_i|

  • A random noise component: U(-0.05|\theta_i|, 0.05|\theta_i|)

The perturbation is applied when:

  • The relative change in gradient norm is below a threshold for several consecutive iterations

  • The algorithm appears to be stuck in a non-optimal region

The perturbation is accepted only if it improves the objective function value.
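A hedged sketch of the perturbation rule; g denotes the current gradient of the objective, and the constant 0.05 follows the description above (internal constants may differ):

# Directed move against the gradient plus bounded uniform noise
perturb <- function(theta, g, scale = 0.05) {
  directed <- -sign(g) * scale * abs(theta)                 # component opposite the gradient
  noise    <- runif(length(theta), -1, 1) * scale * abs(theta)
  theta + directed + noise
}
# A perturbed candidate would be kept only if it lowers the objective value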

Multi-Start Strategy

For particularly challenging optimization landscapes, the algorithm can employ multiple starting points:

  • Initial values are generated using moment-based estimation and domain knowledge about each distribution family

  • Each initial point is randomly perturbed to explore different regions of the parameter space

  • Independent optimization runs are performed from each starting point

  • The best result (based on likelihood value and convergence status) is returned

This approach increases the probability of finding the global optimum or a high-quality local optimum, particularly for complex models with many parameters.
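From the user's side, this behaviour is controlled through the documented multi_start_attempts argument; an illustrative call, assuming y is a numeric vector of observations in (0, 1):

fit_ms <- nrgkw(NULL, y, family = "gkw", multi_start_attempts = 5L)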

Advanced Parameter Initialization

Intelligent starting values are critical for convergence in complex models. The algorithm uses data-driven initialization based on:

  • Method of moments estimators for beta parameters:

    \alpha + \beta = \frac{\bar{x}(1-\bar{x})}{s^2} - 1

    \alpha = (\alpha + \beta)\bar{x}

  • Transformations to Kumaraswamy parameters:

    a_{Kw} = \sqrt{\alpha_{Beta}}

    b_{Kw} = \sqrt{\beta_{Beta}}

  • Adjustments based on data skewness (detected via mean relative to 0.5)

  • Corrections based on data dispersion (range relative to (0,1) interval)

The transformations between beta and Kumaraswamy parameters leverage the similarities between these distributions while accounting for their parametric differences.
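A hedged R sketch of the moment-based part of this initialization (illustrative only; the function's internal initialization additionally applies the skewness and dispersion adjustments described above):

# Method-of-moments starting values for the Beta case, with a square-root map
# toward Kumaraswamy-like starting values
init_from_moments <- function(x) {
  m  <- mean(x)
  v  <- var(x)
  ab <- m * (1 - m) / v - 1                 # alpha + beta
  alpha_beta <- c(ab * m, ab * (1 - m))     # (alpha, beta)
  list(beta_start = alpha_beta,
       kw_start   = sqrt(alpha_beta))       # a_Kw = sqrt(alpha), b_Kw = sqrt(beta)
}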

Hybrid Optimization Strategy

The algorithm can dynamically switch between trust region and Newton-Raphson methods based on optimization progress:

  • Early iterations: trust region method for stability and global convergence properties

  • Later iterations (when close to optimum): Newton-Raphson with line search for quadratic convergence rate

The switching criteria are based on iteration count and gradient norm, with additional logic to handle cases where one method consistently fails.
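An illustrative call requesting the hybrid strategy directly (assuming y holds observations in (0, 1)):

fit_hybrid <- nrgkw(NULL, y, family = "ekw", optimization_method = "hybrid")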

Value

A list object of class gkw_fit containing the following components:

parameters

A named numeric vector with the estimated parameters.

loglik

The maximized value of the log-likelihood function.

iterations

Number of iterations performed.

converged

Logical flag indicating whether the algorithm converged successfully.

param_history

A matrix where rows represent iterations and columns represent parameter values.

loglik_history

A vector of log-likelihood values at each iteration.

gradient

The gradient vector at the final parameter estimates.

hessian

The analytical Hessian matrix at the final parameter estimates.

std_errors

A named numeric vector of approximate standard errors for the parameters.

aic

Akaike Information Criterion.

bic

Bayesian Information Criterion.

aicc

AIC with small sample correction.

n

The sample size.

status

A character string indicating the termination status.

z_values

A named numeric vector of Z-statistics for parameter significance testing.

p_values

A named numeric vector of two-sided p-values corresponding to the Z-statistics.

param_names

A character vector of the parameter names.

family

The distribution family used.

optimization_method

The optimization method used.

numeric_hessian

The numerically approximated Hessian at the solution (if requested).

condition_number

The condition number of the final Hessian, a measure of parameter identifiability.

scaling_factors

The scaling factors used for parameters (if adaptive scaling was enabled).

Warning

Although this implementation is highly robust, fitting complex distributions can still be challenging. For best results:

  • Try multiple starting values if results seem suboptimal

  • Examine diagnostic information carefully, especially condition numbers and standard errors

  • Be cautious of parameter estimates at or very near boundaries

  • Consider model simplification if convergence is consistently problematic

  • For the full GKw model with 5 parameters, convergence may be sensitive to starting values

  • High condition numbers (>1e6) may indicate parameter redundancy or weak identifiability

Author(s)

Enhanced by Lopes, J. E.

References

Carrasco, J. M. F., Ferrari, S. L. P., & Cordeiro, G. M. (2010). A new generalized Kumaraswamy distribution. arXiv preprint arXiv:1004.0911.

Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.

Conn, A. R., Gould, N. I. M., & Toint, P. L. (2000). Trust Region Methods. MPS-SIAM Series on Optimization.

Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.

Examples


# Generate sample data from a Beta(2,5) distribution for testing
set.seed(123)
sample_data <- rbeta_(200, 2, 5)

# Automatic initialization (recommended for beginners)
result_auto <- nrgkw(NULL, sample_data, family = "beta", verbose = FALSE)
print(result_auto$parameters)
print(result_auto$loglik)

# Compare different optimization methods
methods <- c("trust-region", "newton-raphson", "hybrid")
results <- list()

for (method in methods) {
  results[[method]] <- nrgkw(NULL, sample_data, family = "beta",
                               optimization_method = method)
  cat(sprintf("Method: %s, AIC: %.4f\n", method, results[[method]]$aic))
}

# Fit the full GKw model with diagnostic information
gkw_result <- nrgkw(NULL, sample_data, family = "gkw",
                      verbose = FALSE, get_num_hess = TRUE)

# Examine parameter identifiability through condition number
cat(sprintf("Condition number: %.2e\n", gkw_result$condition_number))
print(gkw_result)

# Compare with simpler models using information criteria
cat("Information criteria comparison:\n")
cat(sprintf("GKw: AIC=%.4f, BIC=%.4f\n", gkw_result$aic, gkw_result$bic))
cat(sprintf("Beta: AIC=%.4f, BIC=%.4f\n",
           results[["trust-region"]]$aic, results[["trust-region"]]$bic))

