gkwgof: Comprehensive Goodness-of-Fit Analysis for GKw Family...
In gkwreg: Generalized Kumaraswamy Regression Models for Bounded Data

gkwgof

R Documentation

Comprehensive Goodness-of-Fit Analysis for GKw Family Distributions

Description

Computes and displays a comprehensive set of goodness-of-fit statistics for distributions from the Generalized Kumaraswamy (GKw) family fitted using gkwfit. This function provides various measures including distance-based tests, information criteria, moment comparisons, probability plot metrics, and additional visualization tools for model adequacy assessment.

Usage

gkwgof(
  object,
  simulate_p_values = FALSE,
  n_bootstrap = 1000,
  plot = TRUE,
  print_summary = TRUE,
  verbose = FALSE,
  theme = ggplot2::theme_bw(),
  ncols = 4,
  title = NULL,
  ...
)

Arguments

`object`	An object of class "gkwfit", typically the result of a call to `gkwfit`.
`simulate_p_values`	Logical; if TRUE, uses parametric bootstrap to compute approximate p-values for distance-based tests. Default is FALSE.
`n_bootstrap`	Number of bootstrap replicates for p-value simulation. Only used if `simulate_p_values = TRUE`. Default is 1000.
`plot`	Logical; if TRUE, generates additional diagnostic plots beyond those already in the gkwfit object. Default is TRUE.
`print_summary`	Logical; if TRUE, prints a formatted summary of the goodness-of-fit statistics. Default is TRUE.
`verbose`	Logical; if TRUE, provides additional details and explanations about the test statistics. Default is FALSE.
`theme`	A ggplot theme for all plots. default ggplot2::theme_bw()
`ncols`	Number of columns to draw plots in graphics window.
`title`	Plot title.
`...`	Additional arguments to be passed to plotting functions.

Details

This function calculates the following goodness-of-fit statistics:

Distance-based tests:

Kolmogorov-Smirnov (KS) statistic: Measures the maximum absolute difference between the empirical and theoretical CDFs.
Cramer-von Mises (CvM) statistic: Measures the integrated squared difference between the empirical and theoretical CDFs.
Anderson-Darling (AD) statistic: Similar to CvM but places more weight on the tails of the distribution.
Watson (W^2) statistic: A modification of CvM that is location-invariant on the circle.

Information criteria:

Akaike Information Criterion (AIC): -2\log(L) + 2k
Bayesian Information Criterion (BIC): -2\log(L) + k\log(n)
Corrected AIC (AICc): AIC + \frac{2k(k+1)}{n-k-1}
Consistent AIC (CAIC): -2\log(L) + k(\log(n) + 1)
Hannan-Quinn IC (HQIC): -2\log(L) + 2k\log(\log(n))

Moment-based comparisons:

Theoretical vs. sample mean, variance, skewness, and kurtosis
Standardized moment differences (relative to sample standard deviation)
Root mean squared error of moments (RMSE)

Probability plot metrics:

Correlation coefficient from P-P plot (closer to 1 indicates better fit)
Area between P-P curve and diagonal line
Mean absolute error in Q-Q plot

Likelihood statistics:

Log-likelihood
Log-likelihood per observation
Pseudo-R^2 measure for bounded distributions

Prediction accuracy metrics:

Mean Absolute Error (MAE) between empirical and theoretical CDF
Root Mean Squared Error (RMSE) between empirical and theoretical CDF
Continuous Ranked Probability Score (CRPS)

For model selection, lower values of information criteria (AIC, BIC, etc.) indicate better fit, while higher values of correlation coefficients and pseudo-R^2 indicate better fit. The distance-based tests are primarily used for hypothesis testing rather than model selection.

When simulate_p_values = TRUE, the function performs parametric bootstrap to compute approximate p-values for the distance-based tests, which accounts for parameter estimation uncertainty.

Value

An object of class "gkwgof" (inheriting from "list") containing the following components:

family: The fitted distribution family
coefficients: The estimated model parameters
sample_size: The number of observations used in fitting
distance_tests: Results from KS, CvM, AD, and Watson tests
information_criteria: AIC, BIC, AICc, CAIC, and HQIC values
moments: Theoretical moments, sample moments, and their differences
probability_plots: Metrics from P-P and Q-Q plots
likelihood: Log-likelihood statistics and pseudo-R^2 measure
prediction: Prediction accuracy metrics
plots: Additional diagnostic plots (if plot = TRUE)
p_values: Simulated p-values (if simulate_p_values = TRUE)
bootstrap_stats: Bootstrap distribution of test statistics (if simulate_p_values = TRUE)
call: The matched call
gkwfit_object: The original gkwfit object used for analysis

References

Anderson, T. W., & Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Annals of Mathematical Statistics, 23(2), 193-212.

Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). Springer.

D'Agostino, R. B., & Stephens, M. A. (1986). Goodness-of-fit techniques. Marcel Dekker, Inc.

Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.

Examples


# Example 1: Simulate and analyze data from a Kumaraswamy (Kw) distribution
set.seed(2203) # Set seed for reproducibility
# Simulate 1000 observations from Kumaraswamy distribution with parameters alpha=2.5, beta=1.8
data_kw <- rkw(n = 1000, alpha = 2.5, beta = 1.8)
# Fit the Kumaraswamy distribution to the data
fit_kw <- gkwfit(data_kw, family = "kw")
# Basic goodness-of-fit analysis
gof_kw <- gkwgof(fit_kw)
# Analysis with bootstrap simulated p-values
gof_kw_bootstrap <- gkwgof(fit_kw, simulate_p_values = TRUE, n_bootstrap = 500)
# Detailed summary with additional explanations
gof_kw_verbose <- gkwgof(fit_kw, verbose = TRUE)

# Example 2: Comparing different distributions from the GKw family
# Simulate data from the Generalized Kumaraswamy (GKw) distribution
data_gkw <- rgkw(n = 1000, alpha = 2.0, beta = 1.5, gamma = 3.0, delta = 2.0, lambda = 1.2)
# Fit different models to the same data
fit_kw <- gkwfit(data_gkw, family = "kw") # Kumaraswamy model (simplified)
fit_bkw <- gkwfit(data_gkw, family = "bkw") # Beta-Kumaraswamy model
fit_ekw <- gkwfit(data_gkw, family = "ekw") # Exponentiated Kumaraswamy model
fit_gkw <- gkwfit(data_gkw, family = "gkw") # Full Generalized Kumaraswamy model
fit_beta <- gkwfit(data_gkw, family = "beta") # Standard Beta model
# Goodness-of-fit analysis for each model (without printing summaries)
gof_kw <- gkwgof(fit_kw, print_summary = FALSE)
gof_bkw <- gkwgof(fit_bkw, print_summary = FALSE)
gof_ekw <- gkwgof(fit_ekw, print_summary = FALSE)
gof_gkw <- gkwgof(fit_gkw, print_summary = FALSE)
gof_beta <- gkwgof(fit_beta, print_summary = FALSE)
# Information criteria comparison for model selection
ic_comparison <- data.frame(
  family = c("kw", "bkw", "ekw", "gkw", "beta"),
  n_params = c(
    length(gof_kw$coefficients),
    length(gof_bkw$coefficients),
    length(gof_ekw$coefficients),
    length(gof_gkw$coefficients),
    length(gof_beta$coefficients)
  ),
  logLik = c(
    gof_kw$likelihood$loglik,
    gof_bkw$likelihood$loglik,
    gof_ekw$likelihood$loglik,
    gof_gkw$likelihood$loglik,
    gof_beta$likelihood$loglik
  ),
  AIC = c(
    gof_kw$information_criteria$AIC,
    gof_bkw$information_criteria$AIC,
    gof_ekw$information_criteria$AIC,
    gof_gkw$information_criteria$AIC,
    gof_beta$information_criteria$AIC
  ),
  BIC = c(
    gof_kw$information_criteria$BIC,
    gof_bkw$information_criteria$BIC,
    gof_ekw$information_criteria$BIC,
    gof_gkw$information_criteria$BIC,
    gof_beta$information_criteria$BIC
  ),
  AICc = c(
    gof_kw$information_criteria$AICc,
    gof_bkw$information_criteria$AICc,
    gof_ekw$information_criteria$AICc,
    gof_gkw$information_criteria$AICc,
    gof_beta$information_criteria$AICc
  )
)
# Sort by AIC (lower is better)
ic_comparison <- ic_comparison[order(ic_comparison$AIC), ]
print(ic_comparison)

# Example 3: Comparative visualization
# Generate data from Beta distribution to demonstrate another case
set.seed(2203)
data_beta <- rbeta_(n = 1000, gamma = 2.5, delta = 1.5)
# Fit different distributions
fit_beta_true <- gkwfit(data_beta, family = "beta")
fit_kw_misspec <- gkwfit(data_beta, family = "kw")
fit_gkw_complex <- gkwfit(data_beta, family = "gkw")
# Goodness-of-fit analysis
gof_beta_true <- gkwgof(fit_beta_true, print_summary = FALSE, plot = FALSE)
gof_kw_misspec <- gkwgof(fit_kw_misspec, print_summary = FALSE, plot = FALSE)
gof_gkw_complex <- gkwgof(fit_gkw_complex, print_summary = FALSE, plot = FALSE)
# Comparative goodness-of-fit plot

plotcompare(
  list(
    "Beta (correct)" = gof_beta_true,
    "Kw (underspecified)" = gof_kw_misspec,
    "GKw (overspecified)" = gof_gkw_complex
  ),
  title = "Comparison of fits for Beta data"
)

# Example 5: Likelihood ratio tests for nested models
# Testing if the GKw distribution is significantly better than BKw (lambda = 1)
# Null hypothesis: lambda = 1 (BKw is adequate)
# Alternative hypothesis: lambda != 1 (GKw is necessary)
# Fitting nested models to data
nested_data <- rgkw(n = 1000, alpha = 2.0, beta = 1.5, gamma = 3.0, delta = 2.0, lambda = 1.0)
nested_fit_bkw <- gkwfit(nested_data, family = "bkw") # Restricted model (lambda = 1)
nested_fit_gkw <- gkwfit(nested_data, family = "gkw") # Unrestricted model

# Extracting log-likelihoods
ll_bkw <- nested_fit_bkw$loglik
ll_gkw <- nested_fit_gkw$loglik
# Calculating likelihood ratio test statistic
lr_stat <- 2 * (ll_gkw - ll_bkw)
# Calculating p-value (chi-square with 1 degree of freedom)
lr_pvalue <- 1 - pchisq(lr_stat, df = 1)
# Displaying test results
cat("Likelihood ratio test:\n")
cat("Test statistic:", round(lr_stat, 4), "\n")
cat("P-value:", format.pval(lr_pvalue), "\n")
cat("Conclusion:", ifelse(lr_pvalue < 0.05,
  "Reject H0 - GKw is necessary",
  "Fail to reject H0 - BKw is adequate"
), "\n")

# Example 6: Power simulation
# Checking the power of goodness-of-fit tests in detecting misspecifications
# Function to simulate power
simulate_power <- function(n_sim = 1000, n_obs = 1000, alpha_level = 0.05) {
  # Counters for rejections
  ks_rejections <- 0
  ad_rejections <- 0

  for (i in 1:n_sim) {
    # Simulating data from GKw, but fitting a Kw model (incorrect)
    sim_data <- rgkw(
      n = n_obs, alpha = 2.0, beta = 1.5,
      gamma = 3.0, delta = 2.0, lambda = 1.5
    )

    # Fitting incorrect model
    sim_fit_kw <- gkwfit(sim_data, family = "kw")

    # Calculating test statistics
    sim_gof <- gkwgof(sim_fit_kw,
      simulate_p_values = TRUE,
      n_bootstrap = 200, print_summary = FALSE, plot = FALSE
    )

    # Checking rejections in tests
    if (sim_gof$p_values$ks < alpha_level) ks_rejections <- ks_rejections + 1
    if (sim_gof$p_values$ad < alpha_level) ad_rejections <- ad_rejections + 1
  }

  # Calculating power
  power_ks <- ks_rejections / n_sim
  power_ad <- ad_rejections / n_sim

  return(list(
    power_ks = power_ks,
    power_ad = power_ad
  ))
}

# Run power simulation with reduced number of repetitions for example
power_results <- simulate_power(n_sim = 100)
cat("Estimated power (KS):", round(power_results$power_ks, 2), "\n")
cat("Estimated power (AD):", round(power_results$power_ad, 2), "\n")

gkwreg documentation built on April 16, 2025, 1:10 a.m.