gkwreg: Fit Generalized Kumaraswamy Regression Models
In gkwreg: Generalized Kumaraswamy Regression Models for Bounded Data

gkwreg

R Documentation

Fit Generalized Kumaraswamy Regression Models

Description

Fits regression models using the Generalized Kumaraswamy (GKw) family of distributions for response variables strictly bounded in the interval (0, 1). The function allows modeling parameters from all seven submodels of the GKw family as functions of predictors using appropriate link functions. Estimation is performed using Maximum Likelihood via the TMB (Template Model Builder) package. Requires the Formula and TMB packages.

Usage

gkwreg(
  formula,
  data,
  family = c("gkw", "bkw", "kkw", "ekw", "mc", "kw", "beta"),
  link = NULL,
  start = NULL,
  fixed = NULL,
  method = c("nlminb", "BFGS", "Nelder-Mead", "CG", "SANN", "L-BFGS-B"),
  hessian = TRUE,
  plot = TRUE,
  conf.level = 0.95,
  optimizer.control = list(),
  subset = NULL,
  weights = NULL,
  offset = NULL,
  na.action = getOption("na.action"),
  contrasts = NULL,
  x = FALSE,
  y = TRUE,
  model = TRUE,
  silent = TRUE,
  ...
)

Arguments

`formula`	An object of class `Formula` (or one that can be coerced to that class). It should be structured as `y ~ model_alpha \| model_beta \| model_gamma \| model_delta \| model_lambda`, where `y` is the response variable and each `model_*` part specifies the linear predictor for the corresponding parameter (`\alpha`, `\beta`, `\gamma`, `\delta`, `\lambda`). If a part is omitted or specified as `~ 1` or `.`, an intercept-only model is used for that parameter. See Details for parameter correspondence in subfamilies.
`data`	A data frame containing the variables specified in the `formula`.
`family`	A character string specifying the desired distribution family. Defaults to `"gkw"`. Supported families are: `"gkw"`: Generalized Kumaraswamy (5 parameters: `\alpha, \beta, \gamma, \delta, \lambda`) `"bkw"`: Beta-Kumaraswamy (4 parameters: `\alpha, \beta, \gamma, \delta`; `\lambda = 1` fixed) `"kkw"`: Kumaraswamy-Kumaraswamy (4 parameters: `\alpha, \beta, \delta, \lambda`; `\gamma = 1` fixed) `"ekw"`: Exponentiated Kumaraswamy (3 parameters: `\alpha, \beta, \lambda`; `\gamma = 1, \delta = 0` fixed) `"mc"`: McDonald / Beta Power (3 parameters: `\gamma, \delta, \lambda`; `\alpha = 1, \beta = 1` fixed) `"kw"`: Kumaraswamy (2 parameters: `\alpha, \beta`; `\gamma = 1, \delta = 0, \lambda = 1` fixed) `"beta"`: Beta distribution (2 parameters: `\gamma, \delta`; `\alpha = 1, \beta = 1, \lambda = 1` fixed)
`link`	Either a single character string specifying the same link function for all relevant parameters, or a named list specifying the link function for each modeled parameter (e.g., `list(alpha = "log", beta = "log", delta = "logit")`). Defaults are `"log"` for `\alpha, \beta, \gamma, \lambda` (parameters > 0) and `"logit"` for `\delta` (parameter in (0, 1)). Supported link functions are: `"log"` `"logit"` `"identity"` `"inverse"` (i.e., 1/mu) `"sqrt"` `"probit"` `"cloglog"` (complementary log-log)
`start`	An optional named list providing initial values for the regression coefficients. Parameter names should match the distribution parameters (alpha, beta, etc.), and values should be vectors corresponding to the coefficients in the respective linear predictors (including intercept). If `NULL` (default), suitable starting values are automatically determined based on global parameter estimates.
`fixed`	An optional named list specifying parameters or coefficients to be held fixed at specific values during estimation. Currently not fully implemented.
`method`	Character string specifying the optimization algorithm to use. Options are `"nlminb"` (default, using `nlminb`), `"BFGS"`, `"Nelder-Mead"`, `"CG"`, `"SANN"`, or `"L-BFGS-B"`. If `"nlminb"` is selected, R's `nlminb` function is used; otherwise, R's `optim` function is used with the specified method.
`hessian`	Logical. If `TRUE` (default), the Hessian matrix is computed via `sdreport` to obtain standard errors and the covariance matrix of the estimated coefficients. Setting to `FALSE` speeds up fitting but prevents calculation of standard errors and confidence intervals.
`plot`	Logical. If `TRUE` (default), enables the generation of diagnostic plots when calling the generic `plot()` function on the fitted object. Actual plotting is handled by the `plot.gkwreg` method.
`conf.level`	Numeric. The confidence level (1 - alpha) for constructing confidence intervals for the parameters. Default is 0.95. Used only if `hessian = TRUE`.
`optimizer.control`	A list of control parameters passed directly to the chosen optimizer (`nlminb` or `optim`). See their respective documentation for details.
`subset`	An optional vector specifying a subset of observations from `data` to be used in the fitting process.
`weights`	An optional vector of prior weights (e.g., frequency weights) to be used in the fitting process. Should be `NULL` or a numeric vector of non-negative values.
`offset`	An optional numeric vector or matrix specifying an a priori known component to be included on the scale of the linear predictor for each parameter. If a vector, it's applied to the predictor of the first parameter in the standard order (`\alpha`). If a matrix, columns must correspond to parameters in the order `\alpha, \beta, \gamma, \delta, \lambda`.
`na.action`	A function which indicates what should happen when the data contain `NA`s. The default (`na.fail`) stops if `NA`s are present. Other options include `na.omit` or `na.exclude`.
`contrasts`	An optional list specifying the contrasts to be used for factor variables in the model. See the `contrasts.arg` of `model.matrix.default`.
`x`	Logical. If `TRUE`, the list of model matrices (one for each modeled parameter) is returned as component `x` of the fitted object. Default `FALSE`.
`y`	Logical. If `TRUE` (default), the response variable (after processing by `na.action`, `subset`) is returned as component `y`.
`model`	Logical. If `TRUE` (default), the model frame (containing all variables used from `data`) is returned as component `model`.
`silent`	Logical. If `TRUE` (default), suppresses progress messages from TMB compilation and optimization. Set to `FALSE` for verbose output.
`...`	Additional arguments, currently unused or passed down to internal methods (potentially).

Details

The gkwreg function provides a regression framework for the Generalized Kumaraswamy (GKw) family and its submodels, extending density estimation to include covariates. The response variable must be strictly bounded in the (0, 1) interval.

Model Specification: The extended Formula syntax is crucial for specifying potentially different linear predictors for each relevant distribution parameter. The parameters (\alpha, \beta, \gamma, \delta, \lambda) correspond sequentially to the parts of the formula separated by |. For subfamilies where some parameters are fixed by definition (see family argument), the corresponding parts of the formula are automatically ignored. For example, in a family = "kw" model, only the first two parts (for \alpha and \beta) are relevant.

Parameter Constraints and Link Functions: The parameters \alpha, \beta, \gamma, \lambda are constrained to be positive, while \delta is constrained to the interval (0, 1). The default link functions ("log" for positive parameters, "logit" for \delta) ensure these constraints during estimation. Users can specify alternative link functions suitable for the parameter's domain via the link argument.

Families and Parameters: The function automatically handles parameter fixing based on the chosen family:

GKw: All 5 parameters (\alpha, \beta, \gamma, \delta, \lambda) modeled.
BKw: Models \alpha, \beta, \gamma, \delta; fixes \lambda = 1.
KKw: Models \alpha, \beta, \delta, \lambda; fixes \gamma = 1.
EKw: Models \alpha, \beta, \lambda; fixes \gamma = 1, \delta = 0.
Mc (McDonald): Models \gamma, \delta, \lambda; fixes \alpha = 1, \beta = 1.
Kw (Kumaraswamy): Models \alpha, \beta; fixes \gamma = 1, \delta = 0, \lambda = 1.
Beta: Models \gamma, \delta; fixes \alpha = 1, \beta = 1, \lambda = 1. This parameterization corresponds to the standard Beta distribution with shape1 = \gamma and shape2 = \delta.

Estimation Engine: Maximum Likelihood Estimation (MLE) is performed using C++ templates via the TMB package, which provides automatic differentiation and efficient optimization capabilities. The specific TMB template used depends on the chosen family.

Optimizer Method (method argument):

"nlminb": Uses R's built-in stats::nlminb optimizer. Good for problems with box constraints. Default option.
"Nelder-Mead": Uses R's stats::optim with the Nelder-Mead simplex algorithm, which doesn't require derivatives.
"BFGS": Uses R's stats::optim with the BFGS quasi-Newton method for unconstrained optimization.
"CG": Uses R's stats::optim with conjugate gradients method for unconstrained optimization.
"SANN": Uses R's stats::optim with simulated annealing, a global optimization method useful for problems with multiple local minima.
"L-BFGS-B": Uses R's stats::optim with the limited-memory BFGS method with box constraints.

Value

An object of class gkwreg. This is a list containing the following components:

`call`	The matched function call.
`family`	The specified distribution family string.
`formula`	The `Formula` object used.
`coefficients`	A named vector of estimated regression coefficients.
`fitted.values`	Vector of estimated means (expected values) of the response.
`residuals`	Vector of response residuals (observed - fitted mean).
`fitted_parameters`	List containing the estimated mean value for each distribution parameter (`\alpha, \beta, \gamma, \delta, \lambda`).
`parameter_vectors`	List containing vectors of the estimated parameters (`\alpha, \beta, \gamma, \delta, \lambda`) for each observation, evaluated on the link scale.
`link`	List of link functions used for each parameter.
`param_names`	Character vector of names of the parameters modeled by the family.
`fixed_params`	Named list indicating which parameters are fixed by the family definition.
`loglik`	The maximized log-likelihood value.
`aic`	Akaike Information Criterion.
`bic`	Bayesian Information Criterion.
`deviance`	The deviance ( -2 * loglik ).
`df.residual`	Residual degrees of freedom (nobs - npar).
`nobs`	Number of observations used in the fit.
`npar`	Total number of estimated parameters (coefficients).
`vcov`	The variance-covariance matrix of the coefficients (if `hessian = TRUE`).
`se`	Standard errors of the coefficients (if `hessian = TRUE`).
`convergence`	Convergence code from the optimizer (0 typically indicates success).
`message`	Convergence message from the optimizer.
`iterations`	Number of iterations used by the optimizer.
`rmse`	Root Mean Squared Error of response residuals.
`efron_r2`	Efron's pseudo R-squared.
`mean_absolute_error`	Mean Absolute Error of response residuals.
`x`	List of model matrices (if `x = TRUE`).
`y`	The response vector (if `y = TRUE`).
`model`	The model frame (if `model = TRUE`).
`tmb_object`	The raw object returned by `MakeADFun`.

Author(s)

Lopes, J. E.

References

Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.

Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-898.

Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21. (Underlying TMB package)

Zeileis, A., Kleiber, C., Jackman, S. (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8), 1-25.

Smithson, M., & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.

Examples


## Example 1: Simple Kumaraswamy regression model ----
set.seed(123)
n <- 1000
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)

# True regression coefficients
alpha_coef <- c(0.8, 0.3, -0.2) # Intercept, x1, x2
beta_coef <- c(1.2, -0.4, 0.1) # Intercept, x1, x2

# Generate linear predictors and transform to parameters using inverse link (exp)
eta_alpha <- alpha_coef[1] + alpha_coef[2] * x1 + alpha_coef[3] * x2
eta_beta <- beta_coef[1] + beta_coef[2] * x1 + beta_coef[3] * x2
alpha_true <- exp(eta_alpha)
beta_true <- exp(eta_beta)

# Generate responses from Kumaraswamy distribution (assuming rkw is available)
y <- rkw(n, alpha = alpha_true, beta = beta_true)
# Create data frame
df1 <- data.frame(y = y, x1 = x1, x2 = x2)

# Fit Kumaraswamy regression model using extended formula syntax
# Model alpha ~ x1 + x2 and beta ~ x1 + x2
kw_reg <- gkwreg(y ~ x1 + x2 | x1 + x2, data = df1, family = "kw", silent = TRUE)

# Display summary
summary(kw_reg)

## Example 2: Generalized Kumaraswamy regression ----
set.seed(456)
x1 <- runif(n, -1, 1)
x2 <- rnorm(n)
x3 <- factor(rbinom(n, 1, 0.5), labels = c("A", "B")) # Factor variable

# True regression coefficients
alpha_coef <- c(0.5, 0.2) # Intercept, x1
beta_coef <- c(0.8, -0.3, 0.1) # Intercept, x1, x2
gamma_coef <- c(0.6, 0.4) # Intercept, x3B
delta_coef <- c(0.0, 0.2) # Intercept, x3B (logit scale)
lambda_coef <- c(-0.2, 0.1) # Intercept, x2

# Design matrices
X_alpha <- model.matrix(~x1, data = data.frame(x1 = x1))
X_beta <- model.matrix(~ x1 + x2, data = data.frame(x1 = x1, x2 = x2))
X_gamma <- model.matrix(~x3, data = data.frame(x3 = x3))
X_delta <- model.matrix(~x3, data = data.frame(x3 = x3))
X_lambda <- model.matrix(~x2, data = data.frame(x2 = x2))

# Generate linear predictors and transform to parameters
alpha <- exp(X_alpha %*% alpha_coef)
beta <- exp(X_beta %*% beta_coef)
gamma <- exp(X_gamma %*% gamma_coef)
delta <- plogis(X_delta %*% delta_coef) # logit link for delta
lambda <- exp(X_lambda %*% lambda_coef)

# Generate response from GKw distribution (assuming rgkw is available)
y <- rgkw(n, alpha = alpha, beta = beta, gamma = gamma, delta = delta, lambda = lambda)

# Create data frame
df2 <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

# Fit GKw regression with parameter-specific formulas
# alpha ~ x1, beta ~ x1 + x2, gamma ~ x3, delta ~ x3, lambda ~ x2
gkw_reg <- gkwreg(y ~ x1 | x1 + x2 | x3 | x3 | x2, data = df2, family = "gkw")

# Compare true vs. estimated coefficients
print("Estimated Coefficients (GKw):")
print(coef(gkw_reg))
print("True Coefficients (approx):")
print(list(
  alpha = alpha_coef, beta = beta_coef, gamma = gamma_coef,
  delta = delta_coef, lambda = lambda_coef
))

## Example 3: Beta regression for comparison ----
set.seed(789)
x1 <- runif(n, -1, 1)

# True coefficients for Beta parameters (gamma = shape1, delta = shape2)
gamma_coef <- c(1.0, 0.5) # Intercept, x1 (log scale for shape1)
delta_coef <- c(1.5, -0.7) # Intercept, x1 (log scale for shape2)

# Generate linear predictors and transform (default link is log for Beta params here)
X_beta_eg <- model.matrix(~x1, data.frame(x1 = x1))
gamma_true <- exp(X_beta_eg %*% gamma_coef)
delta_true <- exp(X_beta_eg %*% delta_coef)

# Generate response from Beta distribution
y <- rbeta_(n, gamma_true, delta_true)

# Create data frame
df_beta <- data.frame(y = y, x1 = x1)

# Fit Beta regression model using gkwreg
# Formula maps to gamma and delta: y ~  x1 | x1
beta_reg <- gkwreg(y ~ x1 | x1,
  data = df_beta, family = "beta",
  link = list(gamma = "log", delta = "log")
) # Specify links if non-default

## Example 4: Model comparison using AIC/BIC ----
# Fit an alternative model, e.g., Kumaraswamy, to the same beta-generated data
kw_reg2 <- try(gkwreg(y ~ x1 | x1, data = df_beta, family = "kw"))

print("AIC Comparison (Beta vs Kw):")
c(AIC(beta_reg), AIC(kw_reg2))
print("BIC Comparison (Beta vs Kw):")
c(BIC(beta_reg), BIC(kw_reg2))

## Example 5: Predicting with a fitted model

# Use the Beta regression model from Example 3
newdata <- data.frame(x1 = seq(-1, 1, length.out = 20))

# Predict expected response (mean of the Beta distribution)
pred_response <- predict(beta_reg, newdata = newdata, type = "response")

# Predict parameters (gamma and delta) on the scale of the link function
pred_link <- predict(beta_reg, newdata = newdata, type = "link")

# Predict parameters on the original scale (shape1, shape2)
pred_params <- predict(beta_reg, newdata = newdata, type = "parameter")

# Plot original data and predicted mean response curve
plot(df_beta$x1, df_beta$y,
  pch = 20, col = "grey", xlab = "x1", ylab = "y",
  main = "Beta Regression Fit (using gkwreg)"
)
lines(newdata$x1, pred_response, col = "red", lwd = 2)
legend("topright", legend = "Predicted Mean", col = "red", lty = 1, lwd = 2)

gkwreg documentation built on April 16, 2025, 1:10 a.m.