gkwreg: Fit Generalized Kumaraswamy Regression Models

View source: R/gkwreg.R

gkwregR Documentation

Fit Generalized Kumaraswamy Regression Models

Description

Fits regression models using the Generalized Kumaraswamy (GKw) family of distributions for response variables strictly bounded in the interval (0, 1). The function allows modeling parameters from all seven submodels of the GKw family as functions of predictors using appropriate link functions. Estimation is performed using Maximum Likelihood via the TMB (Template Model Builder) package. Requires the Formula and TMB packages.

Usage

gkwreg(
  formula,
  data,
  family = c("gkw", "bkw", "kkw", "ekw", "mc", "kw", "beta"),
  link = NULL,
  start = NULL,
  fixed = NULL,
  method = c("nlminb", "BFGS", "Nelder-Mead", "CG", "SANN", "L-BFGS-B"),
  hessian = TRUE,
  plot = TRUE,
  conf.level = 0.95,
  optimizer.control = list(),
  subset = NULL,
  weights = NULL,
  offset = NULL,
  na.action = getOption("na.action"),
  contrasts = NULL,
  x = FALSE,
  y = TRUE,
  model = TRUE,
  silent = TRUE,
  ...
)

Arguments

formula

An object of class Formula (or one that can be coerced to that class). It should be structured as y ~ model_alpha | model_beta | model_gamma | model_delta | model_lambda, where y is the response variable and each model_* part specifies the linear predictor for the corresponding parameter (\alpha, \beta, \gamma, \delta, \lambda). If a part is omitted or specified as ~ 1 or ., an intercept-only model is used for that parameter. See Details for parameter correspondence in subfamilies.

data

A data frame containing the variables specified in the formula.

family

A character string specifying the desired distribution family. Defaults to "gkw". Supported families are:

  • "gkw": Generalized Kumaraswamy (5 parameters: \alpha, \beta, \gamma, \delta, \lambda)

  • "bkw": Beta-Kumaraswamy (4 parameters: \alpha, \beta, \gamma, \delta; \lambda = 1 fixed)

  • "kkw": Kumaraswamy-Kumaraswamy (4 parameters: \alpha, \beta, \delta, \lambda; \gamma = 1 fixed)

  • "ekw": Exponentiated Kumaraswamy (3 parameters: \alpha, \beta, \lambda; \gamma = 1, \delta = 0 fixed)

  • "mc": McDonald / Beta Power (3 parameters: \gamma, \delta, \lambda; \alpha = 1, \beta = 1 fixed)

  • "kw": Kumaraswamy (2 parameters: \alpha, \beta; \gamma = 1, \delta = 0, \lambda = 1 fixed)

  • "beta": Beta distribution (2 parameters: \gamma, \delta; \alpha = 1, \beta = 1, \lambda = 1 fixed)

link

Either a single character string specifying the same link function for all relevant parameters, or a named list specifying the link function for each modeled parameter (e.g., list(alpha = "log", beta = "log", delta = "logit")). Defaults are "log" for \alpha, \beta, \gamma, \lambda (parameters > 0) and "logit" for \delta (parameter in (0, 1)). Supported link functions are:

  • "log"

  • "logit"

  • "identity"

  • "inverse" (i.e., 1/mu)

  • "sqrt"

  • "probit"

  • "cloglog" (complementary log-log)

start

An optional named list providing initial values for the regression coefficients. Parameter names should match the distribution parameters (alpha, beta, etc.), and values should be vectors corresponding to the coefficients in the respective linear predictors (including intercept). If NULL (default), suitable starting values are automatically determined based on global parameter estimates.

fixed

An optional named list specifying parameters or coefficients to be held fixed at specific values during estimation. Currently not fully implemented.

method

Character string specifying the optimization algorithm to use. Options are "nlminb" (default, using nlminb), "BFGS", "Nelder-Mead", "CG", "SANN", or "L-BFGS-B". If "nlminb" is selected, R's nlminb function is used; otherwise, R's optim function is used with the specified method.

hessian

Logical. If TRUE (default), the Hessian matrix is computed via sdreport to obtain standard errors and the covariance matrix of the estimated coefficients. Setting to FALSE speeds up fitting but prevents calculation of standard errors and confidence intervals.

plot

Logical. If TRUE (default), enables the generation of diagnostic plots when calling the generic plot() function on the fitted object. Actual plotting is handled by the plot.gkwreg method.

conf.level

Numeric. The confidence level (1 - alpha) for constructing confidence intervals for the parameters. Default is 0.95. Used only if hessian = TRUE.

optimizer.control

A list of control parameters passed directly to the chosen optimizer (nlminb or optim). See their respective documentation for details.

subset

An optional vector specifying a subset of observations from data to be used in the fitting process.

weights

An optional vector of prior weights (e.g., frequency weights) to be used in the fitting process. Should be NULL or a numeric vector of non-negative values.

offset

An optional numeric vector or matrix specifying an a priori known component to be included on the scale of the linear predictor for each parameter. If a vector, it's applied to the predictor of the first parameter in the standard order (\alpha). If a matrix, columns must correspond to parameters in the order \alpha, \beta, \gamma, \delta, \lambda.

na.action

A function which indicates what should happen when the data contain NAs. The default (na.fail) stops if NAs are present. Other options include na.omit or na.exclude.

contrasts

An optional list specifying the contrasts to be used for factor variables in the model. See the contrasts.arg of model.matrix.default.

x

Logical. If TRUE, the list of model matrices (one for each modeled parameter) is returned as component x of the fitted object. Default FALSE.

y

Logical. If TRUE (default), the response variable (after processing by na.action, subset) is returned as component y.

model

Logical. If TRUE (default), the model frame (containing all variables used from data) is returned as component model.

silent

Logical. If TRUE (default), suppresses progress messages from TMB compilation and optimization. Set to FALSE for verbose output.

...

Additional arguments, currently unused or passed down to internal methods (potentially).

Details

The gkwreg function provides a regression framework for the Generalized Kumaraswamy (GKw) family and its submodels, extending density estimation to include covariates. The response variable must be strictly bounded in the (0, 1) interval.

Model Specification: The extended Formula syntax is crucial for specifying potentially different linear predictors for each relevant distribution parameter. The parameters (\alpha, \beta, \gamma, \delta, \lambda) correspond sequentially to the parts of the formula separated by |. For subfamilies where some parameters are fixed by definition (see family argument), the corresponding parts of the formula are automatically ignored. For example, in a family = "kw" model, only the first two parts (for \alpha and \beta) are relevant.

Parameter Constraints and Link Functions: The parameters \alpha, \beta, \gamma, \lambda are constrained to be positive, while \delta is constrained to the interval (0, 1). The default link functions ("log" for positive parameters, "logit" for \delta) ensure these constraints during estimation. Users can specify alternative link functions suitable for the parameter's domain via the link argument.

Families and Parameters: The function automatically handles parameter fixing based on the chosen family:

  • GKw: All 5 parameters (\alpha, \beta, \gamma, \delta, \lambda) modeled.

  • BKw: Models \alpha, \beta, \gamma, \delta; fixes \lambda = 1.

  • KKw: Models \alpha, \beta, \delta, \lambda; fixes \gamma = 1.

  • EKw: Models \alpha, \beta, \lambda; fixes \gamma = 1, \delta = 0.

  • Mc (McDonald): Models \gamma, \delta, \lambda; fixes \alpha = 1, \beta = 1.

  • Kw (Kumaraswamy): Models \alpha, \beta; fixes \gamma = 1, \delta = 0, \lambda = 1.

  • Beta: Models \gamma, \delta; fixes \alpha = 1, \beta = 1, \lambda = 1. This parameterization corresponds to the standard Beta distribution with shape1 = \gamma and shape2 = \delta.

Estimation Engine: Maximum Likelihood Estimation (MLE) is performed using C++ templates via the TMB package, which provides automatic differentiation and efficient optimization capabilities. The specific TMB template used depends on the chosen family.

Optimizer Method (method argument):

  • "nlminb": Uses R's built-in stats::nlminb optimizer. Good for problems with box constraints. Default option.

  • "Nelder-Mead": Uses R's stats::optim with the Nelder-Mead simplex algorithm, which doesn't require derivatives.

  • "BFGS": Uses R's stats::optim with the BFGS quasi-Newton method for unconstrained optimization.

  • "CG": Uses R's stats::optim with conjugate gradients method for unconstrained optimization.

  • "SANN": Uses R's stats::optim with simulated annealing, a global optimization method useful for problems with multiple local minima.

  • "L-BFGS-B": Uses R's stats::optim with the limited-memory BFGS method with box constraints.

Value

An object of class gkwreg. This is a list containing the following components:

call

The matched function call.

family

The specified distribution family string.

formula

The Formula object used.

coefficients

A named vector of estimated regression coefficients.

fitted.values

Vector of estimated means (expected values) of the response.

residuals

Vector of response residuals (observed - fitted mean).

fitted_parameters

List containing the estimated mean value for each distribution parameter (\alpha, \beta, \gamma, \delta, \lambda).

parameter_vectors

List containing vectors of the estimated parameters (\alpha, \beta, \gamma, \delta, \lambda) for each observation, evaluated on the link scale.

link

List of link functions used for each parameter.

param_names

Character vector of names of the parameters modeled by the family.

fixed_params

Named list indicating which parameters are fixed by the family definition.

loglik

The maximized log-likelihood value.

aic

Akaike Information Criterion.

bic

Bayesian Information Criterion.

deviance

The deviance ( -2 * loglik ).

df.residual

Residual degrees of freedom (nobs - npar).

nobs

Number of observations used in the fit.

npar

Total number of estimated parameters (coefficients).

vcov

The variance-covariance matrix of the coefficients (if hessian = TRUE).

se

Standard errors of the coefficients (if hessian = TRUE).

convergence

Convergence code from the optimizer (0 typically indicates success).

message

Convergence message from the optimizer.

iterations

Number of iterations used by the optimizer.

rmse

Root Mean Squared Error of response residuals.

efron_r2

Efron's pseudo R-squared.

mean_absolute_error

Mean Absolute Error of response residuals.

x

List of model matrices (if x = TRUE).

y

The response vector (if y = TRUE).

model

The model frame (if model = TRUE).

tmb_object

The raw object returned by MakeADFun.

Author(s)

Lopes, J. E.

References

Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.

Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-898.

Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21. (Underlying TMB package)

Zeileis, A., Kleiber, C., Jackman, S. (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8), 1-25.

Smithson, M., & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.

See Also

summary.gkwreg, predict.gkwreg, plot.gkwreg, coef.gkwreg, vcov.gkwreg, logLik, AIC, Formula, MakeADFun, sdreport

Examples


## Example 1: Simple Kumaraswamy regression model ----
set.seed(123)
n <- 1000
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)

# True regression coefficients
alpha_coef <- c(0.8, 0.3, -0.2) # Intercept, x1, x2
beta_coef <- c(1.2, -0.4, 0.1) # Intercept, x1, x2

# Generate linear predictors and transform to parameters using inverse link (exp)
eta_alpha <- alpha_coef[1] + alpha_coef[2] * x1 + alpha_coef[3] * x2
eta_beta <- beta_coef[1] + beta_coef[2] * x1 + beta_coef[3] * x2
alpha_true <- exp(eta_alpha)
beta_true <- exp(eta_beta)

# Generate responses from Kumaraswamy distribution (assuming rkw is available)
y <- rkw(n, alpha = alpha_true, beta = beta_true)
# Create data frame
df1 <- data.frame(y = y, x1 = x1, x2 = x2)

# Fit Kumaraswamy regression model using extended formula syntax
# Model alpha ~ x1 + x2 and beta ~ x1 + x2
kw_reg <- gkwreg(y ~ x1 + x2 | x1 + x2, data = df1, family = "kw", silent = TRUE)

# Display summary
summary(kw_reg)

## Example 2: Generalized Kumaraswamy regression ----
set.seed(456)
x1 <- runif(n, -1, 1)
x2 <- rnorm(n)
x3 <- factor(rbinom(n, 1, 0.5), labels = c("A", "B")) # Factor variable

# True regression coefficients
alpha_coef <- c(0.5, 0.2) # Intercept, x1
beta_coef <- c(0.8, -0.3, 0.1) # Intercept, x1, x2
gamma_coef <- c(0.6, 0.4) # Intercept, x3B
delta_coef <- c(0.0, 0.2) # Intercept, x3B (logit scale)
lambda_coef <- c(-0.2, 0.1) # Intercept, x2

# Design matrices
X_alpha <- model.matrix(~x1, data = data.frame(x1 = x1))
X_beta <- model.matrix(~ x1 + x2, data = data.frame(x1 = x1, x2 = x2))
X_gamma <- model.matrix(~x3, data = data.frame(x3 = x3))
X_delta <- model.matrix(~x3, data = data.frame(x3 = x3))
X_lambda <- model.matrix(~x2, data = data.frame(x2 = x2))

# Generate linear predictors and transform to parameters
alpha <- exp(X_alpha %*% alpha_coef)
beta <- exp(X_beta %*% beta_coef)
gamma <- exp(X_gamma %*% gamma_coef)
delta <- plogis(X_delta %*% delta_coef) # logit link for delta
lambda <- exp(X_lambda %*% lambda_coef)

# Generate response from GKw distribution (assuming rgkw is available)
y <- rgkw(n, alpha = alpha, beta = beta, gamma = gamma, delta = delta, lambda = lambda)

# Create data frame
df2 <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

# Fit GKw regression with parameter-specific formulas
# alpha ~ x1, beta ~ x1 + x2, gamma ~ x3, delta ~ x3, lambda ~ x2
gkw_reg <- gkwreg(y ~ x1 | x1 + x2 | x3 | x3 | x2, data = df2, family = "gkw")

# Compare true vs. estimated coefficients
print("Estimated Coefficients (GKw):")
print(coef(gkw_reg))
print("True Coefficients (approx):")
print(list(
  alpha = alpha_coef, beta = beta_coef, gamma = gamma_coef,
  delta = delta_coef, lambda = lambda_coef
))

## Example 3: Beta regression for comparison ----
set.seed(789)
x1 <- runif(n, -1, 1)

# True coefficients for Beta parameters (gamma = shape1, delta = shape2)
gamma_coef <- c(1.0, 0.5) # Intercept, x1 (log scale for shape1)
delta_coef <- c(1.5, -0.7) # Intercept, x1 (log scale for shape2)

# Generate linear predictors and transform (default link is log for Beta params here)
X_beta_eg <- model.matrix(~x1, data.frame(x1 = x1))
gamma_true <- exp(X_beta_eg %*% gamma_coef)
delta_true <- exp(X_beta_eg %*% delta_coef)

# Generate response from Beta distribution
y <- rbeta_(n, gamma_true, delta_true)

# Create data frame
df_beta <- data.frame(y = y, x1 = x1)

# Fit Beta regression model using gkwreg
# Formula maps to gamma and delta: y ~  x1 | x1
beta_reg <- gkwreg(y ~ x1 | x1,
  data = df_beta, family = "beta",
  link = list(gamma = "log", delta = "log")
) # Specify links if non-default

## Example 4: Model comparison using AIC/BIC ----
# Fit an alternative model, e.g., Kumaraswamy, to the same beta-generated data
kw_reg2 <- try(gkwreg(y ~ x1 | x1, data = df_beta, family = "kw"))

print("AIC Comparison (Beta vs Kw):")
c(AIC(beta_reg), AIC(kw_reg2))
print("BIC Comparison (Beta vs Kw):")
c(BIC(beta_reg), BIC(kw_reg2))

## Example 5: Predicting with a fitted model

# Use the Beta regression model from Example 3
newdata <- data.frame(x1 = seq(-1, 1, length.out = 20))

# Predict expected response (mean of the Beta distribution)
pred_response <- predict(beta_reg, newdata = newdata, type = "response")

# Predict parameters (gamma and delta) on the scale of the link function
pred_link <- predict(beta_reg, newdata = newdata, type = "link")

# Predict parameters on the original scale (shape1, shape2)
pred_params <- predict(beta_reg, newdata = newdata, type = "parameter")

# Plot original data and predicted mean response curve
plot(df_beta$x1, df_beta$y,
  pch = 20, col = "grey", xlab = "x1", ylab = "y",
  main = "Beta Regression Fit (using gkwreg)"
)
lines(newdata$x1, pred_response, col = "red", lwd = 2)
legend("topright", legend = "Predicted Mean", col = "red", lty = 1, lwd = 2)



gkwreg documentation built on April 16, 2025, 1:10 a.m.