gkwreg | R Documentation |
Fits regression models using the Generalized Kumaraswamy (GKw) family of
distributions for response variables strictly bounded in the interval (0, 1).
The function allows modeling parameters from all seven submodels of the GKw
family as functions of predictors using appropriate link functions. Estimation
is performed using Maximum Likelihood via the TMB (Template Model Builder) package.
Requires the Formula
and TMB
packages.
gkwreg(
formula,
data,
family = c("gkw", "bkw", "kkw", "ekw", "mc", "kw", "beta"),
link = NULL,
start = NULL,
fixed = NULL,
method = c("nlminb", "BFGS", "Nelder-Mead", "CG", "SANN", "L-BFGS-B"),
hessian = TRUE,
plot = TRUE,
conf.level = 0.95,
optimizer.control = list(),
subset = NULL,
weights = NULL,
offset = NULL,
na.action = getOption("na.action"),
contrasts = NULL,
x = FALSE,
y = TRUE,
model = TRUE,
silent = TRUE,
...
)
formula |
An object of class |
data |
A data frame containing the variables specified in the |
family |
A character string specifying the desired distribution family.
Defaults to
|
link |
Either a single character string specifying the same link function
for all relevant parameters, or a named list specifying the link function for each
modeled parameter (e.g.,
|
start |
An optional named list providing initial values for the regression
coefficients. Parameter names should match the distribution parameters (alpha,
beta, etc.), and values should be vectors corresponding to the coefficients
in the respective linear predictors (including intercept). If |
fixed |
An optional named list specifying parameters or coefficients to be held fixed at specific values during estimation. Currently not fully implemented. |
method |
Character string specifying the optimization algorithm to use.
Options are |
hessian |
Logical. If |
plot |
Logical. If |
conf.level |
Numeric. The confidence level (1 - alpha) for constructing
confidence intervals for the parameters. Default is 0.95. Used only if
|
optimizer.control |
A list of control parameters passed directly to the
chosen optimizer ( |
subset |
An optional vector specifying a subset of observations from |
weights |
An optional vector of prior weights (e.g., frequency weights)
to be used in the fitting process. Should be |
offset |
An optional numeric vector or matrix specifying an a priori known
component to be included on the scale of the linear predictor for each parameter.
If a vector, it's applied to the predictor of the first parameter in the standard
order ( |
na.action |
A function which indicates what should happen when the data
contain |
contrasts |
An optional list specifying the contrasts to be used for factor
variables in the model. See the |
x |
Logical. If |
y |
Logical. If |
model |
Logical. If |
silent |
Logical. If |
... |
Additional arguments, currently unused or passed down to internal methods (potentially). |
The gkwreg
function provides a regression framework for the Generalized
Kumaraswamy (GKw) family and its submodels, extending density estimation
to include covariates. The response variable must be strictly bounded in the
(0, 1) interval.
Model Specification:
The extended Formula
syntax is crucial for specifying
potentially different linear predictors for each relevant distribution parameter.
The parameters (\alpha, \beta, \gamma, \delta, \lambda
) correspond sequentially
to the parts of the formula separated by |
. For subfamilies where some
parameters are fixed by definition (see family
argument), the corresponding
parts of the formula are automatically ignored. For example, in a family = "kw"
model, only the first two parts (for \alpha
and \beta
) are relevant.
Parameter Constraints and Link Functions:
The parameters \alpha, \beta, \gamma, \lambda
are constrained to be positive,
while \delta
is constrained to the interval (0, 1). The default link functions
("log"
for positive parameters, "logit"
for \delta
) ensure these
constraints during estimation. Users can specify alternative link functions suitable
for the parameter's domain via the link
argument.
Families and Parameters:
The function automatically handles parameter fixing based on the chosen family
:
GKw: All 5 parameters (\alpha, \beta, \gamma, \delta, \lambda
) modeled.
BKw: Models \alpha, \beta, \gamma, \delta
; fixes \lambda = 1
.
KKw: Models \alpha, \beta, \delta, \lambda
; fixes \gamma = 1
.
EKw: Models \alpha, \beta, \lambda
; fixes \gamma = 1, \delta = 0
.
Mc (McDonald): Models \gamma, \delta, \lambda
; fixes \alpha = 1, \beta = 1
.
Kw (Kumaraswamy): Models \alpha, \beta
; fixes \gamma = 1, \delta = 0, \lambda = 1
.
Beta: Models \gamma, \delta
; fixes \alpha = 1, \beta = 1, \lambda = 1
. This parameterization corresponds to the standard Beta distribution with shape1 = \gamma
and shape2 = \delta
.
Estimation Engine:
Maximum Likelihood Estimation (MLE) is performed using C++ templates via the
TMB
package, which provides automatic differentiation and efficient
optimization capabilities. The specific TMB template used depends on the chosen family
.
Optimizer Method (method
argument):
"nlminb"
: Uses R's built-in stats::nlminb
optimizer. Good for problems with box constraints. Default option.
"Nelder-Mead"
: Uses R's stats::optim
with the Nelder-Mead simplex algorithm, which doesn't require derivatives.
"BFGS"
: Uses R's stats::optim
with the BFGS quasi-Newton method for unconstrained optimization.
"CG"
: Uses R's stats::optim
with conjugate gradients method for unconstrained optimization.
"SANN"
: Uses R's stats::optim
with simulated annealing, a global optimization method useful for problems with multiple local minima.
"L-BFGS-B"
: Uses R's stats::optim
with the limited-memory BFGS method with box constraints.
An object of class gkwreg
. This is a list containing the
following components:
call |
The matched function call. |
family |
The specified distribution family string. |
formula |
The |
coefficients |
A named vector of estimated regression coefficients. |
fitted.values |
Vector of estimated means (expected values) of the response. |
residuals |
Vector of response residuals (observed - fitted mean). |
fitted_parameters |
List containing the estimated mean value for each distribution parameter ( |
parameter_vectors |
List containing vectors of the estimated parameters ( |
link |
List of link functions used for each parameter. |
param_names |
Character vector of names of the parameters modeled by the family. |
fixed_params |
Named list indicating which parameters are fixed by the family definition. |
loglik |
The maximized log-likelihood value. |
aic |
Akaike Information Criterion. |
bic |
Bayesian Information Criterion. |
deviance |
The deviance ( -2 * loglik ). |
df.residual |
Residual degrees of freedom (nobs - npar). |
nobs |
Number of observations used in the fit. |
npar |
Total number of estimated parameters (coefficients). |
vcov |
The variance-covariance matrix of the coefficients (if |
se |
Standard errors of the coefficients (if |
convergence |
Convergence code from the optimizer (0 typically indicates success). |
message |
Convergence message from the optimizer. |
iterations |
Number of iterations used by the optimizer. |
rmse |
Root Mean Squared Error of response residuals. |
efron_r2 |
Efron's pseudo R-squared. |
mean_absolute_error |
Mean Absolute Error of response residuals. |
x |
List of model matrices (if |
y |
The response vector (if |
model |
The model frame (if |
tmb_object |
The raw object returned by |
Lopes, J. E.
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88.
Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-898.
Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.
Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21. (Underlying TMB package)
Zeileis, A., Kleiber, C., Jackman, S. (2008). Regression Models for Count Data in R. Journal of Statistical Software, 27(8), 1-25.
Smithson, M., & Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.
summary.gkwreg
, predict.gkwreg
,
plot.gkwreg
, coef.gkwreg
, vcov.gkwreg
,
logLik
, AIC
,
Formula
, MakeADFun
,
sdreport
## Example 1: Simple Kumaraswamy regression model ----
set.seed(123)
n <- 1000
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)
# True regression coefficients
alpha_coef <- c(0.8, 0.3, -0.2) # Intercept, x1, x2
beta_coef <- c(1.2, -0.4, 0.1) # Intercept, x1, x2
# Generate linear predictors and transform to parameters using inverse link (exp)
eta_alpha <- alpha_coef[1] + alpha_coef[2] * x1 + alpha_coef[3] * x2
eta_beta <- beta_coef[1] + beta_coef[2] * x1 + beta_coef[3] * x2
alpha_true <- exp(eta_alpha)
beta_true <- exp(eta_beta)
# Generate responses from Kumaraswamy distribution (assuming rkw is available)
y <- rkw(n, alpha = alpha_true, beta = beta_true)
# Create data frame
df1 <- data.frame(y = y, x1 = x1, x2 = x2)
# Fit Kumaraswamy regression model using extended formula syntax
# Model alpha ~ x1 + x2 and beta ~ x1 + x2
kw_reg <- gkwreg(y ~ x1 + x2 | x1 + x2, data = df1, family = "kw", silent = TRUE)
# Display summary
summary(kw_reg)
## Example 2: Generalized Kumaraswamy regression ----
set.seed(456)
x1 <- runif(n, -1, 1)
x2 <- rnorm(n)
x3 <- factor(rbinom(n, 1, 0.5), labels = c("A", "B")) # Factor variable
# True regression coefficients
alpha_coef <- c(0.5, 0.2) # Intercept, x1
beta_coef <- c(0.8, -0.3, 0.1) # Intercept, x1, x2
gamma_coef <- c(0.6, 0.4) # Intercept, x3B
delta_coef <- c(0.0, 0.2) # Intercept, x3B (logit scale)
lambda_coef <- c(-0.2, 0.1) # Intercept, x2
# Design matrices
X_alpha <- model.matrix(~x1, data = data.frame(x1 = x1))
X_beta <- model.matrix(~ x1 + x2, data = data.frame(x1 = x1, x2 = x2))
X_gamma <- model.matrix(~x3, data = data.frame(x3 = x3))
X_delta <- model.matrix(~x3, data = data.frame(x3 = x3))
X_lambda <- model.matrix(~x2, data = data.frame(x2 = x2))
# Generate linear predictors and transform to parameters
alpha <- exp(X_alpha %*% alpha_coef)
beta <- exp(X_beta %*% beta_coef)
gamma <- exp(X_gamma %*% gamma_coef)
delta <- plogis(X_delta %*% delta_coef) # logit link for delta
lambda <- exp(X_lambda %*% lambda_coef)
# Generate response from GKw distribution (assuming rgkw is available)
y <- rgkw(n, alpha = alpha, beta = beta, gamma = gamma, delta = delta, lambda = lambda)
# Create data frame
df2 <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
# Fit GKw regression with parameter-specific formulas
# alpha ~ x1, beta ~ x1 + x2, gamma ~ x3, delta ~ x3, lambda ~ x2
gkw_reg <- gkwreg(y ~ x1 | x1 + x2 | x3 | x3 | x2, data = df2, family = "gkw")
# Compare true vs. estimated coefficients
print("Estimated Coefficients (GKw):")
print(coef(gkw_reg))
print("True Coefficients (approx):")
print(list(
alpha = alpha_coef, beta = beta_coef, gamma = gamma_coef,
delta = delta_coef, lambda = lambda_coef
))
## Example 3: Beta regression for comparison ----
set.seed(789)
x1 <- runif(n, -1, 1)
# True coefficients for Beta parameters (gamma = shape1, delta = shape2)
gamma_coef <- c(1.0, 0.5) # Intercept, x1 (log scale for shape1)
delta_coef <- c(1.5, -0.7) # Intercept, x1 (log scale for shape2)
# Generate linear predictors and transform (default link is log for Beta params here)
X_beta_eg <- model.matrix(~x1, data.frame(x1 = x1))
gamma_true <- exp(X_beta_eg %*% gamma_coef)
delta_true <- exp(X_beta_eg %*% delta_coef)
# Generate response from Beta distribution
y <- rbeta_(n, gamma_true, delta_true)
# Create data frame
df_beta <- data.frame(y = y, x1 = x1)
# Fit Beta regression model using gkwreg
# Formula maps to gamma and delta: y ~ x1 | x1
beta_reg <- gkwreg(y ~ x1 | x1,
data = df_beta, family = "beta",
link = list(gamma = "log", delta = "log")
) # Specify links if non-default
## Example 4: Model comparison using AIC/BIC ----
# Fit an alternative model, e.g., Kumaraswamy, to the same beta-generated data
kw_reg2 <- try(gkwreg(y ~ x1 | x1, data = df_beta, family = "kw"))
print("AIC Comparison (Beta vs Kw):")
c(AIC(beta_reg), AIC(kw_reg2))
print("BIC Comparison (Beta vs Kw):")
c(BIC(beta_reg), BIC(kw_reg2))
## Example 5: Predicting with a fitted model
# Use the Beta regression model from Example 3
newdata <- data.frame(x1 = seq(-1, 1, length.out = 20))
# Predict expected response (mean of the Beta distribution)
pred_response <- predict(beta_reg, newdata = newdata, type = "response")
# Predict parameters (gamma and delta) on the scale of the link function
pred_link <- predict(beta_reg, newdata = newdata, type = "link")
# Predict parameters on the original scale (shape1, shape2)
pred_params <- predict(beta_reg, newdata = newdata, type = "parameter")
# Plot original data and predicted mean response curve
plot(df_beta$x1, df_beta$y,
pch = 20, col = "grey", xlab = "x1", ylab = "y",
main = "Beta Regression Fit (using gkwreg)"
)
lines(newdata$x1, pred_response, col = "red", lwd = 2)
legend("topright", legend = "Predicted Mean", col = "red", lty = 1, lwd = 2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.