| gkwreg | R Documentation |
Fits regression models using the Generalized Kumaraswamy (GKw) family of distributions for modeling response variables strictly bounded in the interval (0, 1). The function provides a unified interface for fitting seven nested submodels of the GKw family, allowing flexible modeling of proportions, rates, and other bounded continuous outcomes through regression on distributional parameters.
Maximum Likelihood Estimation is performed via automatic differentiation using
the TMB (Template Model Builder) package, ensuring computational efficiency and
numerical accuracy. The interface follows standard R regression modeling
conventions similar to lm, glm, and
betareg, making it immediately familiar to R users.
gkwreg(
formula,
data,
family = c("gkw", "bkw", "kkw", "ekw", "mc", "kw", "beta"),
link = NULL,
link_scale = NULL,
subset = NULL,
weights = NULL,
offset = NULL,
na.action = getOption("na.action"),
contrasts = NULL,
control = gkw_control(),
model = TRUE,
x = FALSE,
y = TRUE,
...
)
formula |
An object of class
where:
If a part is omitted or specified as |
data |
A data frame containing the variables specified in |
family |
A character string specifying the distribution family from the Generalized Kumaraswamy hierarchy. Must be one of:
See Details for guidance on family selection. |
link |
Link function(s) for the distributional parameters. Can be specified as:
Default links (used if
Available link functions:
|
link_scale |
Numeric scale factor(s) controlling the transformation intensity of link functions. Can be:
Default scales (used if
Larger values produce more gradual transformations; smaller values produce more extreme transformations. For probability-type links (logit, probit), smaller scales (e.g., 0.5-2) create steeper response curves, while larger scales (e.g., 5-20) create gentler curves. Adjust if convergence issues arise or if you need different response sensitivities. |
subset |
Optional vector specifying a subset of observations to be used in fitting. Can be a logical vector, integer indices, or expression evaluating to one of these. Standard R subsetting rules apply. |
weights |
Optional numeric vector of prior weights (e.g., frequency weights) for observations. Should be non-negative. Currently experimental; use with caution and validate results. |
offset |
Optional numeric vector or matrix specifying an a priori known
component to be included in the linear predictor(s). If a vector, it is applied
to the first parameter's predictor. If a matrix, columns correspond to parameters
in order ( |
na.action |
A function specifying how to handle missing values (
See |
contrasts |
Optional list specifying contrasts for factor variables in the
model. Format: named list where names are factor variable names and values are
contrast specifications. See |
control |
A list of control parameters from
Default is |
model |
Logical. If |
x |
Logical. If |
y |
Logical. If |
... |
Additional arguments. Currently used only for backward compatibility
with deprecated arguments from earlier versions. Using deprecated arguments
triggers informative warnings with migration guidance. Examples of deprecated
arguments: |
The Generalized Kumaraswamy family provides a flexible hierarchy for modeling bounded responses. Selection should be guided by:
1. Start Simple: Begin with two-parameter families ("kw" or
"beta") unless you have strong reasons to use more complex models.
2. Model Comparison: Use information criteria (AIC, BIC) and likelihood ratio tests to compare nested models:
# Fit sequence of nested models fit_kw <- gkwreg(y ~ x, data, family = "kw") fit_ekw <- gkwreg(y ~ x, data, family = "ekw") fit_gkw <- gkwreg(y ~ x, data, family = "gkw") # Compare via AIC AIC(fit_kw, fit_ekw, fit_gkw) # Formal test (nested models only) anova(fit_kw, fit_ekw, fit_gkw)
3. Family Characteristics:
Beta: Traditional choice, well-understood, good for symmetric or moderately skewed data
Kumaraswamy (kw): Computationally efficient alternative to Beta, closed-form CDF, similar flexibility
Exponentiated Kumaraswamy (ekw): Adds flexibility for extreme values and heavy tails
Beta-Kumaraswamy (bkw), Kumaraswamy-Kumaraswamy (kkw): Four-parameter alternatives when three parameters insufficient
McDonald (mc): Beta generalization via power parameter, useful for J-shaped distributions
Kumaraswamy-Kumaraswamy (kkw): Most flexible, use only when simpler families inadequate. It extends kw
Generalized Kumaraswamy (gkw): Most flexible, use only when simpler families inadequate
4. Avoid Overfitting: More different parameters better model. Use cross-validation or hold-out validation to assess predictive performance.
The extended formula syntax allows different predictors for each parameter:
Basic Examples:
# Same predictors for both parameters (two-parameter family) y ~ x1 + x2 # Equivalent to: y ~ x1 + x2 | x1 + x2 # Different predictors per parameter y ~ x1 + x2 | x3 + x4 # alpha depends on x1, x2 # beta depends on x3, x4 # Intercept-only for some parameters y ~ x1 | 1 # alpha depends on x1 # beta has only intercept # Complex specification (five-parameter family) y ~ x1 | x2 | x3 | x4 | x5 # alpha ~ x1, beta ~ x2, gamma ~ x3, delta ~ x4, lambda ~ x5
Important Notes:
Formula parts correspond to parameters in order:
\alpha, \beta, \gamma, \delta, \lambda
Unused parts (due to family constraints) are automatically ignored
Use . to include all predictors: y ~ . | .
Standard R formula features work: interactions (x1:x2),
polynomials (poly(x, 2)), transformations (log(x)), etc.
Link functions map the range of distributional parameters to the real line, ensuring parameter constraints are satisfied during optimization.
Choosing Links:
Defaults are usually best: The automatic choices (log for shape parameters, logit for delta) work well in most cases
Alternative links: Consider if you have theoretical reasons (e.g., probit for latent variable interpretation) or convergence issues
Identity link: Avoid unless you have constraints elsewhere; can lead to invalid parameter values during optimization
Link Scales:
The link_scale parameter controls transformation intensity. Think of it
as a "sensitivity" parameter:
Larger values (e.g., 20): Gentler response to predictor changes
Smaller values (e.g., 2): Steeper response to predictor changes
Default (10): Balanced, works well for most cases
Adjust only if:
Convergence difficulties arise
You need very steep or very gentle response curves
Predictors have unusual scales (very large or very small)
The default optimizer (method = "nlminb") works well for most problems.
If convergence issues occur:
1. Check Data:
Ensure response is strictly in (0, 1)
Check for extreme outliers or influential points
Verify predictors aren't perfectly collinear
Consider rescaling predictors to similar ranges
2. Try Alternative Optimizers:
# BFGS often more robust for difficult problems
fit <- gkwreg(y ~ x, data,
control = gkw_control(method = "BFGS"))
# Nelder-Mead for non-smooth objectives
fit <- gkwreg(y ~ x, data,
control = gkw_control(method = "Nelder-Mead"))
3. Adjust Tolerances:
# Increase iterations and loosen tolerance
fit <- gkwreg(y ~ x, data,
control = gkw_control(maxit = 1000, reltol = 1e-6))
4. Provide Starting Values:
# Fit simpler model first, use as starting values
fit_simple <- gkwreg(y ~ 1, data, family = "kw")
start_vals <- list(
alpha = c(coef(fit_simple)[1], rep(0, ncol(X_alpha) - 1)),
beta = c(coef(fit_simple)[2], rep(0, ncol(X_beta) - 1))
)
fit_complex <- gkwreg(y ~ x1 + x2 | x3 + x4, data, family = "kw",
control = gkw_control(start = start_vals))
5. Simplify Model:
Use simpler family (e.g., "kw" instead of "gkw")
Reduce number of predictors
Use intercept-only for some parameters
By default, standard errors are computed via the Hessian matrix at the MLE. This provides valid asymptotic standard errors under standard regularity conditions.
When Standard Errors May Be Unreliable:
Small sample sizes (n < 30-50 per parameter)
Parameters near boundaries
Highly collinear predictors
Mis-specified models
Alternatives:
Bootstrap confidence intervals (more robust, computationally expensive)
Profile likelihood intervals via confint(..., type = "profile")
(not yet implemented)
Cross-validation for predictive performance assessment
To skip Hessian computation (faster, no SEs):
fit <- gkwreg(y ~ x, data,
control = gkw_control(hessian = FALSE))
Always check model adequacy using diagnostic plots:
fit <- gkwreg(y ~ x, data, family = "kw") plot(fit) # Six diagnostic plots
Key diagnostics:
Residual plots: Check for patterns, heteroscedasticity
Half-normal plot: Assess distributional adequacy
Cook's distance: Identify influential observations
Predicted vs observed: Overall fit quality
See plot.gkwreg for detailed interpretation guidance.
Performance Tips:
GKw family: Most computationally expensive (~2-5x slower than kw/beta)
Beta/Kw families: Fastest, use when adequate
Large datasets (n > 10,000): Consider sampling for exploratory analysis
TMB uses automatic differentiation: Fast gradient/Hessian computation
Disable Hessian (hessian = FALSE) for faster fitting without SEs
Memory Usage:
Set model = FALSE, x = FALSE, y = FALSE to reduce
object size (but limits some post-fitting capabilities)
Hessian matrix scales as O(p²) where p = number of parameters
An object of class "gkwreg", which is a list containing the following
components. Standard S3 methods are available for this class (see Methods section).
Model Specification:
callThe matched function call
formulaThe Formula object used
familyCharacter string: distribution family used
linkNamed list: link functions for each parameter
link_scaleNamed list: link scale values for each parameter
param_namesCharacter vector: names of parameters for this family
fixed_paramsNamed list: parameters fixed by family definition
controlThe gkw_control object used for fitting
Parameter Estimates:
coefficientsNamed numeric vector: estimated regression coefficients (on link scale). Names follow the pattern "parameter:predictor", e.g., "alpha:(Intercept)", "alpha:x1", "beta:(Intercept)", "beta:x2".
fitted_parametersNamed list: mean values for each distribution
parameter (\alpha, \beta, \gamma, \delta, \lambda) averaged across
all observations
parameter_vectorsNamed list: observation-specific parameter
values. Contains vectors alphaVec, betaVec, gammaVec,
deltaVec, lambdaVec, each of length nobs
Fitted Values and Residuals:
fitted.valuesNumeric vector: fitted mean values
E[Y|X] for each observation
residualsNumeric vector: response residuals (observed - fitted) for each observation
Inference:
vcovVariance-covariance matrix of coefficient estimates.
Only present if control$hessian = TRUE. NULL otherwise.
seNumeric vector: standard errors of coefficients.
Only present if control$hessian = TRUE. NULL otherwise.
Model Fit Statistics:
loglikNumeric: maximized log-likelihood value
aicNumeric: Akaike Information Criterion (AIC = -2loglik + 2npar)
bicNumeric: Bayesian Information Criterion (BIC = -2*loglik + log(nobs)*npar)
devianceNumeric: deviance (-2 * loglik)
df.residualInteger: residual degrees of freedom (nobs - npar)
nobsInteger: number of observations used in fit
nparInteger: total number of estimated parameters
Diagnostic Statistics:
rmseNumeric: Root Mean Squared Error of response residuals
efron_r2Numeric: Efron's pseudo R-squared (1 - SSE/SST, where SSE = sum of squared errors, SST = total sum of squares)
mean_absolute_errorNumeric: Mean Absolute Error of response residuals
Optimization Details:
convergenceLogical: TRUE if optimizer converged
successfully, FALSE otherwise
messageCharacter: convergence message from optimizer
iterationsInteger: number of iterations used by optimizer
methodCharacter: optimization method used (e.g., "nlminb", "BFGS")
Optional Components (returned if requested via model, x, y):
modelData frame: the model frame (if model = TRUE)
xNamed list: model matrices for each parameter
(if x = TRUE)
yNumeric vector: the response variable (if y = TRUE)
Internal:
tmb_objectThe raw object returned by MakeADFun.
Contains the TMB automatic differentiation function and environment.
Primarily for internal use and advanced debugging.
The following S3 methods are available for objects of class "gkwreg":
Basic Methods:
print.gkwreg: Print basic model information
summary.gkwreg: Detailed model summary with coefficient
tables, tests, and fit statistics
coef.gkwreg: Extract coefficients
vcov.gkwreg: Extract variance-covariance matrix
logLik: Extract log-likelihood
AIC, BIC: Information criteria
Prediction and Fitted Values:
fitted.gkwreg: Extract fitted values
residuals.gkwreg: Extract residuals (multiple types available)
predict.gkwreg: Predict on new data
Inference:
confint.gkwreg: Confidence intervals for parameters
anova.gkwreg: Compare nested models via likelihood ratio tests
Diagnostics:
plot.gkwreg: Comprehensive diagnostic plots (6 types)
Lopes, J. E.
Maintainer: Lopes, J. E.
Generalized Kumaraswamy Distribution:
Cordeiro, G. M., & de Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-898. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/00949650903530745")}
Kumaraswamy Distribution:
Kumaraswamy, P. (1980). A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1-2), 79-88. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/0022-1694(80)90036-0")}
Jones, M. C. (2009). Kumaraswamy's distribution: A beta-type distribution with some tractability advantages. Statistical Methodology, 6(1), 70-81. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.stamet.2008.04.001")}
Beta Regression:
Ferrari, S. L. P., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/0266476042000214501")}
Smithson, M., & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1), 54-71. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1037/1082-989X.11.1.54")}
Template Model Builder (TMB):
Kristensen, K., Nielsen, A., Berg, C. W., Skaug, H., & Bell, B. M. (2016). TMB: Automatic Differentiation and Laplace Approximation. Journal of Statistical Software, 70(5), 1-21. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v070.i05")}
Related Software:
Zeileis, A., & Croissant, Y. (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1-13. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v034.i01")}
Control and Inference:
gkw_control for fitting control parameters,
confint.gkwreg for confidence intervals,
anova.gkwreg for model comparison
Methods:
summary.gkwreg, plot.gkwreg,
coef.gkwreg, vcov.gkwreg,
fitted.gkwreg, residuals.gkwreg,
predict.gkwreg
Distributions:
dgkw, pgkw,
qgkw, rgkw
for the GKw distribution family functions
Related Packages:
betareg for traditional beta regression,
Formula for extended formula interface,
MakeADFun for TMB functionality
# SECTION 1: Basic Usage - Getting Started
# Load packages and data
library(gkwreg)
library(gkwdist)
data(GasolineYield)
# Example 1.1: Simplest possible model (intercept-only, all defaults)
fit_basic <- gkwreg(yield ~ 1, data = GasolineYield, family = "kw")
summary(fit_basic)
# Example 1.2: Model with predictors (uses all defaults)
# Default: family = "gkw", method = "nlminb", hessian = TRUE
fit_default <- gkwreg(yield ~ batch + temp, data = GasolineYield)
summary(fit_default)
# Example 1.3: Kumaraswamy model (two-parameter family)
# Default link functions: log for both alpha and beta
fit_kw <- gkwreg(yield ~ batch + temp, data = GasolineYield, family = "kw")
summary(fit_kw)
par(mfrow = c(3, 2))
plot(fit_kw, ask = FALSE)
# Example 1.4: Beta model for comparison
# Default links: log for gamma and delta
fit_beta <- gkwreg(yield ~ batch + temp, data = GasolineYield, family = "beta")
# Compare models using AIC/BIC
AIC(fit_kw, fit_beta)
BIC(fit_kw, fit_beta)
# SECTION 2: Using gkw_control() for Customization
# Example 2.1: Change optimization method to BFGS
fit_bfgs <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
control = gkw_control(method = "BFGS")
)
summary(fit_bfgs)
# Example 2.2: Increase iterations and enable verbose output
fit_verbose <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
control = gkw_control(
method = "nlminb",
maxit = 1000,
silent = FALSE, # Show optimization progress
trace = 1 # Print iteration details
)
)
# Example 2.3: Fast fitting without standard errors
# Useful for model exploration or large datasets
fit_fast <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
control = gkw_control(hessian = FALSE)
)
# Note: Cannot compute confint() without hessian
coef(fit_fast) # Point estimates still available
# Example 2.4: Custom convergence tolerances
fit_tight <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
control = gkw_control(
reltol = 1e-10, # Tighter convergence
maxit = 2000 # More iterations allowed
)
)
# SECTION 3: Advanced Formula Specifications
# Example 3.1: Different predictors for different parameters
# alpha depends on batch, beta depends on temp
fit_diff <- gkwreg(
yield ~ batch | temp,
data = GasolineYield,
family = "kw"
)
summary(fit_diff)
# Example 3.2: Intercept-only for one parameter
# alpha varies with predictors, beta is constant
fit_partial <- gkwreg(
yield ~ batch + temp | 1,
data = GasolineYield,
family = "kw"
)
# Example 3.3: Complex model with interactions
fit_interact <- gkwreg(
yield ~ batch * temp | temp + I(temp^2),
data = GasolineYield,
family = "kw"
)
# SECTION 4: Working with Different Families
# Example 4.1: Fit multiple families and compare
families <- c("beta", "kw", "ekw", "bkw", "gkw")
fits <- lapply(families, function(fam) {
gkwreg(yield ~ batch + temp, data = GasolineYield, family = fam)
})
names(fits) <- families
# Compare via information criteria
comparison <- data.frame(
Family = families,
LogLik = sapply(fits, logLik),
AIC = sapply(fits, AIC),
BIC = sapply(fits, BIC),
npar = sapply(fits, function(x) x$npar)
)
print(comparison)
# Example 4.2: Formal nested model testing
fit_kw <- gkwreg(yield ~ batch + temp, GasolineYield, family = "kw")
fit_ekw <- gkwreg(yield ~ batch + temp, GasolineYield, family = "ekw")
fit_gkw <- gkwreg(yield ~ batch + temp, GasolineYield, family = "gkw")
anova(fit_kw, fit_ekw, fit_gkw)
# SECTION 5: Link Functions and Scales
# Example 5.1: Custom link functions
fit_links <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
link = list(alpha = "sqrt", beta = "log")
)
# Example 5.2: Custom link scales
# Smaller scale = steeper response curve
fit_scale <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
link_scale = list(alpha = 5, beta = 15)
)
# Example 5.3: Uniform link for all parameters
fit_uniform <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
link = "log" # Single string applied to all
)
# SECTION 6: Prediction and Inference
# Fit model for prediction examples
fit <- gkwreg(yield ~ batch + temp, GasolineYield, family = "kw")
# Example 6.1: Confidence intervals at different levels
confint(fit, level = 0.95) # 95% CI
confint(fit, level = 0.90) # 90% CI
confint(fit, level = 0.99) # 99% CI
# SECTION 7: Diagnostic Plots and Model Checking
fit <- gkwreg(yield ~ batch + temp, GasolineYield, family = "kw")
# Example 7.1: All diagnostic plots (default)
par(mfrow = c(3, 2))
plot(fit, ask = FALSE)
# Example 7.2: Select specific plots
par(mfrow = c(3, 1))
plot(fit, which = c(2, 4, 5)) # Cook's distance, Residuals, Half-normal
# Example 7.3: Using ggplot2 for modern graphics
plot(fit, use_ggplot = TRUE, arrange_plots = TRUE)
# Example 7.4: Customized half-normal plot
par(mfrow = c(1, 1))
plot(fit,
which = 5,
type = "quantile",
nsim = 200, # More simulations for smoother envelope
level = 0.95
) # 95% confidence envelope
# Example 7.5: Extract diagnostic data programmatically
diagnostics <- plot(fit, save_diagnostics = TRUE)
head(diagnostics$data) # Residuals, Cook's distance, etc.
# SECTION 8: Real Data Example - Food Expenditure
# Load and prepare data
data(FoodExpenditure, package = "betareg")
food_data <- FoodExpenditure
food_data$prop <- food_data$food / food_data$income
# Example 8.1: Basic model
fit_food <- gkwreg(
prop ~ persons | income,
data = food_data,
family = "kw"
)
summary(fit_food)
# Example 8.2: Compare with Beta regression
fit_food_beta <- gkwreg(
prop ~ persons | income,
data = food_data,
family = "beta"
)
# Which fits better?
AIC(fit_food, fit_food_beta)
# Example 8.3: Model diagnostics
par(mfrow = c(3, 1))
plot(fit_food, which = c(2, 5, 6))
# Example 8.4: Interpretation via effects
# How does proportion spent on food change with income?
income_seq <- seq(min(food_data$income), max(food_data$income), length = 50)
pred_data <- data.frame(
persons = median(food_data$persons),
income = income_seq
)
pred_food <- predict(fit_food, newdata = pred_data, type = "response")
par(mfrow = c(1, 1))
plot(food_data$income, food_data$prop,
xlab = "Income", ylab = "Proportion Spent on Food",
main = "Food Expenditure Pattern"
)
lines(income_seq, pred_food, col = "red", lwd = 2)
# SECTION 9: Simulation Studies
# Example 9.1: Simple Kumaraswamy simulation
set.seed(123)
n <- 500
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)
# True model: log(alpha) = 0.8 + 0.3*x1, log(beta) = 1.2 - 0.2*x2
eta_alpha <- 0.8 + 0.3 * x1
eta_beta <- 1.2 - 0.2 * x2
alpha_true <- exp(eta_alpha)
beta_true <- exp(eta_beta)
# Generate response
y <- rkw(n, alpha = alpha_true, beta = beta_true)
sim_data <- data.frame(y = y, x1 = x1, x2 = x2)
# Fit and check parameter recovery
fit_sim <- gkwreg(y ~ x1 | x2, data = sim_data, family = "kw")
# Compare estimated vs true coefficients
cbind(
True = c(0.8, 0.3, 1.2, -0.2),
Estimated = coef(fit_sim),
SE = fit_sim$se
)
# Example 9.2: Complex simulation with all five parameters
set.seed(2203)
n <- 2000
x <- runif(n, -1, 1)
# True parameters
alpha <- exp(0.5 + 0.3 * x)
beta <- exp(1.0 - 0.2 * x)
gamma <- exp(0.7 + 0.4 * x)
delta <- plogis(0.0 + 0.5 * x) # logit scale
lambda <- exp(-0.3 + 0.2 * x)
# Generate from GKw
y <- rgkw(n,
alpha = alpha, beta = beta, gamma = gamma,
delta = delta, lambda = lambda
)
sim_data2 <- data.frame(y = y, x = x)
# Fit GKw model
fit_gkw <- gkwreg(
y ~ x | x | x | x | x,
data = sim_data2,
family = "gkw",
control = gkw_control(method = "L-BFGS-B", maxit = 2000)
)
summary(fit_gkw)
# SECTION 10: Handling Convergence Issues
# Example 10.1: Try different optimizers
methods <- c("nlminb", "BFGS", "Nelder-Mead", "CG")
fits_methods <- lapply(methods, function(m) {
tryCatch(
gkwreg(yield ~ batch + temp, GasolineYield,
family = "kw",
control = gkw_control(method = m, silent = TRUE)
),
error = function(e) NULL
)
})
names(fits_methods) <- methods
# Check which converged
converged <- sapply(fits_methods, function(f) {
if (is.null(f)) {
return(FALSE)
}
f$convergence
})
print(converged)
# Example 10.2: Verbose mode for debugging
fit_debug <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
control = gkw_control(
method = "BFGS",
silent = TRUE,
trace = 0, # 2, Maximum verbosity
maxit = 1000
)
)
# SECTION 11: Memory and Performance Optimization
# Example 11.1: Minimal object for large datasets
fit_minimal <- gkwreg(
yield ~ batch + temp,
data = GasolineYield,
family = "kw",
model = FALSE, # Don't store model frame
x = FALSE, # Don't store design matrices
y = FALSE, # Don't store response
control = gkw_control(hessian = FALSE) # Skip Hessian
)
# Much smaller object
object.size(fit_minimal)
# Trade-off: Limited post-fitting capabilities
# Can still use: coef(), logLik(), AIC(), BIC()
# Cannot use: predict(), some diagnostics
# Example 11.2: Fast exploratory analysis
# Fit many models quickly without standard errors
formulas <- list(
yield ~ batch,
yield ~ temp,
yield ~ batch + temp,
yield ~ batch * temp
)
fast_fits <- lapply(formulas, function(f) {
gkwreg(f, GasolineYield,
family = "kw",
control = gkw_control(hessian = FALSE),
model = FALSE, x = FALSE, y = FALSE
)
})
# Compare models via AIC
sapply(fast_fits, AIC)
# Refit best model with full inference
best_formula <- formulas[[which.min(sapply(fast_fits, AIC))]]
fit_final <- gkwreg(best_formula, GasolineYield, family = "kw")
summary(fit_final)
# SECTION 12: Model Selection and Comparison
# Example 12.1: Nested model testing
fit1 <- gkwreg(yield ~ 1, GasolineYield, family = "kw")
fit2 <- gkwreg(yield ~ batch, GasolineYield, family = "kw")
fit3 <- gkwreg(yield ~ batch + temp, GasolineYield, family = "kw")
# Likelihood ratio tests
anova(fit1, fit2, fit3)
# Example 12.2: Information criteria table
models <- list(
"Intercept only" = fit1,
"Batch effect" = fit2,
"Batch + Temp" = fit3
)
ic_table <- data.frame(
Model = names(models),
df = sapply(models, function(m) m$npar),
LogLik = sapply(models, logLik),
AIC = sapply(models, AIC),
BIC = sapply(models, BIC),
Delta_AIC = sapply(models, AIC) - min(sapply(models, AIC))
)
print(ic_table)
# Example 12.3: Cross-validation for predictive performance
# 5-fold cross-validation
set.seed(2203)
n <- nrow(GasolineYield)
folds <- sample(rep(1:5, length.out = n))
cv_rmse <- sapply(1:5, function(fold) {
train <- GasolineYield[folds != fold, ]
test <- GasolineYield[folds == fold, ]
fit_train <- gkwreg(yield ~ batch + temp, train,
family = "kw"
)
pred_test <- predict(fit_train, newdata = test, type = "response")
sqrt(mean((test$yield - pred_test)^2))
})
cat("Cross-validated RMSE:", mean(cv_rmse), "\n")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.