testGLMGamma: Apply Goodness of Fit Test to the Residuals of a Generalized...

View source: R/testGLMGamma.R

testGLMGamma    R Documentation

Apply Goodness of Fit Test to the Residuals of a Generalized Linear Model with Gamma Link Function

Description

testGLMGamma checks the validity of the Gamma assumption for the response variable when fitting a generalized linear model. The common link functions available in glm for the Gamma family can be used.

Usage

testGLMGamma(
  x,
  y,
  fit = NULL,
  l = "log",
  discretize = FALSE,
  ngrid = length(y),
  gridpit = TRUE,
  hessian = FALSE,
  start.value = NULL,
  control = NULL,
  method = "cvm",
  weight_function = NULL
)

Arguments

x

is either a numeric vector or a design matrix. In the design matrix, rows indicate observations and columns represent covariates.

y

is a numeric vector whose length equals the number of observations, that is, the length of x (if x is a vector) or the number of rows of x (if x is a matrix).

fit

is an object of class glm; its default value is NULL. If a fit of class glm is provided, the arguments x, y, and l are ignored. We recommend the glm2 function from the glm2 package, since it provides better convergence when optimizing the likelihood to estimate the model coefficients by the IWLS method. The fit must include the design matrix, which is requested by setting x = TRUE in the glm or glm2 call. For more information, refer to the help documentation for glm or glm2.
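As an illustration of the x = TRUE requirement, the following minimal sketch fits a Gamma model with stats::glm on simulated data and confirms that the design matrix is stored in the fit. The data, coefficients, and variable names are made up for this example, and the final call runs only if the gofedf package is installed.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(42)
n  <- 100
x  <- cbind(x1 = rnorm(n), x2 = rnorm(n))
mu <- exp(0.5 + 0.3 * x[, 1] - 0.2 * x[, 2])
y  <- rgamma(n, shape = 2, rate = 2 / mu)   # Gamma response with mean mu

# x = TRUE stores the design matrix in the fitted object as fit$x
fit <- glm(y ~ x, family = Gamma(link = "log"), x = TRUE, y = TRUE)

is.matrix(fit$x)   # TRUE: the design matrix is available
dim(fit$x)         # n rows; intercept column plus two covariates

# Pass the fit to the test; x, y, and l are then ignored
if (requireNamespace("gofedf", quietly = TRUE)) {
  print(gofedf::testGLMGamma(fit = fit))
}
```

Without x = TRUE, fit$x is NULL, so the design matrix is not available from the fitted object.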

l

a character string indicating the link function to be used for the Gamma family. Acceptable link functions for the Gamma family are inverse, identity, and log. For more details, see Gamma in the stats package.
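To make the three links concrete, this short sketch evaluates each link function from stats::Gamma at a few mean values. The vector mu is an arbitrary choice for illustration.

```r
links <- c("inverse", "identity", "log")
mu    <- c(0.5, 1, 2)   # arbitrary positive means

# Each column holds the linear predictor eta = g(mu) for one link
eta <- sapply(links, function(l) Gamma(link = l)$linkfun(mu))
eta
```

The inverse link maps mu to 1/mu, the identity link leaves mu unchanged, and the log link maps mu to log(mu).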

discretize

If TRUE, the covariance function of the W_{n}(u) process is evaluated at a set of points (see ngrid and gridpit), and the integral equation is replaced by a matrix equation. If FALSE (the default value), the covariance function is first estimated, and then the integral equation is solved to find the eigenvalues. Our simulation results favor using the estimated covariance to solve the integral equation. The parameters ngrid, gridpit, and hessian are only relevant when discretize = TRUE.

ngrid

the number of equally spaced points used to discretize the (0,1) interval when computing the covariance function. The default value is length(y).

gridpit

logical. If TRUE (the default value), the parameter ngrid is ignored and the (0,1) interval is divided based on the probability integral transforms (PITs) obtained from the sample. If FALSE, the interval is divided into ngrid equally spaced points for computing the covariance function.
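The two grid choices can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the shape parameter is taken as the reciprocal of the dispersion reported by summary.glm (the package may estimate it differently, for example by maximum likelihood), and the placement of the equally spaced points is likewise an assumption.

```r
set.seed(123)
n  <- 200
x  <- rnorm(n)
mu <- exp(1 + 0.5 * x)
y  <- rgamma(n, shape = 3, rate = 3 / mu)   # simulated Gamma response

fit    <- glm(y ~ x, family = Gamma(link = "log"))
mu_hat <- fitted(fit)

# Assumption: moment-type shape estimate, 1 / estimated dispersion
shape_hat <- 1 / summary(fit)$dispersion

# PIT: fitted Gamma CDF evaluated at each observed response
pit <- pgamma(y, shape = shape_hat, rate = shape_hat / mu_hat)

# gridpit = TRUE: grid points taken from the sample PITs
grid_pit <- sort(pit)

# gridpit = FALSE: ngrid equally spaced interior points of (0, 1)
# (assumed placement, for illustration)
ngrid   <- n
grid_eq <- seq_len(ngrid) / (ngrid + 1)
```

If the Gamma model is adequate, the PITs are approximately uniform on (0,1), so the two grids cover the interval in a similar way.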

hessian

logical. If TRUE, the Fisher information matrix is estimated by the observed Hessian matrix based on the sample. If FALSE (the default value), the Fisher information matrix is estimated by the variance of the observed score matrix.
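In standard notation, the two estimators can be written as below. The symbols are assumptions introduced here, not taken from the package source: \ell_i(\theta) is the log-likelihood contribution of observation i, s_i is its score vector, and \hat\theta is the MLE. The package's exact normalization (for example n versus n - 1) may differ.

```latex
% hessian = TRUE: observed Hessian, averaged over the sample
\hat{I}_{H} = -\frac{1}{n} \sum_{i=1}^{n}
  \left. \frac{\partial^{2} \ell_i(\theta)}{\partial\theta \, \partial\theta^{\top}} \right|_{\theta = \hat\theta}

% hessian = FALSE: sample variance of the per-observation scores
\hat{I}_{S} = \frac{1}{n} \sum_{i=1}^{n} (s_i - \bar{s})(s_i - \bar{s})^{\top},
\qquad
s_i = \left. \frac{\partial \ell_i(\theta)}{\partial\theta} \right|_{\theta = \hat\theta},
\qquad
\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s_i
```

Under a correctly specified model both quantities estimate the same Fisher information matrix, so they should agree closely in large samples.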

start.value

a numeric value or vector. This is the same as the start argument in glm or glm2. The value is a starting point for the iteratively reweighted least squares (IWLS) algorithm that estimates the MLE of the model coefficients.

control

a list of parameters to control the fitting process in the glm or glm2 function. For more details, see glm.control.

method

a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramér-von Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, 'both' to compute both the cvm and ad statistics, and 'user' for a custom weight function. See weight_function for details about the custom weight function.

weight_function

a function representing the weight function w(u) used to compute the weighted Cramér-von Mises statistic when method = 'user'. The function must take a numeric vector u \in (0,1) as input and return a numeric vector of the same length. The statistic is computed as

T_n = n \int_{0}^{1} w^2(u) \left( F_n(u) - u \right)^2 du

where w^2(u) is computed internally by squaring the supplied function. The default value is NULL. When method is 'cvm', 'ad', or 'both', weight_function is ignored. When method = 'user', this argument must be provided; otherwise an error is returned.
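The formula for T_n can be checked numerically with a small base-R sketch. The helper weighted_stat is hypothetical and is not part of gofedf. It assumes that F_n is the empirical distribution function of the PITs and approximates the integral with a midpoint rule. Uniform random numbers stand in for PITs from a fitted model. With w(u) = 1 the result should match the closed-form Cramér-von Mises statistic, and because the supplied function is squared, w(u) = 1/sqrt(u(1-u)) gives the Anderson-Darling weight.

```r
# Hypothetical helper (not part of gofedf): midpoint-rule approximation of
# T_n = n * integral over (0, 1) of w(u)^2 * (F_n(u) - u)^2
weighted_stat <- function(pit, w, m = 1e5) {
  n  <- length(pit)
  u  <- (seq_len(m) - 0.5) / m     # midpoints of m equal cells in (0, 1)
  Fn <- ecdf(pit)(u)               # empirical distribution function of the PITs
  n * mean(w(u)^2 * (Fn - u)^2)
}

set.seed(1)
pit <- runif(50)                   # stand-in PITs, for illustration only

w_cvm <- function(u) rep(1, length(u))       # w^2(u) = 1
w_ad  <- function(u) 1 / sqrt(u * (1 - u))   # w^2(u) = 1 / (u (1 - u))

t_cvm <- weighted_stat(pit, w_cvm)
t_ad  <- weighted_stat(pit, w_ad)

# Closed-form Cramér-von Mises statistic for comparison
n <- length(pit)
cvm_exact <- 1 / (12 * n) + sum((sort(pit) - (2 * seq_len(n) - 1) / (2 * n))^2)

c(numeric = t_cvm, exact = cvm_exact)
```

A weight function passed to testGLMGamma should follow the same pattern as w_cvm and w_ad: it takes a numeric vector in (0,1) and returns a numeric vector of the same length.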

Value

A list with the following two components:

  • Statistic: the value of the goodness-of-fit statistic.

  • p-value: the approximate p-value for the goodness-of-fit test.

If method = 'cvm' or method = 'ad', each component is a single numeric value. If method = 'both', each component is a numeric vector with two elements, one for each statistic. If method = 'user', Statistic is the weighted statistic.

Examples

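library(gofedf)
set.seed(123)
n <- 50
p <- 5
# Design matrix: n observations (rows) by p covariates (columns)
x <- matrix(rnorm(n * p, mean = 10, sd = 0.1), nrow = n, ncol = p)
b <- runif(p)
# Gamma response whose mean is proportional to exp(x %*% b) (log link)
e <- rgamma(n, shape = 3)
y <- exp(x %*% b) * e
# Test the Gamma assumption directly from x and y
testGLMGamma(x, y, l = 'log')
# Alternatively, supply a fitted glm object; x = TRUE returns the design matrix
myfit <- glm(y ~ x, family = Gamma('log'), x = TRUE, y = TRUE)
testGLMGamma(fit = myfit)
# Example for custom weight function: w(u) = 1 is the Cramér-von Mises weight
w_cvm <- function(u) rep(1, length(u))
testGLMGamma(fit = myfit, method = 'user', weight_function = w_cvm)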


gofedf documentation built on April 12, 2026, 9:07 a.m.