R/dpit_main.R
In assessor: Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

Documented in dpit

#' DPIT residuals for regression models with various non-continuous outcomes
#'
#' Calculates DPIT residuals for regression models with non-continuous outcomes.
#' In particular, model assumptions for GLMs with discrete outcomes (e.g., binary, Poisson, and negative binomial), ordinal
#' regression models, zero-inflated regression models, and semicontinuous outcome
#' models can be assessed using \code{dpit()}.
#'
#' @usage dpit(model, plot=TRUE, scale="normal", line_args=list(), ...)
#' @param model A model object.
#' @param plot A logical value indicating whether or not to return QQ-plot
#' @param scale You can choose the scale of the residuals among `normal` and `uniform`.
#' The sample quantiles of the residuals are plotted against
#' the theoretical quantiles of a standard normal distribution under the normal scale,
#' and against the theoretical quantiles of a uniform (0,1) distribution under the uniform scale.
#'  The default scale is `normal`.
#' @param line_args A named list of graphical parameters passed to
#'   \code{graphics::abline()} to modify the reference (red) 45° line
#'   in the QQ plot. If left empty, a default red dashed line is drawn.
#' @param ... Additional graphical arguments passed to
#'   \code{stats::qqplot()} for customizing the QQ plot (e.g., \code{pch},
#'   \code{col}, \code{cex}, \code{xlab}, \code{ylab}).
#'
#' @details
#' This function determines the appropriate computation based on the class of
#' \code{model}. The supported model objects and outcome types are listed below.
#'
#' In addition to the class-based interface, the package also provides
#' distribution-specific DPIT calculators. If a fitted model comes from a
#' different class but has a supported outcome distribution, users can call
#' the corresponding distribution-based function directly.
#' For instance, for a regression model with Poisson outcomes,
#' one can use dpit to calculate the residuals if
#' the model is fit using `glm` function, or to use `dpit_pois` upon supplying fitted mean values.
#'
#' \itemize{
#' \item \strong{Discrete outcomes}
#'   \itemize{
#'   \item \code{\link[stats]{glm}} with \code{family = binomial()} (see \code{\link{dpit_bin}}).
#'   \item \code{\link[stats]{glm}} with \code{family = poisson()}
#'         (see \code{\link{dpit_pois}}).
#'   \item \code{\link[MASS]{glm.nb}} for negative binomial regression
#'         (see \code{\link{dpit_nb}}).
#'   \item \code{\link[MASS]{polr}} for ordinal outcomes
#'         (see \code{\link{dpit_ordi}}).
#'   }
#'
#' \item \strong{Zero-inflated discrete outcomes}
#'   \itemize{
#'   \item \code{\link[pscl]{zeroinfl}} with \code{dist = "poisson"}
#'         (see \code{\link{dpit_zpois}}).
#'   \item \code{\link[pscl]{zeroinfl}} with \code{dist = "negbin"}
#'         (see \code{\link{dpit_znb}}).
#'   }
#'
#' \item \strong{Semicontinuous outcomes}
#'   \itemize{
#'   \item Tobit regression via \code{\link[AER]{tobit}} from `AER` or \code{\link[VGAM]{vglm}} from `VGAM`
#'         (see \code{\link{dpit_tobit}}).
#'   \item Tweedie regression via \code{\link[stats]{glm}} with a Tweedie family
#'         (see \code{\link{dpit_tweedie}}).
#'   }
#' }
#'
#' \strong{Formulation for Discrete and Zero-Inflated Outcomes:}
#' \cr
#' The DPIT residual for the \eqn{i}th observation is defined as follows:
#' \deqn{\hat{r}(Y_i|X_i) = \hat{G}\bigg(\hat{F}(Y_i|\mathbf{X}_i)\bigg)}
#' where
#' \deqn{\hat{G}(s) = \frac{1}{n-1}\sum_{j=1, j \neq i}^{n}\hat{F}\bigg(\hat{F}^{(-1)}(\mathbf{X}_j)\bigg|\mathbf{X}_j\bigg)}
#' and \eqn{\hat{F}} refers to the fitted cumulative distribution function.
#' When `scale="uniform"`, DPIT residuals should closely follow a uniform distribution, otherwise it implies model deficiency.
#' When `scale="normal"`, it applies the normal quantile transformation to the DPIT residuals
#' \deqn{\Phi^{-1}\left[\hat{r}(Y_i|\mathbf{X}_i)\right],i=1,\ldots,n.} The null pattern is the standard normal distribution in this case.
#' \cr
#'
#' \strong{Formulation for Semicontinuous Outcomes:}
#' \cr
#' The DPIT residuals for regression models with semi-continuous outcomes are \deqn{\hat{r}_i=\frac{\hat{F}(Y_i|\mathbf{X}_i)}{n}\sum_{j=1}^n1\left(\hat{p}_0(\mathbf{X}_j)\leq \hat{F}(Y_i|\mathbf{X}_i)\right), i=1,\ldots,n,}
#' where \eqn{\hat{p}_0(\mathbf{X}_i)} is the fitted probability of zero, and \eqn{\hat{F}(\cdot|\mathbf{X}_i)} is the  fitted cumulative distribution function for the \eqn{i}th observation. Furthermore, \deqn{\hat{F}(y|\mathbf{x})=\hat{p}_0(\mathbf{x})+\left(1-\hat{p}_0(\mathbf{x})\right)\hat{G}(y|\mathbf{x})}
#' where \eqn{\hat{G}} is the fitted cumulative distribution for the positive data.
#'
#' @returns DPIT residuals. If `plot=TRUE`, also produces a QQ plot.
#'
#'
#' @import stats
#' @import graphics
#' @export
#'
#'
#' @references
#' L. Yang. Double probability integral transform residuals for regression models with discrete outcomes. Journal of Computational and Graphical Statistics, 33(3), pp.787–803, 2024. \cr
#' L. Yang. Diagnostics for regression models with semicontinuous outcomes. Biometrics, 80(1), ujae007, 2024.
#'
#' @examples
#' library(MASS)
#' n <- 500
#' set.seed(1234)
#' ## Negative Binomial example
#' # Covariates
#' x1 <- rnorm(n)
#' x2 <- rbinom(n, 1, 0.7)
#' ### Parameters
#' beta0 <- -2
#' beta1 <- 2
#' beta2 <- 1
#' size1 <- 2
#' lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
#' # generate outcomes
#' y <- rnbinom(n, mu = lambda1, size = size1)
#'
#' # True model
#' model1 <- glm.nb(y ~ x1 + x2)
#' resid.nb1 <- dpit(model1, plot = TRUE, scale = "uniform")
#'
#' # Overdispersion
#' model2 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
#' resid.nb2 <- dpit(model2, plot = TRUE, scale = "normal")
#'
#' ## Binary example
#' n <- 500
#' set.seed(1234)
#' # Covariates
#' x1 <- rnorm(n, 1, 1)
#' x2 <- rbinom(n, 1, 0.7)
#' # Coefficients
#' beta0 <- -5
#' beta1 <- 2
#' beta2 <- 1
#' beta3 <- 3
#' q1 <- 1 / (1 + exp(beta0 + beta1 * x1 + beta2 * x2 + beta3 * x1 * x2))
#' y1 <- rbinom(n, size = 1, prob = 1 - q1)
#'
#' # True model
#' model01 <- glm(y1 ~ x1 * x2, family = binomial(link = "logit"))
#' resid.bin1 <- dpit(model01, plot = TRUE)
#'
#' # Missing covariates
#' model02 <- glm(y1 ~ x1, family = binomial(link = "logit"))
#' resid.bin2 <- dpit(model02, plot = TRUE)
#'
#' ## Poisson example
#' n <- 500
#' set.seed(1234)
#' # Covariates
#' x1 <- rnorm(n)
#' x2 <- rbinom(n, 1, 0.7)
#' # Coefficients
#' beta0 <- -2
#' beta1 <- 2
#' beta2 <- 1
#' lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
#' y <- rpois(n, lambda1)
#'
#' # True model
#' poismodel1 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
#' resid.poi1 <- dpit(poismodel1, plot = TRUE)
#'
#' # Enlarge three outcomes
#' y <- rpois(n, lambda1) + c(rep(0, (n - 3)), c(10, 15, 20))
#' poismodel2 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
#' resid.poi2 <- dpit(poismodel2, plot = TRUE)
#'
#' ## Ordinal example
#' n <- 500
#' set.seed(1234)
#' # Covariates
#' x1 <- rnorm(n, mean = 2)
#' # Coefficient
#' beta1 <- 3
#'
#' # True model
#' p0 <- plogis(1, location = beta1 * x1)
#' p1 <- plogis(4, location = beta1 * x1) - p0
#' p2 <- 1 - p0 - p1
#' genemult <- function(p) {
#'   rmultinom(1, size = 1, prob = c(p[1], p[2], p[3]))
#' }
#' test <- apply(cbind(p0, p1, p2), 1, genemult)
#' y1 <- rep(0, n)
#' y1[which(test[1, ] == 1)] <- 0
#' y1[which(test[2, ] == 1)] <- 1
#' y1[which(test[3, ] == 1)] <- 2
#' multimodel <- polr(as.factor(y1) ~ x1, method = "logistic")
#' resid.ord1 <- dpit(multimodel, plot = TRUE)
#'
#' ## Non-Proportionality
#' n <- 500
#' set.seed(1234)
#' x1 <- rnorm(n, mean = 2)
#' beta1 <- 3
#' beta2 <- 1
#' p0 <- plogis(1, location = beta1 * x1)
#' p1 <- plogis(4, location = beta2 * x1) - p0
#' p2 <- 1 - p0 - p1
#' genemult <- function(p) {
#'   rmultinom(1, size = 1, prob = c(p[1], p[2], p[3]))
#' }
#' test <- apply(cbind(p0, p1, p2), 1, genemult)
#' y1 <- rep(0, n)
#' y1[which(test[1, ] == 1)] <- 0
#' y1[which(test[2, ] == 1)] <- 1
#' y1[which(test[3, ] == 1)] <- 2
#' multimodel <- polr(as.factor(y1) ~ x1, method = "logistic")
#' resid.ord2 <- dpit(multimodel, plot = TRUE)
dpit <- function(model,
                 plot = TRUE,
                 scale = "normal",
                 line_args = list(),
                 ...) {
  UseMethod("dpit")
}

Any scripts or data that you put into this service are public.

assessor documentation built on March 23, 2026, 1:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

assessor
Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

R/dpit_main.R
In assessor: Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

Defines functions dpit

Documented in dpit

Try the assessor package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

assessor Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

R/dpit_main.R In assessor: Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

Defines functions dpit

Documented in dpit

Try the assessor package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

assessor
Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes

R/dpit_main.R
In assessor: Assessment Tools for Regression Models with Discrete and Semicontinuous Outcomes