r2d: R-squared as Deviance Explained

Description

r2d calculates R-squared as the fraction of deviance explained. For linear regression models this equals the traditional R-squared (fraction of variance explained), but, unlike traditional R-squared, it generalizes to exponential family regression models. With optional additional arguments, r2d can also return a predictive R-squared for how well the model predicts responses for new observations and/or a cross-validated R-squared.
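The equivalence with traditional R-squared for linear regression can be checked in base R. This sketch does not call r2d; it assumes only the stats package and the built-in cars data, fits the same model with lm and as a Gaussian glm, and compares the R-squared from summary.lm with 1 minus the ratio of deviance to null deviance.

```r
# Linear regression of stopping distance on speed (built-in 'cars' data)
fit_lm <- lm(dist ~ speed, data = cars)
r2_traditional <- summary(fit_lm)$r.squared

# The same model as a Gaussian GLM: its deviance is the residual sum of
# squares and its null deviance is the total sum of squares
fit_gauss <- glm(dist ~ speed, family = gaussian, data = cars)
r2_deviance <- 1 - fit_gauss$deviance / fit_gauss$null.deviance

# The two agree up to floating-point tolerance
all.equal(r2_traditional, r2_deviance)  # TRUE
```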

Usage

r2d(object, newdata = NULL, cv = FALSE, lambda = NULL, ...)

Arguments

object

A fitted model object (e.g., from lm or glm).

newdata

An optional data frame containing the variables with which to calculate a predictive R-squared.

cv

A logical switch indicating whether a predictive R-squared should be estimated by cross-validation via a call to validate.

...

Additional arguments to be passed to validate.

Details

For standard linear regression models fit with lm, the familiar coefficient of determination, R-squared, can be obtained with summary.lm. However, R-squared is not provided for generalized linear models (GLMs) fit with glm, presumably because it can have undesirable properties when applied to GLMs with non-normal error distributions (e.g., binomial, Poisson). For such distributions, R-squared is no longer guaranteed to lie within the [0,1] interval or to never decrease as more predictors are added. Most importantly, the interpretation of R-squared as the fraction of uncertainty explained by the model does not generally hold for exponential family regression models.
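As a base-R illustration (not using r2d), the sketch below fits a Poisson GLM to the built-in warpbreaks data. It shows that summary.glm has no R-squared component, and it computes both the variance-based formula, naively applied to the Poisson fit, and deviance explained from the deviance and null deviance that glm stores. For a non-normal family the two formulas measure different things and generally give different values.

```r
# Poisson regression of warp-break counts (built-in 'warpbreaks' data)
pfit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# summary.glm reports deviances but has no r.squared component
is.null(summary(pfit)$r.squared)  # TRUE

y <- warpbreaks$breaks
# Variance-based formula, naively applied to the Poisson fit
r2_naive <- 1 - sum((y - fitted(pfit))^2) / sum((y - mean(y))^2)
# Deviance explained, from the components glm() stores
r2_dev <- 1 - pfit$deviance / pfit$null.deviance

c(naive = r2_naive, deviance_explained = r2_dev)
```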

Cameron and Windmeijer (1997) proposed an R-squared measure for GLMs, termed R_{KL}^2, based on Kullback-Leibler (KL) divergence (cross-entropy minus entropy). It restores all of the desirable properties of R-squared, including its interpretation as the fraction of uncertainty explained. Owing to an equivalence between KL divergence and deviance, others have referred to this metric as R_D^2 (Martin & Hall, 2016), or deviance explained. It is defined as

  R_D^2 = 1 - dev/nulldev

where dev is the deviance of the fitted model,

  dev = 2*(loglik_sat - loglik_fit),

loglik_sat is the log-likelihood of the saturated model (a model with one free parameter per observation), loglik_fit is the log-likelihood of the fitted model, and nulldev is the null deviance,

  nulldev = 2*(loglik_sat - loglik_null),

where loglik_null is the log-likelihood of the intercept-only model.

Following the reasoning of Martin & Hall (2016), the saturated and null models for zero-inflated (ZI) regression are the same as those for the corresponding non-ZI regression; e.g., the saturated model for Poisson regression also serves as the saturated model for ZI-Poisson regression.
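The definitions above can be evaluated directly for a Poisson GLM. This sketch uses only base R (glm, dpois, deviance) and the built-in warpbreaks data, not r2d itself. It computes loglik_sat, loglik_fit and loglik_null, forms dev, nulldev and R_D^2, and checks that the deviances match what glm reports.

```r
# Poisson fit to the built-in 'warpbreaks' data
pfit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)
y <- warpbreaks$breaks

# Saturated model: one free parameter per observation, so each mean equals y_i
loglik_sat  <- sum(dpois(y, lambda = y, log = TRUE))
# Fitted model: the glm's fitted means
loglik_fit  <- sum(dpois(y, lambda = fitted(pfit), log = TRUE))
# Intercept-only (null) model: every mean equals the sample mean
loglik_null <- sum(dpois(y, lambda = mean(y), log = TRUE))

dev     <- 2 * (loglik_sat - loglik_fit)
nulldev <- 2 * (loglik_sat - loglik_null)
R2_D    <- 1 - dev / nulldev

# These match the deviances that glm() reports
all.equal(dev, deviance(pfit))          # TRUE
all.equal(nulldev, pfit$null.deviance)  # TRUE
```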

Value

Object of class "R2" with the following items:

  1. R2fit

    an R-squared statistic for the model fit

  2. R2new

    if newdata was provided, an R-squared statistic for how well the model predicts new data

  3. R2cv

    if cv = TRUE, the cross-validated predictive R-squared returned by validate
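This page does not spell out how R2new is computed. The following is only a sketch of one common way to score a glm on new data, under stated assumptions. The helper r2d_new_sketch is hypothetical and not part of beset. It uses base R family(), predict() and the family's dev.resids function. It assumes newdata contains the response, and that the null model's prediction for every new observation is the unweighted mean of the training response. Unlike in-sample deviance explained, this quantity can be negative when the model predicts the new data worse than the training mean does.

```r
# HYPOTHETICAL helper (not part of beset): predictive deviance explained
# for a glm evaluated on new data.
# ASSUMPTION: the null prediction is the unweighted training-response mean.
r2d_new_sketch <- function(fit, newdata) {
  fam <- family(fit)
  # Observed responses in the new data (newdata must contain the response)
  y_new <- model.response(model.frame(formula(fit), newdata))
  # Model predictions on the response scale
  mu_new <- predict(fit, newdata = newdata, type = "response")
  # Null prediction: training-set mean, repeated for each new observation
  mu_null <- rep(mean(fit$y), length(y_new))
  wt <- rep(1, length(y_new))
  dev_new     <- sum(fam$dev.resids(y_new, mu_new, wt))
  nulldev_new <- sum(fam$dev.resids(y_new, mu_null, wt))
  1 - dev_new / nulldev_new
}

# Usage: split the built-in 'warpbreaks' data into odd and even rows
train <- warpbreaks[c(TRUE, FALSE), ]
test  <- warpbreaks[c(FALSE, TRUE), ]
fit <- glm(breaks ~ wool + tension, family = poisson, data = train)
r2_test <- r2d_new_sketch(fit, test)

# On the training data it reproduces in-sample deviance explained
all.equal(r2d_new_sketch(fit, train), 1 - fit$deviance / fit$null.deviance)
```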

References

Cameron AC, Windmeijer FAG (1997) An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77, 329–342.

Martin J, Hall DB (2016) R2 measures for zero-inflated regression models for count data with excess zeros. Journal of Statistical Computation and Simulation, 86(18), 3777–3790. DOI: 10.1080/00949655.2016.1186166

See Also

predict_metrics, validate


jashu/beset documentation built on April 20, 2023, 5:28 a.m.