r2d | R Documentation |
r2d calculates R-squared as the fraction of deviance explained, which
equals traditional R-squared (fraction of variance explained) for linear
regression models but, unlike traditional R-squared, generalizes to
exponential family regression models. With optional additional arguments,
r2d can also return a predictive R-squared for how well the model
predicts responses for new observations and/or a cross-validated R-squared.
r2d(object, newdata = NULL, cv = FALSE, lambda = NULL, ...)
object | Fitted model |
newdata | An optional data frame in which to look for variables with which to calculate a predictive R-squared |
cv | A switch indicating if a predictive R-squared should be estimated by cross-validation via a call to |
... | Additional arguments to be passed to |
For standard linear regression models fit with lm, the familiar
coefficient of determination, R-squared, can be obtained with summary.lm.
However, R-squared is not provided for generalized linear models (GLMs)
fit with glm, presumably because it can have undesirable properties when
applied to GLMs with non-normal error distributions (e.g., binomial or
Poisson). For such distributions, R-squared is no longer guaranteed to lie
within the [0,1] interval or to increase uniformly as more predictors are
added. Most importantly, the interpretation of R-squared as the fraction of
uncertainty explained by the model does not generally hold for exponential
family regression models.
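As a small illustration of the point above, the following base-R sketch fits a linear model and a Poisson GLM to the built-in mtcars data (the choice of dataset and predictors is only an assumption for demonstration) and checks which summary objects carry an r.squared component.

```r
# Gaussian linear model: summary.lm stores the coefficient of determination
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
r2_lm  <- summary(fit_lm)$r.squared

# Poisson GLM on a count response: summary.glm has no r.squared component,
# but it does report the residual and null deviances
fit_glm <- glm(carb ~ wt + hp, family = poisson, data = mtcars)
s_glm   <- summary(fit_glm)
has_r2  <- "r.squared" %in% names(s_glm)

# Quantities that a deviance-based R-squared can be built from
glm_deviances <- c(deviance = s_glm$deviance, null.deviance = s_glm$null.deviance)
```

Here has_r2 is FALSE, while glm_deviances holds the two deviances that glm does make available.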
Cameron and Windmeijer (1997) proposed an R-squared measure (termed
R_{KL}^2) for GLMs based on Kullback-Leibler (KL) divergence
(cross-entropy - entropy), which restores all of the desirable
properties of R-squared, including its interpretation as the fraction of
uncertainty explained. Owing to an equivalence between KL divergence and
deviance, others have referred to this metric as R_D^2 (Martin & Hall,
2016), or deviance explained. It is defined as 1 - dev/nulldev, where
dev is the deviance, 2*(loglik_sat - loglik_fit), and nulldev is the null
deviance, 2*(loglik_sat - loglik_null). Here loglik_sat is the
log-likelihood for the saturated model (a model with one free parameter
per observation), loglik_fit is the log-likelihood for the fitted model,
and loglik_null is the log-likelihood for the intercept-only model.
Following the reasoning of Martin & Hall (2016), the saturated and null
models for zero-inflated (ZI) regression are equivalent to the saturated
and null models for the corresponding non-ZI regression: e.g., the
saturated model for Poisson regression defines the saturated model for
ZI-Poisson regression.
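The definition above can be reproduced by hand with base R. The sketch below is not the internal code of r2d; it assumes a Poisson GLM on the built-in mtcars data (an arbitrary choice for illustration), builds the three log-likelihoods explicitly, and cross-checks the result against the deviances stored by glm. It also checks the linear-regression special case against summary.lm.

```r
# Poisson GLM on a count response (illustrative choice of data and predictors)
fit <- glm(carb ~ wt + hp, family = poisson, data = mtcars)
y   <- mtcars$carb

# Saturated model: one free parameter per observation, so each mean equals y
loglik_sat  <- sum(dpois(y, lambda = y, log = TRUE))
# Fitted model
loglik_fit  <- as.numeric(logLik(fit))
# Null model: intercept only
loglik_null <- as.numeric(logLik(update(fit, . ~ 1)))

dev     <- 2 * (loglik_sat - loglik_fit)
nulldev <- 2 * (loglik_sat - loglik_null)
r2_d    <- 1 - dev / nulldev

# Same quantity from the deviances that glm stores on the fitted object
r2_d_glm <- 1 - fit$deviance / fit$null.deviance
same_as_glm <- isTRUE(all.equal(r2_d, r2_d_glm))

# Linear regression: deviance is the residual sum of squares, and the null
# deviance is the total sum of squares, so the ratio gives classic R-squared
fit_lm    <- lm(mpg ~ wt + hp, data = mtcars)
rss       <- deviance(fit_lm)
tss       <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2_lm_dev <- 1 - rss / tss
same_as_lm <- isTRUE(all.equal(r2_lm_dev, summary(fit_lm)$r.squared))
```

Both same_as_glm and same_as_lm should be TRUE up to floating-point tolerance, which is the equivalence the text describes.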
Object of class "R2" with the following items:
an R-squared statistic for the model fit
if newdata was provided, an R-squared statistic for how well the model predicts new data
if cv = TRUE, the cross-validated predictive R-squared returned by validate
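To give a rough sense of what a predictive R-squared on new data might look like, here is a base-R sketch for a Poisson GLM. It is not the implementation used by r2d: the helper poisson_dev is hypothetical, the train/test split of mtcars is arbitrary, and using the training-set mean as the null prediction for the new data is an assumption made for this example.

```r
set.seed(1)
idx   <- sample(nrow(mtcars), 22)
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit <- glm(carb ~ wt + hp, family = poisson, data = train)

# Hypothetical helper: Poisson deviance of predicted means mu for counts y
poisson_dev <- function(y, mu) {
  2 * sum(ifelse(y > 0, y * log(y / mu), 0) - (y - mu))
}

# Sanity check of the helper against the deviance glm reports in-sample
helper_ok <- isTRUE(all.equal(poisson_dev(train$carb, fitted(fit)), fit$deviance))

# Predictions for the new observations, on the response scale
mu_new  <- predict(fit, newdata = test, type = "response")
# Assumption: the null model predicts the training mean for every new case
mu_null <- rep(mean(train$carb), nrow(test))

r2_pred <- 1 - poisson_dev(test$carb, mu_new) / poisson_dev(test$carb, mu_null)
```

Unlike the in-sample statistic, a predictive R-squared computed this way can be negative when the model predicts new data worse than the null prediction does.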
Cameron AC, Windmeijer FAG. (1997) An R-squared measure of goodness of fit for some common nonlinear regression models. Journal of Econometrics, 77, 329–342.
Martin J, Hall DB. (2016) R2 measures for zero-inflated regression models for count data with excess zeros. Journal of Statistical Computation and Simulation, 86(18), 3777–3790. DOI: 10.1080/00949655.2016.1186166
predict_metrics, validate