cod: Goodness-of-fit measures for regression models In sjstats: Collection of Convenient Functions for Common Statistical Computations

Description

Compute Goodness-of-fit measures for various regression models, including mixed and Bayesian regression models.

Usage

```r
cod(x)

r2(x, ...)

## S3 method for class 'lme'
r2(x, n = NULL, ...)

## S3 method for class 'stanreg'
r2(x, loo = FALSE, ...)

## S3 method for class 'brmsfit'
r2(x, loo = FALSE, ...)
```

Arguments

• `x`: Fitted model of class `lm`, `glm`, `merMod`, `glmmTMB`, `lme`, `plm`, `stanreg` or `brmsfit`. For `cod()`, only a `glm` with binary response.

• `...`: Currently not used.

• `n`: Optional, an `lme` object representing the fitted null model (unconditional model) for `x`. If `n` is given, the pseudo-r-squared values for the random intercept and random slope variances are computed (Kwok et al. 2008), as well as the Omega-squared value (Xu 2003). See 'Examples' and 'Details'.

• `loo`: Logical, if `TRUE` and `x` is a `stanreg` or `brmsfit` object, a LOO-adjusted r-squared is calculated. Otherwise, an "unadjusted" r-squared is returned by calling `rstantools::bayes_R2()`.

Details

For linear models, the r-squared and adjusted r-squared values are returned, as provided by the `summary()` function.
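As a minimal sketch of where these two values come from, the following uses only base R and the built-in `mtcars` data. The dataset and the variable names are assumptions chosen for illustration and are not taken from the sjstats examples.

```r
# Illustrative model on a built-in dataset (not the efc data used elsewhere).
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

# R-squared and adjusted R-squared as stored in the summary object.
r_squared <- s$r.squared
adj_r_squared <- s$adj.r.squared

# Cross-check against the definition: 1 - RSS / TSS.
rss <- sum(residuals(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared_manual <- 1 - rss / tss
```

The manual value should agree with `s$r.squared` up to floating-point error, since the model contains an intercept.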

For mixed models (from lme4 or glmmTMB) marginal and conditional r-squared values are calculated, based on Nakagawa et al. 2017.

For `lme`-models, unless `n` is specified, two values are returned: an r-squared approximation computed from the correlation between the fitted and observed values, as suggested by Byrnes (2008), and a simplified version of the Omega-squared value, 1 - (residual variance / response variance) (Xu 2003; Nakagawa and Schielzeth 2013).

If `n` is given, the following are returned for `lme`-models: pseudo r-squared measures based on the variances of the random intercept (tau 00, between-group variance) and the random slope (tau 11, random-slope variance), the r-squared statistics proposed by Snijders and Bosker (2012), and the Omega-squared value, 1 - (residual variance of the full model / residual variance of the null model), as suggested by Xu (2003).
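The two quantities for an `lme` model without a null model can be approximated by hand. This is a rough sketch, assuming the `nlme` package and its `Orthodont` data (both chosen here for illustration); the exact computation inside `r2()` may differ, and squaring the correlation is an assumption based on the usual reading of Byrnes's suggestion.

```r
library(nlme)  # recommended package shipped with standard R installations

# Illustrative random-intercept model on the Orthodont data from nlme.
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

observed <- getResponse(fit)
predicted <- fitted(fit)  # fitted values including the random effects

# Squared correlation between fitted and observed values.
r2_corr <- cor(predicted, observed)^2

# Simplified Omega-squared: 1 - (residual variance / response variance).
omega2_simple <- 1 - var(residuals(fit)) / var(observed)
```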

For generalized linear models, Cox & Snell's and Nagelkerke's pseudo r-squared values are returned.
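Both pseudo r-squared values can be derived from the log-likelihoods of the fitted model and an intercept-only model. The sketch below uses the standard textbook formulas on the built-in `mtcars` data (an illustrative assumption); it is not taken from the sjstats source.

```r
# Illustrative logistic regression on built-in data (not the efc data).
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
null_fit <- update(fit, . ~ 1)  # intercept-only model

n <- nobs(fit)
ll_full <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null_fit))

# Cox & Snell: 1 - (L_null / L_full)^(2 / n), written on the log scale.
cox_snell <- 1 - exp((2 / n) * (ll_null - ll_full))

# Nagelkerke: Cox & Snell rescaled by its maximum attainable value.
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))
```

Nagelkerke's value is never smaller than Cox & Snell's, because the denominator is at most 1.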

The ("unadjusted") r-squared value and its standard error for `brmsfit` or `stanreg` objects are robust measures, i.e. the median is used to compute r-squared, and the median absolute deviation is used as the measure of variability. If `loo = TRUE`, a LOO-adjusted r-squared is calculated, which comes conceptually closer to an adjusted r-squared measure.

Value

For `r2()`, depending on the model, returns:

• For linear models, the r-squared and adjusted r-squared values.

• For mixed models, the marginal and conditional r-squared values.

• For `glm` objects, Cox & Snell's and Nagelkerke's pseudo r-squared values.

• For `brmsfit` or `stanreg` objects, the Bayesian version of r-squared is computed, calling `rstantools::bayes_R2()`.

• If `loo = TRUE`, for `brmsfit` or `stanreg` objects a LOO-adjusted version of r-squared is returned.

• Models that are not currently supported return `NULL`.

For `cod()`, returns the Coefficient of Discrimination `D`, also known as Tjur's R-squared value.

Note

cod()

This method calculates the Coefficient of Discrimination `D` for generalized linear (mixed) models for binary data. It is an alternative to other pseudo-R-squared values like Nagelkerke's R2 or Cox-Snell R2. The Coefficient of Discrimination `D` can be read like any other (pseudo-)R-squared value.
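Following Tjur (2009), `D` is the difference between the mean predicted probability among the observed successes and the mean predicted probability among the observed failures. A minimal base-R sketch, using the built-in `mtcars` data as an illustrative assumption:

```r
# Illustrative logistic regression on built-in data (not the efc data).
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

p <- predict(fit, type = "response")  # predicted probabilities
y <- mtcars$am                        # observed binary response (0/1)

# Tjur's D: mean fitted probability for y == 1 minus that for y == 0.
tjur_d <- mean(p[y == 1]) - mean(p[y == 0])
```

A value near 0 means the model barely separates the two outcome groups; a value near 1 means the predicted probabilities for the two groups hardly overlap.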

r2()

For mixed models, the marginal r-squared considers only the variance of the fixed effects, while the conditional r-squared takes both the fixed and random effects into account.
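For the simplest case, a Gaussian model with a single random intercept, the two measures can be written as below, following Nakagawa and Schielzeth (2013). The symbols are notation assumed here for illustration: \sigma^2_f is the variance of the fixed-effects predictions, \sigma^2_\alpha the random-intercept variance, and \sigma^2_\varepsilon the residual variance.

```latex
R^2_{\mathrm{marginal}} =
  \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon},
\qquad
R^2_{\mathrm{conditional}} =
  \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

Both share the same denominator, so the conditional value is never smaller than the marginal one.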

For `lme`-objects, if `n` is given, the Pseudo-R2 statistic is the proportion of explained variance in the random effect after adding co-variates or predictors to the model, or in short: the proportion of the explained variance in the random effect of the full (conditional) model `x` compared to the null (unconditional) model `n`.

The Omega-squared statistic, if `n` is given, is 1 - the proportion of the residual variance of the full model compared to the null model's residual variance, or in short: the proportion of the residual variation explained by the covariates.

An alternative way to assess the "goodness-of-fit" is to compare the ICC of the null model with the ICC of the full model (see `icc`).
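The comparison against a null model can be sketched by hand from the variance components of two `lme` fits. This assumes the `nlme` package and its `Orthodont` data, both chosen for illustration, and reads the pseudo-R2 for the random intercept as the relative reduction in tau 00; the internals of `r2()` may differ.

```r
library(nlme)  # recommended package shipped with standard R installations

# Null (unconditional) and full model, illustrative only.
null_fit <- lme(distance ~ 1, random = ~ 1 | Subject, data = Orthodont)
full_fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Residual (within-group) variances.
sigma2_null <- null_fit$sigma^2
sigma2_full <- full_fit$sigma^2

# Random-intercept (between-group) variances, tau 00.
tau00_null <- as.numeric(VarCorr(null_fit)["(Intercept)", "Variance"])
tau00_full <- as.numeric(VarCorr(full_fit)["(Intercept)", "Variance"])

# Omega-squared: 1 - (residual variance full / residual variance null).
omega2 <- 1 - sigma2_full / sigma2_null

# Pseudo-R2 for the random intercept: relative reduction in tau 00.
pseudo_r2_tau00 <- (tau00_null - tau00_full) / tau00_null
```

Nothing in this arithmetic constrains the pseudo-R2 to be positive: if the estimated random-intercept variance grows after adding predictors, the value is negative.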

References

• Bolker B et al. (2017): GLMM FAQ.

• Byrnes, J. 2008. Re: Coefficient of determination (R^2) when using lme() (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q2/000713.html)

• Kwok OM, Underhill AT, Berry JW, Luo W, Elliott TR, Yoon M. 2008. Analyzing Longitudinal Data with Multilevel Models: An Example with Individuals Living with Lower Extremity Intra-Articular Fractures. Rehabilitation Psychology 53(3): 370–86. doi: 10.1037/a0012765

• Nakagawa S, Schielzeth H. 2013. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2):133–142. doi: 10.1111/j.2041-210x.2012.00261.x

• Nakagawa S, Johnson P, Schielzeth H. 2017. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14. doi: 10.1098/rsif.2017.0213

• Rabe-Hesketh S, Skrondal A. 2012. Multilevel and longitudinal modeling using Stata. 3rd ed. College Station, Tex: Stata Press Publication

• Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. 2nd ed. Thousand Oaks: Sage Publications

• Snijders TAB, Bosker RJ. 2012. Multilevel analysis: an introduction to basic and advanced multilevel modeling. 2nd ed. Los Angeles: Sage

• Xu, R. 2003. Measuring explained variation in linear mixed effects models. Statist. Med. 22:3527-3541. doi: 10.1002/sim.1572

• Tjur T. 2009. Coefficients of determination in logistic regression models - a new proposal: The coefficient of discrimination. The American Statistician, 63(4): 366-372

Examples

```r
library(sjstats)  # provides cod(), r2() and the efc data
data(efc)

# Tjur's R-squared value
efc$services <- ifelse(efc$tot_sc_e > 0, 1, 0)
fit <- glm(services ~ neg_c_7 + c161sex + e42dep,
           data = efc, family = binomial(link = "logit"))
cod(fit)

# Marginal and conditional r-squared for a mixed model
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
r2(fit)

# R-squared and adjusted r-squared for a linear model
fit <- lm(barthtot ~ c160age + c12hour, data = efc)
r2(fit)

# Pseudo-R-squared values
fit <- glm(services ~ neg_c_7 + c161sex + e42dep,
           data = efc, family = binomial(link = "logit"))
r2(fit)
```

sjstats documentation built on Nov. 15, 2018, 5:04 p.m.