R2_pred: Calculate R2_pred

View source: R/r2_pred.R

R2_predR Documentation

Calculate R2_pred

Description

Calculate partial and total R2s for LMM, GLMM, PGLS, and PGLMM using R2_pred, an R2 based on the variance of the difference between the observed and predicted values of a fitted model.

Usage

R2_pred(mod = NULL, mod.r = NULL, gaussian.pred = "tip_rm", phy = NULL)

Arguments

mod

A regression model with one of the following classes: 'lm', 'glm', 'lmerMod', 'glmerMod', 'phylolm', 'gls', 'pglmm', pglmm_compare', 'binaryPGLMM', or 'communityPGLMM'.

mod.r

A reduced model; if not provided, the total R2 will be given by setting 'mod.r' to the model corresponding to 'mod' with the intercept as the only predictor.

gaussian.pred

For models of classes 'pglmm' and 'pglmm_compare' when family = gaussian, which type of prediction to calculate.

phy

The phylogeny for phylogenetic models (as a 'phylo' object), which must be specified for models of class 'phylolm'.

Details

R2_pred works with classes 'lm', 'glm', 'lmerMod', 'glmerMod', 'phylolm', 'phyloglm', 'gls', 'pglmm', 'pglmm_compare', binaryPGLMM', and 'communityPGLMM' (family = gaussian and binomial).

LMM (lmerMod), GLMM (glmerMod), PGLMM (pglmm, pglmm_compare, binaryPGLMM and communityPGLMM):

partial R2 = 1 - var(y - y.fitted.f)/var(y - y.fitted.r)

where y are the observed data, and y.fitted.f and y.fitted.r are the fitted (predicted) values from the full and reduced models. For GLMMs and PGLMMs, the values of y.fitted are in the space of the raw data (as opposed to the 'Normal' or 'latent' space). When the reduced model 'mod.r' is not specified, the total R2 is computing using the reduced model with only the intercept.

For pglmm and pglmm_compare with gaussian models, the default method for computing predicted values is "nearest_node" to correspond to predicted values in lmer, although the method "tip_rm" can be specified to correspond to the analyses in Ives (2018).

Note that the version of binaryPGLMM() in the package ape is replaced by a version contained within rr2 that outputs all of the required information for the calculation of R2_resid.

PGLS (phylolm and gls):

For PGLS, the total R2_pred is computed by removing each datum one at a time, predicting its value from the fitted model, repeating this for all data points, and then calculating the variance of the difference between observed and fitted values. The predictions are calculated as

res.predicted[j] = V[j, -j] solve(V[-j, -j]) res[-j]

where res[-j] is a vector of residuals with datum j removed, V[-j,-j] is the phylogenetic covariance matrix with row and column j removed, and V[j, -j] is row j of covariance matrix V with element j removed. The partial R2_pred is calculated from the total R2_pred from full and reduced models as

partial R2 = 1 - (1 - R2_pred.f)/(1 - R2_pred.r)

Note that phylolm() can have difficulties in finding solutions when there is no phylogenetic signal; when the estimate indicates no phylogenetic signal, you should refit the model with the corresponding LM.

LM (lm) and GLM (glm):

For compatibility and generating reduced models, rr2 will compute R2_pred for LM and GLM that correspond to LMM/PGLS and GLMM/PGLMM.

Author(s)

Anthony R. Ives

References

Ives A.R. and Li D. 2018. rr2: An R package to calculate R2s for regression models. Journal of Open Source Software. DOI:10.21105/joss.01028

Ives A.R. 2018. R2s for Correlated Data: Phylogenetic Models, LMMs, and GLMMs. Systematic Biology, Volume 68, Issue 2, March 2019, Pages 234-251. DOI:10.1093/sysbio/syy060

See Also

MuMIn, lme4, ape, phylolm, pez

Examples

library(ape)
library(phylolm)
library(lme4)
library(nlme)
library(phyr)

set.seed(12345)
p1 <- 10
nsample <- 10
n <- p1 * nsample

d <- data.frame(x1 = 0, x2 = 0, u1 = rep(1:p1, each = nsample),
                u2 = rep(1:p1, times = nsample))
d$u1 <- as.factor(d$u1)
d$u2 <- as.factor(d$u2)

b1 <- 1
b2 <- -1
sd1 <- 1.5

d$x1 <- rnorm(n = n)
d$x2 <- rnorm(n = n)
d$y.lmm <- b1 * d$x1 + b2 * d$x2 + 
  rep(rnorm(n = p1, sd = sd1), each = nsample) +
  rep(rnorm(n = p1, sd = sd1), times = nsample) + 
  rnorm(n = n)

prob <- inv.logit(b1 * d$x1 + rep(rnorm(n = p1, sd = sd1), each = nsample))
d$y.glmm <- rbinom(n = n, size = 1, prob = prob)

# LMM with two fixed and two random effects ----
z.f <- lmer(y.lmm ~ x1 + x2 + (1 | u1) + (1 | u2), data = d, REML = FALSE)
z.x <- lmer(y.lmm ~ x1 + (1 | u1) + (1 | u2), data = d, REML = FALSE)
z.v <- lmer(y.lmm ~ 1 + (1 | u2), data = d, REML = FALSE)
z.0 <- lm(y.lmm ~ 1, data = d)

R2_pred(z.f, z.x)
R2_pred(z.f, z.v)
R2_pred(z.f)

# GLMM with one fixed and one random effect ----
z.f <- glmer(y.glmm ~ x1 + (1 | u1), data = d, family = 'binomial')
z.x <- glmer(y.glmm ~ 1 + (1 | u1), data = d, family = 'binomial')
z.v <- glm(y.glmm ~ x1, data = d, family = 'binomial')

R2_pred(z.f, z.x)
R2_pred(z.f, z.v)
R2_pred(z.f)

# PGLS with a single fixed effect ----
n <- 100
d <- data.frame(x = array(0, dim = n), y = 0)

b1 <- 1.5
signal <- 0.7

phy <- compute.brlen(rtree(n = n), method = 'Grafen', power = 1)
phy.x <- compute.brlen(phy, method = 'Grafen', power = .0001)

# Generate random data
x <- rTraitCont(phy.x, model = 'BM', sigma = 1)
e <- signal ^ 0.5 * rTraitCont(phy, model = 'BM', sigma = 1) +
  (1 - signal) ^ 0.5 * rnorm(n = n)
d$x <- x[match(names(e), names(x))]
d$y <- b1 * x + e
rownames(d) <- phy$tip.label
d$sp <- phy$tip.label

z.x <- gls(y ~ 1, data = d, correlation = corPagel(1, phy, form = ~sp), method = "ML")
z.f <- gls(y ~ x, data = d, correlation = corPagel(1, phy, form = ~sp), method = "ML")
z.v <- lm(y ~ x, data = d)

R2_pred(z.f, z.x)
R2_pred(z.f, z.v)
R2_pred(z.f)

# PGLMM with one fixed effect ----
n <- 100
b1 <- 1.5
signal <- 2

phy <- compute.brlen(rtree(n = n), method = 'Grafen', power = 1)
phy.x <- compute.brlen(phy, method = 'Grafen', power = .0001)

# Generate random data
x <- rnorm(n)
d <- data.frame(x = x, y = 0)

e <- signal * rTraitCont(phy, model = 'BM', sigma = 1)
e <- e[match(phy$tip.label, names(e))]

d$y <- rbinom(n = n, size = 1, prob = inv.logit(b1 * d$x + e))
rownames(d) <- phy$tip.label

z.f <- pglmm_compare(y ~ x, data = d, family = "binomial", phy = phy)
z.x <- pglmm_compare(y ~ 1, data = d, family = "binomial", phy = phy)
z.v <- glm(y ~ x, data = d, family = "binomial")

R2_pred(z.f, z.x)
R2_pred(z.f, z.v)
R2_pred(z.f)

#' #################
# A community example of pglmm {phyr} contrasting R2_pred when bayes = TRUE and bayes = F

library(mvtnorm)
nspp <- 6
nsite <- 4

# Simulate a phylogeny that has a lot of phylogenetic signal (power = 1.3)
phy <- compute.brlen(rtree(n = nspp), method = "Grafen", power = 1.3)

# Simulate species means
sd.sp <- 1
mean.sp <- rTraitCont(phy, model = "BM", sigma=sd.sp^2)

# Replicate values of mean.sp over sites
Y.sp <- rep(mean.sp, times=nsite)

# Simulate site means
sd.site <- 1
mean.site <- rnorm(nsite, sd=sd.site)

# Replicate values of mean.site over sp
Y.site <- rep(mean.site, each=nspp)

# Compute a covariance matrix for phylogenetic attraction
sd.attract <- 1
Vphy <- vcv(phy)

# Standardize the phylogenetic covariance matrix to have determinant = 1.
# (For an explanation of this standardization, see subsection 4.3.1 in Ives (2018))
Vphy <- Vphy/(det(Vphy)^(1/nspp))

# Construct the overall covariance matrix for phylogenetic attraction.
# (For an explanation of Kronecker products, see subsection 4.3.1 in the book)
V <- kronecker(diag(nrow = nsite, ncol = nsite), Vphy)
Y.attract <- array(t(rmvnorm(n = 1, sigma = sd.attract^2*V)))

# Simulate residual errors
sd.e <- 1
Y.e <- rnorm(nspp*nsite, sd = sd.e)

# Construct the dataset
d <- data.frame(sp = rep(phy$tip.label, times = nsite), site = rep(1:nsite, each = nspp))

# Simulate abundance data
d$Y <- Y.sp + Y.site + Y.attract + Y.e

# Full and reduced models
z.f <- pglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site),
        data = d, cov_ranef = list(sp = phy), REML = FALSE)
z.nested <- pglmm(Y ~ 1 + (1|sp__) + (1|site),
        data = d, cov_ranef = list(sp = phy), REML = FALSE)
z.sp <- pglmm(Y ~ 1 + (1|sp) + (1|site),
        data = d, cov_ranef = list(sp = phy), REML = FALSE)

R2_pred(z.f, z.nested)
R2_pred(z.nested, z.sp)
R2_pred(z.f)

# vector - matrix
# These are generally larger when gaussian.pred = "nearest_node"
R2_pred(z.f, z.nested, gaussian.pred = "nearest_node")
R2_pred(z.nested, z.sp, gaussian.pred = "nearest_node")
R2_pred(z.f, gaussian.pred = "nearest_node")

# # When bayes = TRUE, gaussian.pred does not work.
# # Commented out because INLA is not on CRAN
# z.f.bayes <- pglmm(Y ~ 1 + (1|sp__) + (1|site) + (1|sp__@site),
#                data = d, cov_ranef = list(sp = phy), bayes = TRUE)
# z.nested.bayes <- pglmm(Y ~ 1 + (1|sp__) + (1|site),
#                data = d, cov_ranef = list(sp = phy), bayes = TRUE)
# z.sp.bayes <- pglmm(Y ~ 1 + (1|sp) + (1|site),
#                data = d, cov_ranef = list(sp = phy), bayes = TRUE)
# 
# R2_pred(z.f.bayes, z.nested.bayes)
# R2_pred(z.nested.bayes, z.sp.bayes)
# R2_pred(z.f.bayes)

arives/rr2 documentation built on Aug. 20, 2023, 3:24 a.m.