PPC.residuals: Calculate Residuals from MCMC Output of Occupancy Models

View source: R/PPC.residuals.R

PPC.residualsR Documentation

Calculate Residuals from MCMC Output of Occupancy Models

Description

Posterior Predictive Check (PPC) function that calculates Freeman-Tukey (FT) residuals, Pearson"s Chi-squared residuals, or deviance from MCMC output of occupancy models. This function compares observed data to simulated data from the posterior distribution to assess model fit.

Usage

PPC.residuals(
  y,
  p,
  psi,
  model = c("Occupancy", "RN"),
  type = c("FT", "PearChi2", "Deviance"),
  K = NULL,
  z.cond = TRUE,
  zhat = NULL,
  nmax = 20,
  return.residuals = TRUE,
  return.z = TRUE
)

Arguments

y

Observations as either a site vector or site by occasion matrix. For matrix format, use NA for unsampled occasions.

p

Array of posterior samples for detection probability (p). Dimensions should be iterations by sites (by occasion optionally). For RN models, p should represent individual-level detection (not conditional on local abundance).

psi

Array of posterior samples for occupancy probability (psi). Dimensions should be iterations by sites. For RN models, psi should represent expected abundance

model

Character indicating model type: either "Occupancy" or "RN" (Royle-Nichols).

type

Type of residual to calculate: "FT" (Freeman Tukey), "PearChi2" (Pearson Chi-squared), or "Deviance" (not technically a residual).

K

Number of occasions as either a scalar or site vector. Calculated automatically if y is a matrix.

z.cond

Logical. If TRUE, new data is conditioned on estimated z (testing only detection model fit). If FALSE, generates new z for each posterior sample (testing complete model).

zhat

Optional matrix with same dimensions as psi containing estimates of z from the same model. If not provided, will be generated internally.

nmax

Maximum site-level abundance (default = 20). Only used if model="RN". Higher values increase computation time. Warning given if set too low.

return.residuals

Logical. If TRUE (default), returns residuals along with Bayesian p-value.

return.z

Logical. If TRUE, returns z values conditional on y, and unconditional z's if z.cond = FALSE. Note: if zhat is provided, the returned conditional-on-y z values will be identical to those provided.

Details

This function helps assess model fit for occupancy models using various types of residuals:

Types of Residuals

  • Freeman-Tukey (FT):

    R_j = (\sqrt{y_j} - \sqrt{E(y_j)})^2

    Measures the squared difference between the square root of observed detections and the square root of expected detections at each site.

  • Pearson Chi-squared:

    R_j = \left(\frac{y_j - E(y_j)}{\sqrt{Var(y_j)}}\right)^2

    Measures the squared difference between observed and expected detections, standardized by the theoretical variance calculated from the model parameters.

  • Deviance:

    R_j = -2\log[y_j|\theta_j, K_j]

    Measures the contribution of each site to the overall model likelihood, quantifying the discrepancy between observed data and model predictions based on likelihood ratios

Where:

  • y_j is the number of detections of the species at site j, out of K_j repeated surveys

  • E(y_j) = K_j p_j z_j, with p_j = species detection probability and z_j = occupancy state (1 if occupied, 0 otherwise)

  • Var(y_j) = p_j z_j (1 - p_j z_j) K_j

  • For Royle-Nichols occupancy models, the term p_j z_j is replaced with 1 - (1 - r_j)^{N_j}, where r_j = individual detection probability and N_j = local abundance

  • For Deviance, \theta_j is either occupancy and species detection probability at site j (\psi_j, p_j) for regular occupancy models, or expected abundance and individual detection probability (\lambda_j, r_j) for Royle-Nichols occupancy models

Bayesian P-values

The function calculates Bayesian p-values as a measure of model fit. These values:

  • Range from 0 to 1

  • Values close to 0.5 suggest good model fit

  • Values close to 0 or 1 suggest poor fit

  • Are calculated by comparing observed residuals to residuals from simulated data

Conditional vs Unconditional Assessment

The z.cond parameter allows for two types of model assessment:

  • z.cond = TRUE: Tests only the detection component of the model, fixing occupancy/abundance to estimates from the model, rather than generating them anew

  • z.cond = FALSE: Tests the complete model, including both occupancy and detection components

Value

If return.residuals=TRUE (default), returns a list containing:

  • res.obs - residuals for observed data

  • res.new - residuals for newly generated data

  • BP - Bayesian p-value

If return.residuals=FALSE, returns only the Bayesian p-value.

Warning

This is a beta version of the function. While it has been tested extensively, not all possible data configurations may have been captured in testing. This is particularly true for:

  • Deviance calculations (type = "Deviance")

  • Royle-Nichols models (model = "RN")

If you encounter issues with the function, please contact the package developers.

Note

FT and Chi-squared residuals have been extensively tested. Deviance calculations have undergone less testing and are only available for scenarios with constant detection probability across occasions. FT and Chi-squared residuals can handle varying detection probabilities.

Author(s)

Rahel Sollmann

References

Sollmann, Rahel. Occupancy models and the "good fit, bad prediction" dilemma. Ecology (submitted)


camtrapR documentation built on Nov. 5, 2025, 6:54 p.m.