mahalanobis_arbutus: Multivariate measure of model adequacy
In mwpennell/arbutus: Evaluate the adequacy of continuous trait models

Description

Computes the Mahalanobis distance between the observed and simulated test statistics as a multivariate measure of model fit.

Usage

mahalanobis_arbutus(x)

Arguments

`x`	an object of class `arbutus`, as returned by `compare_pic_stat`

Details

This function computes the Mahalanobis distance between the observed and simulated test statistics. The Mahalanobis distance (see `mahalanobis`) is a unitless, scale-invariant measure of the distance between a single data point (here, the observed test statistics) and a reference point (here, the mean of the simulated test statistics) that takes into account the covariance among the test statistics in the simulated data. It assumes that the test statistics follow a multivariate normal distribution. For the default test statistics (see `default_pic_stat`), this condition should be met. The exception is `pic_stat_dcdf`, which is bounded below by 0 and therefore will not be normally distributed. As a result, if `pic_stat_dcdf` is included in the set of test statistics, this function takes the natural log of its values before computing the Mahalanobis distance. All other test statistics are assumed to be normally distributed and are used as is.
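The calculation described above can be sketched in base R with `stats::mahalanobis()`. This is a minimal illustration under stated assumptions, not the package's internal code: the objects `sim_stats` and `obs_stats`, the column names `msig`, `cvar` and `dcdf`, and all values are hypothetical stand-ins for the simulated and observed test statistics. Note that base R's `mahalanobis()` returns the squared distance, so the sketch also takes the square root; check the package source to see which of the two `mahalanobis_arbutus` reports.

```r
# Sketch only: hypothetical statistic names and made-up values, not arbutus internals.
set.seed(1)
nsim <- 200

# Simulated test statistics: one row per simulated data set.
sim_stats <- cbind(
  msig = rnorm(nsim, mean = 1, sd = 0.2),
  cvar = rnorm(nsim, mean = 0, sd = 0.1),
  dcdf = rlnorm(nsim, meanlog = -2, sdlog = 0.3)  # positive, right-skewed
)

# Observed test statistics for a single fitted model.
obs_stats <- c(msig = 1.4, cvar = 0.05, dcdf = 0.3)

# Log-transform the (assumed) "dcdf" column so it is closer to normal.
log_dcdf <- function(v) {
  if (is.matrix(v)) {
    v[, "dcdf"] <- log(v[, "dcdf"])
  } else {
    v["dcdf"] <- log(v["dcdf"])
  }
  v
}
sim_t <- log_dcdf(sim_stats)
obs_t <- log_dcdf(obs_stats)

# Distance from the observed point to the mean of the simulated statistics,
# scaled by the covariance among the simulated statistics.
center <- colMeans(sim_t)
covmat <- cov(sim_t)
d2 <- mahalanobis(obs_t, center = center, cov = covmat)  # squared distance
d <- sqrt(d2)                                            # distance
d
```

Estimating the covariance from the simulations is what makes the measure scale-invariant: a statistic with a large spread contributes less per unit of difference than one with a small spread.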

While the Mahalanobis distance may be a useful summary in some circumstances, we recommend checking the test statistics individually, for three reasons. First, our procedure for calculating p-values for the individual test statistics is general and does not depend on assumptions about the distribution of values. Second, the individual p-values are much easier to interpret from the perspective of either posterior predictive or parametric bootstrapping theory. Third, and most importantly, the fact that some test statistics capture the variation in the data better than others provides useful information about how and why the model is inadequate.
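To show why per-statistic checks need no distributional assumption, the sketch below computes a two-tailed empirical p-value for each statistic from the proportion of simulated values at least as extreme as the observed one. This is one common generic approach, assumed here for illustration; the exact formula used by arbutus may differ. The function `empirical_p`, the objects `sim_stats` and `obs_stats`, and their values are hypothetical.

```r
# Hypothetical helper: two-tailed empirical p-value for one statistic.
# Uses only where the observed value falls among the simulated values.
empirical_p <- function(obs, sim) {
  lower <- mean(sim <= obs)   # proportion of simulations at or below obs
  upper <- mean(sim >= obs)   # proportion of simulations at or above obs
  min(1, 2 * min(lower, upper))
}

# Made-up simulated statistics: one normal, one skewed (exponential).
set.seed(3)
sim_stats <- cbind(stat1 = rnorm(500), stat2 = rexp(500))
obs_stats <- c(stat1 = 2.5, stat2 = 0.7)

# One p-value per statistic; the skewed stat2 needs no transformation.
pvals <- vapply(colnames(sim_stats),
                function(s) empirical_p(obs_stats[[s]], sim_stats[, s]),
                numeric(1))
pvals
```

Because each p-value is read off the simulated distribution directly, a skewed statistic is handled the same way as a symmetric one, and a small p-value points to the specific statistic the model fails to reproduce.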

If only one set of observed test statistics is available (e.g. from fitting a model to a single tree by maximum likelihood), a single distance is returned. If multiple sets of test statistics are available (e.g. from fitting a model with Bayesian MCMC), the function returns a distribution of distances.
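The difference between the two cases can be pictured with base R's `mahalanobis()`, which returns one squared distance per row of its input. In this sketch, under assumed made-up data, each row of the hypothetical matrix `obs_sets` stands for one set of observed statistics (for example, one per posterior sample); how arbutus actually pairs observed and simulated sets in the MCMC case is not shown here.

```r
# Hypothetical data: simulated statistics, one row per simulated data set.
set.seed(2)
nsim <- 200
nsets <- 50
sim_stats <- cbind(stat1 = rnorm(nsim), stat2 = rnorm(nsim, sd = 2))

# Observed statistics: one row per set (e.g. per MCMC sample); values are made up.
obs_sets <- cbind(stat1 = rnorm(nsets, mean = 0.5),
                  stat2 = rnorm(nsets, mean = 1, sd = 2))

# One squared distance per row of obs_sets, then the unsquared distances.
d2 <- mahalanobis(obs_sets,
                  center = colMeans(sim_stats),
                  cov = cov(sim_stats))
d <- sqrt(d2)

# Summarise the resulting distribution of distances.
summary(d)
quantile(d, c(0.025, 0.5, 0.975))
```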

Value

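The Mahalanobis distance between the observed and simulated test statistics: a single value, or a distribution of values when multiple sets of observed test statistics are available.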

Examples

library(arbutus)
data(finch)
phy <- finch$phy
dat <- finch$data[,"wingL"]
unit.tree <- make_unit_tree(phy, data=dat)

## calculate default test stats on observed data
obs <- calculate_pic_stat(unit.tree, stats=NULL)

## simulate data on unit.tree
sim.dat <- simulate_char_unit(unit.tree, nsim=10)

## calculate default test stats on simulated data
sim <- calculate_pic_stat(sim.dat, stats=NULL)

## compare simulated to observed test statistics
res <- compare_pic_stat(obs, sim)

## calculate Mahalanobis distance
mahalanobis_arbutus(res)

mwpennell/arbutus documentation built on Oct. 6, 2022, 10 a.m.