goftest | R Documentation |

A goodness-of-fit test is performed when projected statistics have been used for inference. Otherwise, some plots of limited interest are produced.

goftest(object, nsim = 99L, method = "", stats=NULL, plot. = TRUE, nb_cores = NULL, Simulate = attr(object$logLs, "Simulate"), packages = attr(object$logLs, "packages"), env = attr(object$logLs, "env"), verbose = interactive(), cl_seed=.update_seed(object), get_gof_stats=.get_gof_stats)

`object` |
an |

`nsim` |
Number of draws of summary statistics. |

`method` |
For development purposes, not documented. |

`stats` |
Character vector, or NULL: the set of summary statistics to be used to construct the test. If NULL, the union, across all projections, of the raw summary statistics used for projections is potentially used for goodness of fit; however, if this set is too large for Gaussian mixture modelling, a subset of variables may be selected. How they are selected is not yet fully settled (see Details). |

`plot.` |
Control diagnostic plots. |

`nb_cores, Simulate, packages, env, verbose` |
See same-named |

`cl_seed` |
NULL or integer (see |

`get_gof_stats` |
function for selecting raw statistics (see Details). |

**Testing goodness-of-fit:** The test is somewhat heuristic but appears to give reasonable results (the Example shows how this can be verified). It assumes that all summary statistics are reduced to projections predicting all model parameters. It is then conceived as if any projection *p* predicting a parameter were a sufficient statistic for this parameter, given the information contained in the summary statistics **s** (this is certainly the ideal objective of machine-learning regression methods). Then a statistic *u* independent (under the fitted model) from all projections should be a suitable statistic for testing goodness of fit: if the model is correctly specified, the quantile of observed *u*, in the distribution of *u* under the fitted model, should be uniformly distributed over repeated sampling under the data-generating process.

The procedure constructs statistics uncorrelated to all **p** (over repeated sampling under the fitted model) and proceeds as if they were independent from **p** (rather than simply uncorrelated). A number (depending on the size of the reference table) of statistics *u* uncorrelated to **p** are then defined. Each such statistic is obtained as the residual of the regression of a given raw summary statistic on all projections, where the regression input is a simulation table of `nsim` replicates of **s** under the fitted model, and of their projections **p** (using the “projectors” constructed from the full reference table). The latter regression involves one more, small-`nsim`, approximation (as it is the sample correlation that is zeroed), but using the residuals is crucially better than using the original summary statistics (as some ABC software may do). An additional feature of the procedure is to construct a single test statistic *t* from joint residuals **u**, by estimating their joint distribution (using Gaussian mixture modelling) and letting *t* be the density of **u** in this distribution.
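The construction described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the simulated table, the variable names (`p1`, `p2`, `s1`, `s2`, `obs`) and the rank-based p-value are assumptions made for illustration, and a single multivariate Gaussian is used as a stand-in for the Gaussian mixture model.

```r
set.seed(1)
nsim <- 99L

# Hypothetical simulation table under the fitted model:
# two projections (p1, p2) and two raw summary statistics (s1, s2).
p1 <- rnorm(nsim)
p2 <- rnorm(nsim)
s1 <- 0.8 * p1 - 0.5 * p2 + rnorm(nsim, sd = 0.3)
s2 <- 0.3 * p1 + 0.6 * p2 + rnorm(nsim, sd = 0.5)
simul <- data.frame(p1, p2, s1, s2)

# Residuals u of the regression of each raw statistic on all projections.
fit1 <- lm(s1 ~ p1 + p2, data = simul)
fit2 <- lm(s2 ~ p1 + p2, data = simul)
u_sim <- cbind(u1 = residuals(fit1), u2 = residuals(fit2))

# The sample correlation between residuals and projections is zeroed
# (up to floating-point error).
max_abs_cor <- max(abs(cor(u_sim, simul[, c("p1", "p2")])))

# Hypothetical observed statistics and their residuals from the same regressions.
obs <- data.frame(p1 = 0.2, p2 = -0.1, s1 = 0.4, s2 = 0.1)
u_obs <- c(obs$s1 - predict(fit1, newdata = obs),
           obs$s2 - predict(fit2, newdata = obs))

# Stand-in for the Gaussian mixture: log-density of one multivariate Gaussian.
log_dens <- function(u, mu, Sigma) {
  u <- matrix(u, ncol = length(mu))
  d2 <- mahalanobis(u, center = mu, cov = Sigma)
  -0.5 * (d2 + length(mu) * log(2 * pi) +
            as.numeric(determinant(Sigma)$modulus))
}
mu <- colMeans(u_sim)
Sigma <- cov(u_sim)
t_sim <- log_dens(u_sim, mu, Sigma)  # test statistic for each simulated replicate
t_obs <- log_dens(u_obs, mu, Sigma)  # test statistic for the observed data

# Assumed rank-based p-value: a low density of the observed residuals
# relative to the simulated ones suggests poor fit.
pval <- (1 + sum(t_sim <= t_obs)) / (nsim + 1)
```

Because the density is evaluated jointly on all residuals, a single p-value summarizes departures in any combination of the retained raw statistics.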

**Selection of raw summary statistics:** See the code of the `Infusion:::..get_gof_stats` function for the method used. It requires that `ranger` has been used to produce the projectors, and that the latter include variable importance statistics (by default, Infusion calls `ranger` with argument `importance="permutation"`). `.get_gof_stats` then selects the raw summary statistics with *least* importance over projections (this may not be optimal, and in particular appears redundant with the procedure described below to construct goodness-of-fit statistics from raw summary statistics; so this might change in a later version), and returns a vector of names of raw statistics, sorted by increasing least-importance. The number of summary statistics can be controlled by the global package option `gof_nstats_fn`, a function with arguments `nr` and `nstats` for, respectively, the number of simulations of the process (as controlled by `goftest(.,nsim)`) and the total number of raw summary statistics used in the projections.
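As a rough illustration of least-importance selection, the following base-R sketch ranks raw statistics from a hypothetical importance table. The importance values, the aggregation by maximum over projections, and the function `my_nstats_fn` are all assumptions made for this example; they are not taken from the Infusion code, and only the argument names `nr` and `nstats` follow the text above.

```r
# Hypothetical permutation importances:
# rows = raw summary statistics, columns = projections.
importance <- matrix(
  c(0.90, 0.05,
    0.02, 0.01,
    0.10, 0.70,
    0.03, 0.04),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("mean", "kurt", "var", "skew"),
                  c("proj_mu", "proj_s2"))
)

# Hypothetical rule for the number of statistics to retain,
# with the same argument names (nr, nstats) as a gof_nstats_fn function.
my_nstats_fn <- function(nr, nstats) {
  max(1L, min(nstats, as.integer(floor(nr / 40))))
}

# Assumed aggregation: a statistic's importance "over projections" is taken
# here as its largest importance in any projection.
max_imp <- apply(importance, 1, max)
ranked <- names(sort(max_imp))  # names sorted by increasing importance

n_keep <- my_nstats_fn(nr = 99L, nstats = nrow(importance))
selected <- head(ranked, n_keep)
selected
```

With these made-up values, the two statistics that contribute least to any projection are retained.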

The **diagnostic plot** will show a data frame of residuals *u* of the summary statistics identified as the first elements of the vector returned by `Infusion:::..get_gof_stats`, i.e. again a set of raw statistics with least-importance over projectors.

A list, currently with a single element:

`pval` |
The p-value of the test (NULL if the test is not feasible). |

    ### See end of example("example_reftable") for minimal example.

    ## Not run:
    ### Performance of GoF test over replicate draws from data-generating process

    # First, run example("example_reftable")
    # (at least up to the final 'slik_j' object), then

    # as a shortcut, the same projections will be used in all replicates:
    dprojectors <- slik_j$projectors

    set.seed(123)
    gof_draws <- replicate(200, {
      cat(" ")
      dSobs <- blurred(mu=4,s2=1,sample.size=40)
      ## ----Inference workflow-----------------------------------------------
      dprojSobs <- project(dSobs,projectors=dprojectors)
      dslik <- infer_SLik_joint(dprojSimuls,stat.obs=dprojSobs,verbose=FALSE)
      dslik <- MSL(dslik, verbose=FALSE, eval_RMSEs=FALSE)
      ## ----GoF test-----------------------------------------------
      gof <- goftest(dslik,nb_cores = 1L, plot.=FALSE,verbose=FALSE)
      cat(unlist(gof))
      gof
    })
    # ~ uniform distribution under correctly-specified model:
    plot(ecdf(unlist(gof_draws)))

    ## End(Not run)
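Beyond the visual check of the empirical distribution function, the uniformity of replicate p-values could also be summarized numerically. The sketch below is an assumption-laden illustration in base R: `pvals` is generated by `runif()` as a stand-in for `unlist(gof_draws)`, so it runs without the Infusion objects.

```r
set.seed(42)

# Stand-in for unlist(gof_draws): p-values that are uniform by construction.
pvals <- runif(200)

# Kolmogorov-Smirnov test against the uniform distribution on [0, 1].
ks <- ks.test(pvals, "punif")
ks$p.value

# Empirical rejection rate at the nominal 5% level;
# it should be close to 0.05 if the test is well calibrated.
rej_rate <- mean(pvals <= 0.05)
rej_rate
```

With actual `goftest` results, p-values based on a finite `nsim` take a limited set of values, so `ks.test` may warn about ties; the rejection rate remains a simple calibration check in that case.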
