goftest | R Documentation |
Performs a goodness-of-fit test when projected statistics have been used for inference. Otherwise, only some plots of limited interest are produced.
The summary and print methods for results of goftest call str to display the structure of this result.
goftest(object, nsim = 99L, method = "", stats=NULL, plot. = TRUE, nb_cores = NULL,
Simulate = get_from(object,"Simulate"),
control.Simulate=get_from(object,"control.Simulate"),
packages = get_from(object,"packages"),
env = get_from(object,"env"), verbose = interactive(),
cl_seed=.update_seed(object), get_gof_stats=.get_gof_stats)
object |
an |
nsim |
Number of draws of summary statistics. |
method |
For development purposes, not documented. |
stats |
Character vector, or NULL: the set of summary statistics to be used to construct the test. If NULL, the union, across all projections, of the raw summary statistics used for projections is potentially used for goodness of fit; however, if this set is too large for Gaussian mixture modelling, a subset of variables may be selected. How they are selected is not yet fully settled (see Details). |
plot. |
Control diagnostic plots. |
nb_cores , Simulate , packages , env , verbose |
See same-named |
control.Simulate |
A list of arguments of the |
cl_seed |
NULL or integer (see |
get_gof_stats |
function for selecting raw statistics (see Details). |
Testing goodness-of-fit: The test is somewhat heuristic but appears to give reasonable results (the Example shows how this can be verified). It assumes that all summary statistics are reduced to projections predicting all model parameters. It then proceeds as if any projection p predicting a parameter were a sufficient statistic for this parameter, given the information contained in the summary statistics s (this is certainly the ideal objective of machine-learning regression methods). A statistic u independent (under the fitted model) of all projections should then be a suitable statistic for testing goodness of fit: if the model is correctly specified, the quantile of the observed u, in the distribution of u under the fitted model, should be uniformly distributed over repeated sampling under the data-generating process. The procedure constructs statistics uncorrelated with all p (over repeated sampling under the fitted model) and proceeds as if they were independent of p (rather than simply uncorrelated). A number (depending on the size of the reference table) of statistics u uncorrelated with p are defined in this way. Each such statistic is obtained as the residual of the regression of a given raw summary statistic on all projections, where the regression input is a simulation table of nsim replicates of s under the fitted model, and of their projections p (using the “projectors” constructed from the full reference table). This regression involves one more, small-nsim, approximation (as it is the sample correlation that is zeroed), but using the residuals is crucially better than using the original summary statistics (as some ABC software may do). An additional feature of the procedure is to construct a single test statistic t from the joint residuals u, by estimating their joint distribution (using Gaussian mixture modelling) and letting t be the density of u in this distribution.
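The residual construction described above can be illustrated with a minimal base-R sketch. It is not the Infusion implementation: the projections p1, p2 and raw statistics s1, s2 are hypothetical simulated stand-ins, the observed data are made up, and a single multivariate Gaussian replaces the Gaussian mixture as a simplifying assumption. The log density is used as the test statistic, which gives the same ranks as the density itself.

```r
set.seed(1)
nsim <- 200L  # number of simulated replicates under the "fitted model" (assumption)

## Hypothetical stand-ins: two projections (p1, p2) and two raw statistics (s1, s2)
p1 <- rnorm(nsim)
p2 <- rnorm(nsim)
s1 <- 0.8 * p1 + rnorm(nsim, sd = 0.5)
s2 <- -0.5 * p2 + 0.3 * p1 + rnorm(nsim, sd = 0.5)
sim <- data.frame(p1, p2, s1, s2)

## Made-up "observed" statistics and projections
obs <- data.frame(p1 = 0.2, p2 = -0.1, s1 = 0.3, s2 = 0.1)

## Residuals u of each raw statistic regressed on all projections
fit1 <- lm(s1 ~ p1 + p2, data = sim)
fit2 <- lm(s2 ~ p1 + p2, data = sim)
u_sim <- cbind(u1 = residuals(fit1), u2 = residuals(fit2))
u_obs <- c(u1 = obs$s1 - unname(predict(fit1, newdata = obs)),
           u2 = obs$s2 - unname(predict(fit2, newdata = obs)))

## Sample correlations between residuals and projections vanish by construction
max_abs_cor <- max(abs(cor(u_sim, sim[, c("p1", "p2")])))

## Log density of u under one fitted Gaussian (simplification of the mixture)
log_dens <- function(u, mu, Sigma) {
  u <- matrix(u, ncol = length(mu))
  -0.5 * mahalanobis(u, center = mu, cov = Sigma) -
    0.5 * as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) -
    0.5 * length(mu) * log(2 * pi)
}
mu <- colMeans(u_sim)
Sigma <- cov(u_sim)
t_sim <- log_dens(u_sim, mu, Sigma)
t_obs <- log_dens(u_obs, mu, Sigma)

## A low density indicates poor fit: p-value from the rank of t_obs among t_sim
pval <- (1 + sum(t_sim <= t_obs)) / (nsim + 1)
```

Only the sample correlation between u and p is zeroed here, which is the small-nsim approximation mentioned in the text.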
Selection of raw summary statistics: See the code of the Infusion:::.get_gof_stats function for the method used. It requires that ranger has been used to produce the projectors, and that the latter include variable importance statistics (by default, Infusion calls ranger with argument importance="permutation"). .get_gof_stats then selects the raw summary statistics with least importance over projections (this may not be optimal, and in particular appears redundant with the procedure described above to construct goodness-of-fit statistics from raw summary statistics; so this might change in a later version), and returns a vector of names of raw statistics, sorted by increasing least-importance. The number of summary statistics can be controlled by the global package option gof_nstats_fn, a function with arguments nr and nstats for, respectively, the number of simulations of the process (as controlled by goftest(.,nsim)) and the total number of raw summary statistics used in the projections.
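As an illustration of the interface described above, here is a sketch of a function with arguments nr and nstats that could serve as a gof_nstats_fn. The rule itself (about one raw statistic per 20 simulated replicates) and the name my_gof_nstats_fn are hypothetical, not the package default. The commented-out registration through Infusion.options is an assumption about how the global option would be set.

```r
## Hypothetical rule (not the package default): allow about one raw statistic
## per 20 simulated replicates, with at least 1 and at most nstats.
my_gof_nstats_fn <- function(nr, nstats) {
  as.integer(max(1L, min(nstats, floor(nr / 20))))
}

my_gof_nstats_fn(nr = 99L, nstats = 10L)    # 4
my_gof_nstats_fn(nr = 10L, nstats = 10L)    # 1
my_gof_nstats_fn(nr = 1000L, nstats = 10L)  # 10

## With Infusion loaded, the rule would presumably be registered as a package
## option (assumed call, left commented out so the sketch runs on its own):
# Infusion.options(gof_nstats_fn = my_gof_nstats_fn)
```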
The diagnostic plot will show a data frame of residuals u of the summary statistics identified as the first elements of the vector returned by Infusion:::.get_gof_stats, i.e. again a set of raw statistics with least importance over projectors.
An object of class goftest, which is a list with element(s)
pval |
The p-value of the test (NULL if the test is not feasible). |
plotframe |
The data frame which is (by default) plotted by the function. Its last line contains the residuals u for the analyzed data, and other lines contain the bootstrap replicates. |
### See end of example("example_reftable") for minimal example.
## Not run:
### Performance of GoF test over replicate draws from data-generating process
# First, run
example("example_reftable")
# (at least up to the final 'slik_j' object), then
# as a shortcut, the same projections will be used in all replicates:
dprojectors <- slik_j$projectors
set.seed(123)
gof_draws <- replicate(200, {
  cat(" ")
  dSobs <- blurred(mu=4,s2=1,sample.size=40)
  ## ----Inference workflow-----------------------------------------------
  dprojSobs <- project(dSobs,projectors=dprojectors)
  dslik <- infer_SLik_joint(dprojSimuls,stat.obs=dprojSobs,verbose=FALSE)
  dslik <- MSL(dslik, verbose=FALSE, eval_RMSEs=FALSE)
  ## ----GoF test-----------------------------------------------
  gof <- goftest(dslik,nb_cores = 1L, plot.=FALSE,verbose=FALSE)
  cat(unlist(gof))
  gof
})
# ~ uniform distribution under correctly-specified model:
plot(ecdf(unlist(gof_draws)))
## End(Not run)
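Beyond visual inspection of the ECDF, the uniformity of the p-values could also be checked numerically. The following sketch uses base R only; the vector pvals is a simulated stand-in for unlist(gof_draws), drawn as uniform purely for illustration, so it says nothing about the actual performance of goftest.

```r
set.seed(42)
## Stand-in for unlist(gof_draws): 200 p-values drawn as uniform (assumption)
pvals <- runif(200)

## Kolmogorov-Smirnov test against the uniform distribution on [0, 1]
ks <- ks.test(pvals, "punif")

## Empirical rejection rate at the 5% level; close to 0.05 under uniformity
reject_rate <- mean(pvals < 0.05)

## ECDF with the diagonal as reference for the uniform distribution
plot(ecdf(pvals), main = "ECDF of GoF p-values")
abline(0, 1, lty = 2)
```

With real goftest output, p-values take a limited number of distinct values for a given nsim, so ks.test may warn about ties; comparing the ECDF to the diagonal remains informative in that case.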