goftest: Assessing goodness of fit of inference using simulation In Infusion: Inference Using Simulation

Description

A goodness-of-fit test is performed in the case projected statistics have been used for inference. Otherwise some plots of limited interest are produced.

Usage

 ```1 2 3 4``` ```goftest(object, nsim = 99L, method = "", stats=NULL, plot. = TRUE, nb_cores = NULL, Simulate = attr(object\$logLs, "Simulate"), packages = attr(object\$logLs, "packages"), env = attr(object\$logLs, "env"), verbose = interactive()) ```

Arguments

 `object` an `SLik` or `SLikp` object. `nsim` Number of draws of summary statistics. `method` For development purposes, not documented. `stats` Character vector, or NULL: the set of summary statistics to be used to construct the test. If NULL, the statistics used accross all projections are used. `plot.` Control diagnostic plots. `plot.` can be of logical, character or numeric type. If `plot.` is `FALSE`, no plot is produced. If `plot.` is `TRUE` (the default), a data frame of up to 8 goodness-of-fit statistics (the statistics denoted u in Details) is plotted. If more than eight raw summary statistics (denoted s in Details) were used, then only the first eight u are retained (the order or u being deduced from the order of use of the s in the different projections). If `plot.` is a numeric vector, then u`[plot.]` are retained (possibly more than 8 statistics, as in the next case). If `plot.` is a character vector, then it is used to match the names of the u statistics (not of s) to be retained in the plot; the names of u are built from names of s by wrapping the latter within `"Res(".")"` (see axes labels of default plots for examples of valid names). `nb_cores, Simulate, packages, env, verbose` See same-named `add_simulation` arguments.

Details

The test is somewhat heuristic but appears to give reasonable results (the Example shows how this can be verified). It assumes that all summary statistics are reduced to projections predicting all model parameters. It is then conceived as if any projection p predicting a parameter were a sufficient statistic for this parameter, given the information contained in the summary statistics s (this is certainly the ideal objective of machine-learning regression methods). Then a statistic u independent (under the fitted model) from all projections should be a suitable statistic for testing goodness of fit: if the model is correctly specified, the quantile of observed u, in the distribution of u under the fitted model, should be uniformly distributed over repeated sampling under the data-generating process. The procedure constructs statistics uncorrelated to all p (over repeated sampling under the fitted model) and proceeds as if they were independent from p (rather than simply uncorrelated). Statistics u uncorrelated to p are obtained as the residuals of the regression of each summary statistic to all projections, where the regression input is a simulation table of `nsim` replicates of s under the fitted model, and of their projections p (using the “projectors” constructed from the full reference table). The latter regression involves one more, small-`nsim`, approximation (as it is the sample correlation that is zeroed) but using the residuals is crucially better than using the original summary statistics (as some ABC software may do). An additional feature of the procedure is to construct a single test statistic t from joint residuals u, by estimating their joint distribution (using Gaussian mixture modelling) and letting t be the density of u in this distribution.

Value

A list with currently a single element

 `pval ` The p-value of the test (NULL if the test is not feasible).

Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37``` ```## Not run: ## Long example, despite minimal settings! ## Showing uniform distribution under correctly-specified model # Normal(mu,sd) model, with inefficient summary statistics: blurred <- function(mu,s2,sample.size) { s <- rnorm(n=sample.size,mean=mu,sd=sqrt(s2)) s <- exp(s/4) return(c(mean=mean(s),var=var(s))) } # Construct reference table and projections once for all replicates: set.seed(123) parsp_j <- data.frame(mu=runif(5000L,min=2.8,max=5.2), s2=runif(5000L,min=0.4,max=2.4),sample.size=40) dsimuls <- add_reftable(,Simulate="blurred",par.grid=parsp_j,verbose=FALSE) mufit <- project("mu",stats=c("mean","var"),data=dsimuls,verbose=FALSE,knotnbr=5000L) s2fit <- project("s2",stats=c("mean","var"),data=dsimuls,verbose=FALSE,knotnbr=5000L) dprojectors <- list(MEAN=mufit,VAR=s2fit) dprojSimuls <- project(dsimuls,projectors=dprojectors,verbose=FALSE) # Analysis of replicate draws from data-generating process gof_draws <- replicate(50, { cat(".") dSobs <- blurred(mu=4,s2=1,sample.size=40) ## stands for the actual data to be analyzed ## ----dcflow-------------------------------------------------------------- dprojSobs <- project(dSobs,projectors=dprojectors) dslik <- infer_SLik_joint(dprojSimuls,stat.obs=dprojSobs,verbose=FALSE) dslik <- MSL(dslik, verbose=FALSE, eval_RMSEs=FALSE) #dslik <- refine(dslik,maxit=2L,verbose=FALSE, update_projectors=TRUE) gof <- goftest(dslik,nsim = 99L,nb_cores = 1L, method="", plot.=FALSE,verbose=FALSE) cat(unlist(gof)) gof }) plot(ecdf(unlist(gof_draws))) ## End(Not run) ```

Infusion documentation built on Feb. 22, 2021, 9:08 a.m.